/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.commons.math.random;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Serializable;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math.MathRuntimeException;
import org.apache.commons.math.exception.util.LocalizedFormats;
import org.apache.commons.math.stat.descriptive.StatisticalSummary;
import org.apache.commons.math.stat.descriptive.SummaryStatistics;
import org.apache.commons.math.util.FastMath;

/**
 * Implements the <code>EmpiricalDistribution</code> interface.
 * This implementation uses what amounts to the
 * <a href="http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_6.html">
 * Variable Kernel Method</a> with Gaussian smoothing:<p>
 * <strong>Digesting the input file</strong>
 * <ol><li>Pass the file once to compute min and max.</li>
 * <li>Divide the range [min, max] into <code>binCount</code> equal-width "bins."</li>
 * <li>Pass the data file again, computing bin counts and univariate
 *     statistics (mean, std dev.) for each bin.</li>
 * <li>Divide the interval (0,1) into subintervals associated with the bins,
 *     with the length of a bin's subinterval proportional to its count.</li></ol>
 * <strong>Generating random values from the distribution</strong><ol>
 * <li>Generate a uniformly distributed value in (0,1).</li>
 * <li>Select the subinterval to which the value belongs.</li>
 * <li>Generate a random Gaussian value with mean = mean of the associated
 *     bin and std dev = std dev of the associated bin (if the bin's std dev
 *     is zero, the bin mean is returned instead).</li></ol></p><p>
 * <strong>USAGE NOTES:</strong><ul>
 * <li>The <code>binCount</code> is set by default to 1000.
 *     A good rule of thumb is to set the bin count to approximately the
 *     number of input values divided by 10.</li>
 * <li>When loading from a file or URL, the input <i>must</i> be plain text
 *     containing one valid numeric entry per line.</li>
 * </ul></p>
 *
 * @version $Revision: 1003886 $ $Date: 2010-10-02 23:04:44 +0200 (sam. 02 oct. 2010) $
 */
public class EmpiricalDistributionImpl implements Serializable, EmpiricalDistribution {

    /** Serializable version identifier */
    private static final long serialVersionUID = 5729073523949762654L;

    /** List of SummaryStatistics objects characterizing the bins */
    private final List<SummaryStatistics> binStats;

    /** Sample statistics */
    private SummaryStatistics sampleStats = null;

    /** Max loaded value */
    private double max = Double.NEGATIVE_INFINITY;

    /** Min loaded value */
    private double min = Double.POSITIVE_INFINITY;

    /** Grid size (width of each bin) */
    private double delta = 0d;

    /** Number of bins */
    private final int binCount;

    /** Is the distribution loaded? */
    private boolean loaded = false;

    /** Upper bounds of subintervals in (0,1) "belonging" to the bins */
    private double[] upperBounds = null;

    /** RandomData instance to use in repeated calls to getNextValue() */
    private final RandomData randomData = new RandomDataImpl();

    /**
     * Creates a new EmpiricalDistribution with the default bin count (1000).
     */
    public EmpiricalDistributionImpl() {
        binCount = 1000;
        binStats = new ArrayList<SummaryStatistics>();
    }

    /**
     * Creates a new EmpiricalDistribution with the specified bin count.
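     *
     * <p>A minimal usage sketch, assuming a hypothetical helper
     * <code>loadSample()</code> that stands for any source of a
     * <code>double[]</code> sample. It sizes the bins with the rule of thumb
     * from the class documentation (roughly one bin per ten observations)
     * and keeps at least one bin, since a bin count of zero would make
     * <code>load</code> fail:</p>
     * <pre>{@code
     * double[] data = loadSample(); // hypothetical helper, not part of this library
     * EmpiricalDistributionImpl dist =
     *     new EmpiricalDistributionImpl(Math.max(1, data.length / 10));
     * dist.load(data);
     * double next = dist.getNextValue();
     * }</pre>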
     *
     * @param binCount number of bins
     */
    public EmpiricalDistributionImpl(int binCount) {
        this.binCount = binCount;
        binStats = new ArrayList<SummaryStatistics>();
    }

    /**
     * Computes the empirical distribution from the provided
     * array of numbers.
     *
     * @param in the input data array
     */
    public void load(double[] in) {
        DataAdapter da = new ArrayDataAdapter(in);
        try {
            da.computeStats();
            fillBinStats(in);
        } catch (IOException e) {
            throw new MathRuntimeException(e);
        }
        loaded = true;
    }

    /**
     * Computes the empirical distribution using data read from a URL.
     *
     * @param url URL of the input data
     * @throws IOException if an I/O error occurs (an <code>EOFException</code>
     * if the URL contains no data)
     */
    public void load(URL url) throws IOException {
        BufferedReader in =
            new BufferedReader(new InputStreamReader(url.openStream()));
        try {
            DataAdapter da = new StreamDataAdapter(in);
            da.computeStats();
            if (sampleStats.getN() == 0) {
                throw MathRuntimeException.createEOFException(LocalizedFormats.URL_CONTAINS_NO_DATA,
                                                              url);
            }
            in = new BufferedReader(new InputStreamReader(url.openStream()));
            fillBinStats(in);
            loaded = true;
        } finally {
            try {
                in.close();
            } catch (IOException ex) {
                // ignore
            }
        }
    }

    /**
     * Computes the empirical distribution from the input file.
     *
     * @param file the input file
     * @throws IOException if an I/O error occurs
     */
    public void load(File file) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(file));
        try {
            DataAdapter da = new StreamDataAdapter(in);
            da.computeStats();
            in = new BufferedReader(new FileReader(file));
            fillBinStats(in);
            loaded = true;
        } finally {
            try {
                in.close();
            } catch (IOException ex) {
                // ignore
            }
        }
    }

    /**
     * Provides methods for computing <code>sampleStats</code> and
     * <code>binStats</code> while abstracting away the source of the data.
     */
    private abstract class DataAdapter {

        /**
         * Computes the per-bin statistics (second pass through the data).
         *
         * @throws IOException if an error occurs computing bin stats
         */
        public abstract void computeBinStats() throws IOException;

        /**
         * Computes the sample statistics (first pass through the data).
         *
         * @throws IOException if an error occurs computing sample stats
         */
        public abstract void computeStats() throws IOException;

    }

    /**
     * Factory of <code>DataAdapter</code> objects. For every supported source
     * of data (a <code>BufferedReader</code> or an array of doubles) an
     * instance of the matching adapter is returned.
     */
    private class DataAdapterFactory {
        /**
         * Creates a DataAdapter for the given data source.
         *
         * @param in object providing access to the data
         * @return DataAdapter instance
         * @throws IllegalArgumentException if the type of <code>in</code> is
         * not supported
         */
        public DataAdapter getAdapter(Object in) {
            if (in instanceof BufferedReader) {
                BufferedReader inputStream = (BufferedReader) in;
                return new StreamDataAdapter(inputStream);
            } else if (in instanceof double[]) {
                double[] inputArray = (double[]) in;
                return new ArrayDataAdapter(inputArray);
            } else {
                throw MathRuntimeException.createIllegalArgumentException(
                      LocalizedFormats.INPUT_DATA_FROM_UNSUPPORTED_DATASOURCE,
                      in.getClass().getName(),
                      BufferedReader.class.getName(), double[].class.getName());
            }
        }
    }

    /**
     * <code>DataAdapter</code> for data provided
     * through a <code>BufferedReader</code>.
     */
    private class StreamDataAdapter extends DataAdapter {

        /** Input stream providing access to the data */
        private BufferedReader inputStream;

        /**
         * Creates a StreamDataAdapter from a BufferedReader.
         *
         * @param in BufferedReader input stream
         */
        public StreamDataAdapter(BufferedReader in) {
            super();
            inputStream = in;
        }

        /** {@inheritDoc} */
        @Override
        public void computeBinStats() throws IOException {
            String str = null;
            double val = 0.0d;
            while ((str = inputStream.readLine()) != null) {
                val = Double.parseDouble(str);
                SummaryStatistics stats = binStats.get(findBin(val));
                stats.addValue(val);
            }
            inputStream.close();
            inputStream = null;
        }

        /** {@inheritDoc} */
        @Override
        public void computeStats() throws IOException {
            String str = null;
            double val = 0.0;
            sampleStats = new SummaryStatistics();
            while ((str = inputStream.readLine()) != null) {
                val = Double.parseDouble(str);
                sampleStats.addValue(val);
            }
            inputStream.close();
            inputStream = null;
        }
    }

    /**
     * <code>DataAdapter</code> for data provided as an array of doubles.
     */
    private class ArrayDataAdapter extends DataAdapter {

        /** Array of input data values */
        private double[] inputArray;

        /**
         * Constructs an ArrayDataAdapter from a double[] array.
         *
         * @param in double[] array holding the data
         */
        public ArrayDataAdapter(double[] in) {
            super();
            inputArray = in;
        }

        /** {@inheritDoc} */
        @Override
        public void computeStats() throws IOException {
            sampleStats = new SummaryStatistics();
            for (double value : inputArray) {
                sampleStats.addValue(value);
            }
        }

        /** {@inheritDoc} */
        @Override
        public void computeBinStats() throws IOException {
            for (double value : inputArray) {
                SummaryStatistics stats = binStats.get(findBin(value));
                stats.addValue(value);
            }
        }
    }

    /**
     * Fills the binStats list (second pass through the data) and computes
     * the generator upper bounds.
     *
     * @param in object providing access to the data
     * @throws IOException if an I/O error occurs
     */
    private void fillBinStats(Object in) throws IOException {
        // Set up the grid: binCount bins of equal width delta spanning [min, max]
        min = sampleStats.getMin();
        max = sampleStats.getMax();
        delta = (max - min) / binCount;

        // Initialize the binStats list with one empty accumulator per bin
        if (!binStats.isEmpty()) {
            binStats.clear();
        }
        for (int i = 0; i < binCount; i++) {
            binStats.add(new SummaryStatistics());
        }

        // Fill binStats with the data
        DataAdapterFactory aFactory = new DataAdapterFactory();
        DataAdapter da = aFactory.getAdapter(in);
        da.computeBinStats();

        // upperBounds[i] is the cumulative fraction of the sample falling in
        // bins 0..i; the last entry is pinned to 1.0 to absorb rounding error
        upperBounds = new double[binCount];
        upperBounds[0] =
            ((double) binStats.get(0).getN()) / (double) sampleStats.getN();
        for (int i = 1; i < binCount - 1; i++) {
            upperBounds[i] = upperBounds[i - 1] +
                ((double) binStats.get(i).getN()) / (double) sampleStats.getN();
        }
        upperBounds[binCount - 1] = 1.0d;
    }

    /**
     * Returns the index of the bin to which the given value belongs.
     *
     * @param value the value whose bin we are trying to find
     * @return the index of the bin containing the value
     */
    private int findBin(double value) {
        // Bin k covers (min + k * delta, min + (k + 1) * delta]; the result is
        // clamped so that values at or below min map to bin 0 and values above
        // the last boundary map to bin binCount - 1
        return FastMath.min(
                FastMath.max((int) FastMath.ceil((value - min) / delta) - 1, 0),
                binCount - 1);
    }

    /**
     * Generates a random value from this distribution.
     *
     * @return the random value.
     * @throws IllegalStateException if the distribution has not been loaded
     */
    public double getNextValue() throws IllegalStateException {

        if (!loaded) {
            throw MathRuntimeException.createIllegalStateException(LocalizedFormats.DISTRIBUTION_NOT_LOADED);
        }

        // Start with a uniformly distributed random number in [0,1)
        double x = FastMath.random();

        // Use this to select the bin and generate a Gaussian within the bin;
        // empty bins are skipped
        for (int i = 0; i < binCount; i++) {
            if (x <= upperBounds[i]) {
                SummaryStatistics stats = binStats.get(i);
                if (stats.getN() > 0) {
                    if (stats.getStandardDeviation() > 0) {  // at least two distinct values
                        return randomData.nextGaussian(stats.getMean(), stats.getStandardDeviation());
                    } else {
                        return stats.getMean();  // all values in the bin are equal
                    }
                }
            }
        }
        throw new MathRuntimeException(LocalizedFormats.NO_BIN_SELECTED);
    }

    /**
     * Returns a {@link StatisticalSummary} describing this distribution.
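     *
     * <p>A minimal sketch with illustrative values, using the default bin
     * count:</p>
     * <pre>{@code
     * EmpiricalDistributionImpl dist = new EmpiricalDistributionImpl();
     * dist.load(new double[] {1.0, 2.0, 3.0});
     * double mean = dist.getSampleStats().getMean(); // 2.0
     * long n = dist.getSampleStats().getN();         // 3
     * }</pre>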
     * <strong>Preconditions:</strong><ul>
     * <li>the distribution must be loaded before invoking this method</li></ul>
     *
     * @return the sample statistics (<code>null</code> if no data has been read
     * yet; this method does not check the precondition)
     */
    public StatisticalSummary getSampleStats() {
        return sampleStats;
    }

    /**
     * Returns the number of bins.
     *
     * @return the number of bins.
     */
    public int getBinCount() {
        return binCount;
    }

    /**
     * Returns a List of {@link SummaryStatistics} instances containing
     * statistics describing the values in each of the bins. The list is
     * indexed on the bin number.
     *
     * @return List of bin statistics.
     */
    public List<SummaryStatistics> getBinStats() {
        return binStats;
    }

    /**
     * <p>Returns a fresh copy of the array of upper bounds for the bins.
     * Bins are: <br/>
     * [min, upperBounds[0]], (upperBounds[0], upperBounds[1]], ...,
     * (upperBounds[binCount-2], upperBounds[binCount-1] = max].</p>
     *
     * <p>Note: In versions 1.0-2.0 of commons-math, this method
     * incorrectly returned the array of probability generator upper
     * bounds now returned by {@link #getGeneratorUpperBounds()}.</p>
     *
     * @return array of bin upper bounds
     * @since 2.1
     */
    public double[] getUpperBounds() {
        double[] binUpperBounds = new double[binCount];
        binUpperBounds[0] = min + delta;
        for (int i = 1; i < binCount - 1; i++) {
            binUpperBounds[i] = binUpperBounds[i - 1] + delta;
        }
        binUpperBounds[binCount - 1] = max;
        return binUpperBounds;
    }

    /**
     * <p>Returns a fresh copy of the array of upper bounds of the subintervals
     * of [0,1] used in generating data from the empirical distribution.
     * Subintervals correspond to bins with lengths proportional to bin counts.</p>
     *
     * <p>In versions 1.0-2.0 of commons-math, this array was (incorrectly) returned
     * by {@link #getUpperBounds()}.</p>
     *
     * @since 2.1
     * @return array of upper bounds of subintervals used in data generation
     */
    public double[] getGeneratorUpperBounds() {
        int len = upperBounds.length;
        double[] out = new double[len];
        System.arraycopy(upperBounds, 0, out, 0, len);
        return out;
    }

    /**
     * Property indicating whether or not the distribution has been loaded.
     *
     * @return true if the distribution has been loaded
     */
    public boolean isLoaded() {
        return loaded;
    }
}