org.jaitools.numeric
Class StreamingSampleStats

java.lang.Object
  extended by org.jaitools.numeric.StreamingSampleStats

public class StreamingSampleStats
extends Object

Calculates summary statistics for a sample of Double values that is received as a (potentially long) stream rather than in a single batch. Any Double.NaN values in the stream are ignored.

Two options are offered to calculate the sample median. Where it is known a priori that the data stream can be accommodated in memory, the exact median can be requested with Statistic.MEDIAN. Where the length of the data stream is unknown, or known to be too large to be held in memory, an approximate median can be calculated using the 'remedian' estimator as described in:

PJ Rousseeuw and GW Bassett (1990) The remedian: a robust averaging method for large data sets. Journal of the American Statistical Association 85:97-104.
This is requested with Statistic.APPROX_MEDIAN.

Note: the 'remedian' estimator performs badly with non-stationary data, e.g. a data stream that is monotonically increasing will result in an estimate for the median that is too high. If possible, it is best to de-trend or randomly order the data prior to streaming it.
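
The buffering idea behind the remedian can be illustrated with a minimal, self-contained sketch. It is not the jaitools implementation: the class name RemedianSketch, the constructor parameters and the weighted-median fallback for partially filled buffers are assumptions made for illustration. Each buffer holds an odd number of values (base); when a buffer fills, its median is pushed to the next buffer and the buffer is cleared, so memory use stays at base * numBuffers values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative remedian estimator (hypothetical class, not part of jaitools). */
final class RemedianSketch {
    private final int base;            // buffer length; odd so each buffer has a middle element
    private final double[][] buffers;  // buffers[level] holds medians of the level below
    private final int[] counts;        // number of values currently stored per level

    RemedianSketch(int base, int numBuffers) {
        if (base < 3 || base % 2 == 0 || numBuffers < 1) {
            throw new IllegalArgumentException("base must be odd and >= 3; numBuffers must be >= 1");
        }
        this.base = base;
        this.buffers = new double[numBuffers][base];
        this.counts = new int[numBuffers];
    }

    /** Offers a value; NaN values are ignored. */
    void offer(double value) {
        if (!Double.isNaN(value)) {
            insert(0, value);
        }
    }

    private void insert(int level, double value) {
        if (counts[level] == base) {
            throw new IllegalStateException("buffer capacity exceeded");
        }
        buffers[level][counts[level]++] = value;
        // A full buffer below the top level is reduced to its median and emptied.
        if (counts[level] == base && level < buffers.length - 1) {
            double[] sorted = buffers[level].clone();
            Arrays.sort(sorted);
            counts[level] = 0;
            insert(level + 1, sorted[base / 2]);
        }
    }

    /** Weighted median of all stored values; a value at level i has weight base^i. */
    double estimate() {
        List<double[]> weighted = new ArrayList<>();  // each entry is {value, weight}
        double total = 0;
        double weight = 1;
        for (int level = 0; level < buffers.length; level++) {
            for (int i = 0; i < counts[level]; i++) {
                weighted.add(new double[] {buffers[level][i], weight});
                total += weight;
            }
            weight *= base;
        }
        if (weighted.isEmpty()) {
            return Double.NaN;
        }
        weighted.sort((a, b) -> Double.compare(a[0], b[0]));
        double cumulative = 0;
        for (double[] pair : weighted) {
            cumulative += pair[1];
            if (cumulative >= total / 2) {
                return pair[0];
            }
        }
        return Double.NaN;  // not reached: cumulative always reaches total
    }
}
```

Because every value above the first level is already a median of medians, the order in which values arrive affects the result, which is why the ordering of the stream matters for this estimator.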

Example of use:


 StreamingSampleStats strmStats = new StreamingSampleStats();

 // set the statistics that will be calculated
 Statistic[] stats = {
     Statistic.MEAN,
     Statistic.SDEV,
     Statistic.RANGE,
     Statistic.APPROX_MEDIAN
 };
 strmStats.setStatistics(stats);

 // some process that generates a long stream of data
 while (somethingBigIsRunning) {
     double value = ...
     strmStats.offer(value);
 }

 // report the results
 for (Statistic s : stats) {
     System.out.println(String.format("%s: %.4f", s, strmStats.getStatisticValue(s)));
 }

 

Since:
1.0
Author:
Michael Bedward, Daniele Romagnoli, GeoSolutions S.A.S.

Constructor Summary
StreamingSampleStats()
          Creates a new sampler and sets the default range type to Range.Type.EXCLUDE.
StreamingSampleStats(Range.Type rangesType)
          Creates a new sampler with specified use of Ranges.
 
Method Summary
 void addNoDataRange(Range<Double> noData)
          Adds a range of values to be treated as NoData and therefore excluded from the calculation of all statistics.
 void addNoDataValue(Double noData)
          Adds a single value to be considered as NoData.
 void addRange(Range<Double> range)
          Adds a range of values to include in or exclude from the calculation of all statistics.
 void addRange(Range<Double> range, Range.Type rangesType)
          Adds a range of values to include in or exclude from the calculation of all statistics.
 long getNumAccepted(Statistic stat)
          Gets the number of sample values that have been accepted for the specified Statistic.
 long getNumNaN(Statistic stat)
          Gets the number of NaN values that have been offered.
 long getNumNoData(Statistic stat)
          Gets the number of NoData values (including NaN) that have been offered.
 long getNumOffered(Statistic stat)
          Gets the number of sample values that have been offered for the specified Statistic.
 Set<Statistic> getStatistics()
          Gets the statistics that are currently set.
 Double getStatisticValue(Statistic stat)
          Gets the current value of a running statistic.
 Map<Statistic,Double> getStatisticValues()
          Gets the values of all statistics calculated by this sampler.
 boolean isSet(Statistic stat)
          Tests whether the specified statistic is currently set.
 void offer(Double sample)
          Offers a sample value.
 void offer(Double[] samples)
          Offers an array of sample values.
 void setStatistic(Statistic stat)
          Adds a statistic to those calculated by this sampler.
 void setStatistics(Statistic[] stats)
          Adds the given statistics to those that will be calculated by this sampler.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StreamingSampleStats

public StreamingSampleStats()
Creates a new sampler and sets the default range type to Range.Type.EXCLUDE.


StreamingSampleStats

public StreamingSampleStats(Range.Type rangesType)
Creates a new sampler with specified use of Ranges.

Parameters:
rangesType - either Range.Type.INCLUDE or Range.Type.EXCLUDE
Method Detail

setStatistic

public void setStatistic(Statistic stat)
Adds a statistic to those calculated by this sampler. If the same statistic was previously set then calling this method has no effect.

Parameters:
stat - the statistic
See Also:
Statistic

setStatistics

public void setStatistics(Statistic[] stats)
Adds the given statistics to those that will be calculated by this sampler.

Parameters:
stats - the statistics
See Also:
setStatistic(Statistic)

isSet

public boolean isSet(Statistic stat)
Tests whether the specified statistic is currently set. Note that statistics can be set indirectly because of logical groupings. For example, if Statistic.MEAN is set then SDEV and VARIANCE will also be set as these three are calculated together. The same is true for MIN, MAX and RANGE.

Parameters:
stat - the statistic
Returns:
true if the statistic has been set; false otherwise.
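
One plausible reason for grouping MEAN, SDEV and VARIANCE is that a single running update yields all three. The sketch below uses Welford's online algorithm to show this; it is an assumption that the library works this way, and the class name RunningMomentsSketch and the choice of the sample (n - 1) variance are illustrative only.

```java
/** Illustrative running mean and variance (hypothetical class, not part of jaitools). */
final class RunningMomentsSketch {
    private long n;        // number of accepted values
    private double mean;   // running mean
    private double m2;     // running sum of squared deviations from the mean

    /** Offers a value; NaN values are ignored. */
    void offer(double value) {
        if (Double.isNaN(value)) {
            return;
        }
        n++;
        double delta = value - mean;
        mean += delta / n;
        m2 += delta * (value - mean);  // uses the updated mean
    }

    /** Returns NaN until at least one value has been accepted. */
    double mean() {
        return n > 0 ? mean : Double.NaN;
    }

    /** Sample variance; returns NaN until at least two values have been accepted. */
    double variance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }

    double sdev() {
        return Math.sqrt(variance());
    }
}
```

The mean is needed to update the sum of squared deviations, so tracking the variance without the mean would save nothing.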

addNoDataRange

public void addNoDataRange(Range<Double> noData)
Adds a range of values to be treated as NoData and therefore excluded from the calculation of all statistics. NoData ranges take precedence over included / excluded data ranges.

Parameters:
noData - the range defining NoData values

addNoDataValue

public void addNoDataValue(Double noData)
Adds a single value to be considered as NoData.

Parameters:
noData - the value to be treated as NoData
See Also:
addNoDataRange(Range)

addRange

public void addRange(Range<Double> range)
Adds a range of values to include in or exclude from the calculation of all statistics. If further statistics are set after calling this method the range will be applied to them as well.

Parameters:
range - the range to include/exclude

addRange

public void addRange(Range<Double> range,
                     Range.Type rangesType)
Adds a range of values to include in or exclude from the calculation of all statistics. If further statistics are set after calling this method the range will be applied to them as well.

Parameters:
range - the range to include/exclude
rangesType - one of Range.Type.INCLUDE or Range.Type.EXCLUDE

getStatistics

public Set<Statistic> getStatistics()
Gets the statistics that are currently set.

Returns:
the statistics

getStatisticValue

public Double getStatisticValue(Statistic stat)
Gets the current value of a running statistic. If there have not been enough samples provided to compute the statistic, Double.NaN is returned.

Parameters:
stat - the statistic
Returns:
the current value of the statistic
Throws:
IllegalStateException - if stat was not previously set

getNumAccepted

public long getNumAccepted(Statistic stat)
Gets the number of sample values that have been accepted for the specified Statistic.

Note that different statistics might have been set at different times in the sampling process.

Parameters:
stat - the statistic
Returns:
number of samples that have been accepted
Throws:
IllegalArgumentException - if the statistic hasn't been set

getNumOffered

public long getNumOffered(Statistic stat)
Gets the number of sample values that have been offered for the specified Statistic. This might be higher than the value returned by getNumAccepted(Statistic) because of nulls, Double.NaN values and excluded values in the data stream.

Note that different statistics might have been set at different times in the sampling process.

Parameters:
stat - the statistic
Returns:
number of samples that have been offered
Throws:
IllegalArgumentException - if the statistic hasn't been set

getNumNaN

public long getNumNaN(Statistic stat)
Gets the number of NaN values that have been offered. Note that different statistics might have been set at different times in the sampling process.

Parameters:
stat - the statistic
Returns:
number of NaN samples offered
Throws:
IllegalArgumentException - if the statistic hasn't been set

getNumNoData

public long getNumNoData(Statistic stat)
Gets the number of NoData values (including NaN) that have been offered. Note that different statistics might have been set at different times in the sampling process.

Parameters:
stat - the statistic
Returns:
number of NoData samples offered
Throws:
IllegalArgumentException - if the statistic hasn't been set

offer

public void offer(Double sample)
Offers a sample value. Offered values are filtered through excluded ranges. Double.NaNs and nulls are excluded by default.

Parameters:
sample - the sample value
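
The filtering order and the relationship between the offered, accepted, NaN and NoData counts can be sketched as follows. This is a simplified stand-in rather than the library's code: the class SampleFilterSketch, its closed-interval ranges and the decision to count a null only as offered are all assumptions. It checks NoData before excluded ranges, in line with the stated precedence of NoData ranges.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sample filter with counters (hypothetical class, not part of jaitools). */
final class SampleFilterSketch {
    // Each range is a closed interval stored as {min, max}.
    private final List<double[]> noDataRanges = new ArrayList<>();
    private final List<double[]> excludedRanges = new ArrayList<>();

    long numOffered;
    long numAccepted;
    long numNaN;
    long numNoData;  // includes NaN values

    void addNoDataRange(double min, double max) {
        noDataRanges.add(new double[] {min, max});
    }

    void addNoDataValue(double value) {
        addNoDataRange(value, value);
    }

    void addExcludedRange(double min, double max) {
        excludedRanges.add(new double[] {min, max});
    }

    /** Returns true if the sample passes all filters and would be used in the statistics. */
    boolean offer(Double sample) {
        numOffered++;
        if (sample == null) {
            return false;  // assumption: nulls are counted as offered only
        }
        if (sample.isNaN()) {
            numNaN++;
            numNoData++;
            return false;
        }
        if (contains(noDataRanges, sample)) {  // NoData is checked before excluded ranges
            numNoData++;
            return false;
        }
        if (contains(excludedRanges, sample)) {
            return false;
        }
        numAccepted++;
        return true;
    }

    private static boolean contains(List<double[]> ranges, double value) {
        for (double[] range : ranges) {
            if (value >= range[0] && value <= range[1]) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch the offered count always equals the accepted count plus the rejected samples, so comparing the counters shows how much of the stream was filtered out.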

offer

public void offer(Double[] samples)
Offers an array of sample values.

Parameters:
samples - the sample values

getStatisticValues

public Map<Statistic,Double> getStatisticValues()
Gets the values of all statistics calculated by this sampler.

Returns:
calculated values


Copyright © 2009-2013. All Rights Reserved.