org.knime.base.data.statistics
Class Statistics2Table

java.lang.Object
  extended by org.knime.base.data.statistics.Statistics2Table

public class Statistics2Table
extends Object

Utility class that computes per-column statistics for a table: mean, variance, sum, number of missing values, minimum, maximum, median, and the number of occurrences of each possible value.

Author:
Thomas Gabriel, University of Konstanz

Constructor Summary
Statistics2Table(BufferedDataTable table, boolean computeMedian, int numNomValuesOutput, List<String> nominalValueColumns, ExecutionContext exec)
          Create new statistic table from an existing one.
 
Method Summary
 DataTable createNominalValueTable(List<String> nominal)
          Create nominal value table containing all possible values together with their occurrences.
static DataTableSpec createOutSpecNominal(DataTableSpec inSpec, List<String> nominalValues)
          Create spec containing only nominal columns in same order as the input spec.
static DataTableSpec createOutSpecNumeric(DataTableSpec inSpec)
          Create spec containing only numeric columns in same order as the input spec.
 DataTable createStatisticMomentsTable()
          Creates a table of statistic moments such as minimum, maximum, mean, standard deviation, variance, overall sum, number of missing values, and median.
 String[] extractNominalColumns(List<String> nominalValues)
          Returns an array of valid columns.
 String[] getColumnNames()
          Returns the array of column names.
 double[] getMax()
          Returns the maximum for all columns.
 double[] getMean()
          Returns the means for all columns.
 double getMean(int colIdx)
          Returns the mean for the desired column.
 double[] getMedian()
          Returns the median for all columns.
 double getMedian(int colIdx)
          Returns the median for the desired column.
 double[] getMin()
          Returns the minimum for all columns.
 Map<DataCell,Integer>[] getNominalValues()
          Returns an array (for each column) of mappings containing DataCell value to number of occurrences.
 Map<DataCell,Integer> getNominalValues(int colIdx)
          Returns a map containing DataCell value to number of occurrences.
 double[] getNumberMissingValues()
          Returns an array of the number of missing values for each dimension.
 double getNumberMissingValues(int colIdx)
          Returns the number of missing values for the given column index.
 double[] getStandardDeviation()
          Returns the standard deviation for all columns.
 double getStandardDeviation(int colIdx)
          Calculates the standard deviation for the desired column.
 double[] getSum()
          Returns the sum values for all columns.
 double getSum(int colIdx)
          Returns the sum for the desired column.
 double[] getVariance()
          Returns the variance for all columns.
 double getVariance(int colIdx)
          Returns the variance for the desired column.
 String getWarning()
          Returns warning message if number of possible values exceeds predefined maximum.
static Statistics2Table load(NodeSettingsRO sett)
          Load a new statistic table by the given settings object.
 void save(NodeSettingsWO sett)
          Saves this object to the given settings object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Statistics2Table

public Statistics2Table(BufferedDataTable table,
                        boolean computeMedian,
                        int numNomValuesOutput,
                        List<String> nominalValueColumns,
                        ExecutionContext exec)
                 throws CanceledExecutionException
Create new statistic table from an existing one. This constructor computes all values up front, which requires traversing the entire specified table twice. The user can cancel the computation if an execution monitor is passed.

Parameters:
table - table to be wrapped
computeMedian - whether the median is to be computed
numNomValuesOutput - number of possible values in output table
nominalValueColumns - columns used to determine all possible values
exec - an object used to check whether the user canceled the operation
Throws:
CanceledExecutionException - if the user canceled the operation
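
The two traversals described above can be pictured with a plain-Java sketch. It is not the KNIME implementation, and which statistics are gathered in which traversal is a guess: the class `TwoPassSketch`, the `double[][]` row layout, the use of `Double.NaN` for a missing cell, the sample (n - 1) variance, and the `BooleanSupplier` standing in for the execution monitor are all assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for illustration only; not part of KNIME.
final class TwoPassSketch {
    final double[] mean;
    final double[] variance;

    TwoPassSketch(double[][] rows, int numCols, BooleanSupplier canceled) {
        double[] sum = new double[numCols];
        int[] count = new int[numCols];
        // Pass 1: per-column sum and count of non-missing cells.
        for (double[] row : rows) {
            checkCanceled(canceled);
            for (int c = 0; c < numCols; c++) {
                if (!Double.isNaN(row[c])) {
                    sum[c] += row[c];
                    count[c]++;
                }
            }
        }
        mean = new double[numCols];
        for (int c = 0; c < numCols; c++) {
            mean[c] = count[c] == 0 ? Double.NaN : sum[c] / count[c];
        }
        // Pass 2: squared deviations from the mean found in pass 1.
        double[] squares = new double[numCols];
        for (double[] row : rows) {
            checkCanceled(canceled);
            for (int c = 0; c < numCols; c++) {
                if (!Double.isNaN(row[c])) {
                    double d = row[c] - mean[c];
                    squares[c] += d * d;
                }
            }
        }
        variance = new double[numCols];
        for (int c = 0; c < numCols; c++) {
            // Assumed convention: sample variance; 0.0 for a single value.
            variance[c] = count[c] == 0 ? Double.NaN
                : count[c] == 1 ? 0.0 : squares[c] / (count[c] - 1);
        }
    }

    // Plays the role of the cancellation check made once per row.
    private static void checkCanceled(BooleanSupplier canceled) {
        if (canceled.getAsBoolean()) {
            throw new IllegalStateException("canceled by user");
        }
    }
}
```

Because everything is computed in the constructor, the getters of such an object only read stored arrays, and cancellation can only interrupt construction.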
Method Detail

createStatisticMomentsTable

public DataTable createStatisticMomentsTable()
Creates a table of statistic moments such as minimum, maximum, mean, standard deviation, variance, overall sum, number of missing values, and median.

Returns:
a table with one moment in each row across all input columns

createNominalValueTable

public DataTable createNominalValueTable(List<String> nominal)
Create nominal value table containing all possible values together with their occurrences.

Returns:
nominal value output table

createOutSpecNumeric

public static DataTableSpec createOutSpecNumeric(DataTableSpec inSpec)
Create spec containing only numeric columns in same order as the input spec.

Parameters:
inSpec - input spec
Returns:
a new spec with all numeric columns
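
The order-preserving filter described above can be sketched without the KNIME types. In this sketch a spec is reduced to parallel arrays of column names and value classes, and `Number` stands in for whatever test KNIME uses to decide that a column is numeric; the class `NumericSpecSketch` and both of those choices are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not part of KNIME.
final class NumericSpecSketch {
    // Keeps the names of numeric columns in the order of the input spec.
    static List<String> numericColumns(String[] names, Class<?>[] types) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            if (Number.class.isAssignableFrom(types[i])) {
                kept.add(names[i]);
            }
        }
        return kept;
    }
}
```

Iterating over the input once and appending matches is what keeps the output in the same order as the input spec.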

createOutSpecNominal

public static DataTableSpec createOutSpecNominal(DataTableSpec inSpec,
                                                 List<String> nominalValues)
Create spec containing only nominal columns in same order as the input spec.

Parameters:
inSpec - input spec
nominalValues - used in map of co-occurrences
Returns:
a new spec with all nominal columns

extractNominalColumns

public final String[] extractNominalColumns(List<String> nominalValues)
Returns an array of valid columns.

Returns:
an array of column names that are valid in conjunction with the current data spec
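
One plausible reading of "valid" is that a requested column name must exist in the current spec. The sketch below illustrates only that reading; the class `ValidColumnsSketch`, the list of spec column names, and the choice to keep the order of the request are assumptions, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not part of KNIME.
final class ValidColumnsSketch {
    // Returns the requested names that also occur in the spec.
    static String[] validColumns(List<String> requested, List<String> specColumns) {
        List<String> valid = new ArrayList<>();
        for (String name : requested) {
            if (specColumns.contains(name)) {
                valid.add(name);
            }
        }
        return valid.toArray(new String[0]);
    }
}
```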

getColumnNames

public String[] getColumnNames()
Returns:
array of column names

getMean

public double getMean(int colIdx)
Returns the mean for the desired column. Throws an exception if the specified column is not compatible with DoubleValue. Returns Double.NaN if the specified column contains only missing cells or if the table is empty.

Parameters:
colIdx - the column index for which the mean is calculated
Returns:
mean value or Double.NaN
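
The Double.NaN contract matters to callers, because NaN propagates silently through later arithmetic. A minimal plain-Java sketch of a mean with that contract follows; the class `MeanSketch` and the encoding of a missing cell as `Double.NaN` are assumptions made for illustration.

```java
// Hypothetical stand-in for illustration only; not part of KNIME.
final class MeanSketch {
    // Mean of the non-missing cells; NaN if there are none.
    static double mean(double[] cells) {
        double sum = 0.0;
        int count = 0;
        for (double v : cells) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

A caller should test the result with `Double.isNaN(result)`, since `result == Double.NaN` is always false.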

getMean

public double[] getMean()
Returns the means for all columns. The entry for a column is Double.NaN if the column type is not DoubleValue.

Returns:
an array of mean values with an item for each column, which is Double.NaN if the column type is not DoubleValue

getSum

public double getSum(int colIdx)
Returns the sum for the desired column. Throws an exception if the specified column is not compatible with DoubleValue. Returns Double.NaN if the specified column contains only missing cells or if the table is empty.

Parameters:
colIdx - the column index for which the sum is calculated
Returns:
sum value or Double.NaN

getSum

public double[] getSum()
Returns the sum values for all columns. The entry for a column is Double.NaN if the column type is not DoubleValue.

Returns:
an array of sum values with an item for each column, which is Double.NaN if the column type is not DoubleValue

getNumberMissingValues

public double[] getNumberMissingValues()
Returns an array of the number of missing values for each column.

Returns:
number of missing values for each column

getNumberMissingValues

public double getNumberMissingValues(int colIdx)
Returns the number of missing values for the given column index.

Parameters:
colIdx - column index to consider
Returns:
number of missing values in this column

getVariance

public double getVariance(int colIdx)
Returns the variance for the desired column. Throws an exception if the specified column is not compatible with DoubleValue. Returns Double.NaN if the specified column contains only missing cells or if the table is empty.

Parameters:
colIdx - the column index for which the variance is calculated
Returns:
variance or Double.NaN

getVariance

public double[] getVariance()
Returns the variance for all columns. The entry for a column is Double.NaN if the column type is not DoubleValue, if the entire column contains missing cells, or if the table is empty.

Returns:
variance values

getStandardDeviation

public double getStandardDeviation(int colIdx)
Calculates the standard deviation for the desired column. Throws an exception if the column type is not compatible with DoubleValue. Returns zero if the column contains only missing cells or if the table is empty.

Parameters:
colIdx - the index of the column for which the standard deviation is to be calculated
Returns:
standard deviation, or zero if the column contains only missing values or the table is empty
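
Note that the documented edge cases differ: the variance of a column without values is Double.NaN, while its standard deviation is zero. The plain-Java sketch below reproduces that pair of contracts. The class `SpreadSketch`, the encoding of a missing cell as `Double.NaN`, and the sample (n - 1) variance are assumptions; this page does not say which variance estimator the class uses.

```java
// Hypothetical stand-in for illustration only; not part of KNIME.
final class SpreadSketch {
    // Sample variance of the non-missing cells; NaN if there are none.
    static double variance(double[] cells) {
        double sum = 0.0;
        int count = 0;
        for (double v : cells) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        if (count == 0) {
            return Double.NaN;
        }
        double mean = sum / count;
        double squares = 0.0;
        for (double v : cells) {
            if (!Double.isNaN(v)) {
                squares += (v - mean) * (v - mean);
            }
        }
        // Assumed convention: a single value has variance 0.0.
        return count == 1 ? 0.0 : squares / (count - 1);
    }

    // Square root of the variance, with the no-values case mapped to zero.
    static double standardDeviation(double[] cells) {
        double var = variance(cells);
        return Double.isNaN(var) ? 0.0 : Math.sqrt(var);
    }
}
```

With contracts like these, a standard deviation of zero does not by itself tell a caller whether the column is constant or has no values; the number of missing values or the variance has to be consulted as well.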

getStandardDeviation

public double[] getStandardDeviation()
Returns the standard deviation for all columns. The returned array contains no valid value (i.e. Double.NaN) for columns that are not compatible with DoubleValue.

Returns:
standard deviation values

getMin

public double[] getMin()
Returns the minimum for all columns. The entry is Double.NaN for columns that contain only missing cells or if the data table is empty.

Returns:
the minimum values

getMax

public double[] getMax()
Returns the maximum for all columns. The entry is Double.NaN for columns that contain only missing cells or if the data table is empty.

Returns:
the maximum values
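
The Double.NaN convention for minimum and maximum can be sketched in plain Java as follows. The class `MinMaxSketch` and the encoding of a missing cell as `Double.NaN` are assumptions made for illustration; the sketch handles one column, whereas the methods above return one entry per column.

```java
// Hypothetical stand-in for illustration only; not part of KNIME.
final class MinMaxSketch {
    // Returns {min, max} of the non-missing cells; both NaN if there are none.
    static double[] minMax(double[] cells) {
        double min = Double.NaN;
        double max = Double.NaN;
        for (double v : cells) {
            if (Double.isNaN(v)) {
                continue; // skip missing cell
            }
            if (Double.isNaN(min) || v < min) {
                min = v;
            }
            if (Double.isNaN(max) || v > max) {
                max = v;
            }
        }
        return new double[] {min, max};
    }
}
```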

getMedian

public double getMedian(int colIdx)
Returns the median for the desired column.

Parameters:
colIdx - the column index for which the median is calculated
Returns:
median value

getMedian

public double[] getMedian()
Returns the median for all columns.

Returns:
an array of median values with an item for each column
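
This page does not state how the median treats missing cells, an even number of values, or a column without values. The plain-Java sketch below picks one common set of conventions purely for illustration: missing cells (encoded as `Double.NaN`) are skipped, an even count yields the average of the two middle values, and an empty column yields `Double.NaN`. The class `MedianSketch` and all three conventions are assumptions.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not part of KNIME.
final class MedianSketch {
    static double median(double[] cells) {
        // Drop missing cells, then sort the remaining values.
        double[] present = Arrays.stream(cells)
            .filter(v -> !Double.isNaN(v))
            .sorted()
            .toArray();
        int n = present.length;
        if (n == 0) {
            return Double.NaN;
        }
        return n % 2 == 1
            ? present[n / 2]
            : (present[n / 2 - 1] + present[n / 2]) / 2.0;
    }
}
```

Unlike the mean, a median needs the values in sorted order, which is a likely reason why the constructor lets callers switch it off through `computeMedian`.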

getNominalValues

public Map<DataCell,Integer> getNominalValues(int colIdx)
Returns a map containing DataCell value to number of occurrences.

Parameters:
colIdx - column index to return map for
Returns:
map of DataCell values to occurrences

getNominalValues

public Map<DataCell,Integer>[] getNominalValues()
Returns an array (for each column) of mappings containing DataCell value to number of occurrences.

Returns:
array of mappings of occurrences
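
The value-to-occurrences mapping can be sketched in plain Java with strings in place of `DataCell`. The class `NominalCountSketch`, the use of `String` keys, and the insertion-ordered map are assumptions; the iteration order of the maps returned by the methods above is not documented here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of KNIME.
final class NominalCountSketch {
    // Maps each distinct value to the number of times it occurs.
    static Map<String, Integer> countValues(List<String> cells) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String cell : cells) {
            counts.merge(cell, 1, Integer::sum);
        }
        return counts;
    }
}
```

For the input `a, b, a` the result maps `a` to 2 and `b` to 1.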

getWarning

public String getWarning()
Returns warning message if number of possible values exceeds predefined maximum.

Returns:
null or a warning issued during construction time

load

public static Statistics2Table load(NodeSettingsRO sett)
                             throws InvalidSettingsException
Load a new statistic table by the given settings object.

Parameters:
sett - the settings to load this table from
Returns:
a new statistic table
Throws:
InvalidSettingsException - if the settings are corrupt

save

public void save(NodeSettingsWO sett)
Saves this object to the given settings object.

Parameters:
sett - the settings this object is saved to


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.