org.knime.base.node.mine.decisiontree2.learner
Class InMemoryTable

java.lang.Object
  extended by org.knime.base.node.mine.decisiontree2.learner.InMemoryTable
All Implemented Interfaces:
Iterable<DataRowWeighted>

public class InMemoryTable
extends Object
implements Iterable<DataRowWeighted>

Implements a table that holds DataRowWeighted objects in memory. Additionally, this class maintains distribution information about the class values and the possible values of nominal attributes.

Author:
Christoph Sieb, University of Konstanz

Constructor Summary
InMemoryTable(InMemoryTable tableTemplate)
          Creates an empty table from a given template table.
InMemoryTable(ValueMapper<DataCell>[] nominalAttributeValueMapper, ValueMapper<DataCell> classValueMapper, ValueMapper<String> attributeNameMapper, double minNumberRowsPerNode)
          Creates an empty table that keeps all rows in memory.
 
Method Summary
 void addRow(DataRowWeighted row)
          Adds a DataRowWeighted.
 boolean considerAttribute(int attributeIndex)
          Returns true if the given attribute should be considered during learning, false if not.
 void freeUnderlyingDataRows()
          Frees the underlying data rows.
 String getAttributeName(int index)
          Returns the name of the attribute specified by the given index.
 LinkedHashMap<DataCell,Double> getClassFrequencies()
          Returns the class frequencies as a LinkedHashMap mapping class values (DataCell) to the frequency as doubles.
 double[] getClassFrequencyArray()
          Returns the class frequency array representing the class distribution of this table.
 ValueMapper<DataCell> getClassValueMapper()
          Returns the class value mapper of this table.
 double[] getCopyOfClassFrequencyArray()
          Returns a copy of the class frequency array representing the class distribution of this table.
 int getMajorityClass()
          Returns the mapping value of the majority class.
 DataCell getMajorityClassAsCell()
          Returns the majority class value as DataCell.
 double getMajorityClassCount()
          Returns the frequency of the majority class.
 ValueMapper<DataCell> getNominalAttributeValueMapper(int attributeIndex)
          Returns the attribute value mapper of this table for the given attribute.
 NominalValueHistogram getNominalValueHistogram(int attributeIndex)
          Returns the value histogram for the given attribute index.
 DataCell[] getNominalValuesInMappingOrder(int attributeIndex)
          Returns the nominal values for the given attribute index.
 int getNumAttributes()
          Returns the number of attributes (excluding the class attribute).
 int getNumberDataRows()
          Returns the size of this table.
 int getNumNominalValues(int attributeIndex)
          Returns the number of nominal values for the given attribute.
 double getSumOfWeights()
          Returns the sum of the weights of all rows.
 boolean isNominal(int index)
          Whether the attribute at the given index position is nominal or not.
 boolean isPureEnough()
          Determines if the data distribution (class value distribution) is pure enough.
 Iterator<DataRowWeighted> iterator()
          
 void pack()
          Sets the size of the underlying array to the number of elements in the list.
 void setConsiderAttribute(int attributeIndex, boolean consider)
          Sets whether an attribute should be considered during learning.
 double[] sortDataRows(int attributeIndex)
          Sorts the data rows of this table in ascending order on the given attribute index.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InMemoryTable

public InMemoryTable(ValueMapper<DataCell>[] nominalAttributeValueMapper,
                     ValueMapper<DataCell> classValueMapper,
                     ValueMapper<String> attributeNameMapper,
                     double minNumberRowsPerNode)
Creates an empty table that keeps all rows in memory. The ValueMapper array must contain mappers only at the array positions of nominal attributes. Numeric attributes do not need a mapper; their array positions must contain null.

Parameters:
nominalAttributeValueMapper - the value mappers for the nominal attributes; the array must contain mappers only at the positions of nominal attributes
classValueMapper - the value mapper for the class attribute values, which are also stored in an integer-mapped manner
attributeNameMapper - the value mapper for the attribute names
minNumberRowsPerNode - the minimum number of rows per node; used to determine whether this table's distribution of class values is pure enough
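
The array layout described above can be sketched as follows. This is a minimal illustration, not KNIME code: `Map<String, Integer>` is a stand-in for `ValueMapper<DataCell>`, and the class and method names (`MapperArrayLayoutSketch`, `buildMapperArray`, `isNominal`) are hypothetical. It also assumes that an attribute can be recognized as nominal by checking its slot for a non-null mapper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapperArrayLayoutSketch {

    // Builds one slot per attribute: a fresh mapper for nominal attributes,
    // null for numeric ones. Map<String, Integer> is only a stand-in for a
    // real value mapper type.
    @SuppressWarnings("unchecked")
    static Map<String, Integer>[] buildMapperArray(boolean[] nominal) {
        Map<String, Integer>[] mappers = new Map[nominal.length];
        for (int i = 0; i < nominal.length; i++) {
            mappers[i] = nominal[i] ? new LinkedHashMap<String, Integer>() : null;
        }
        return mappers;
    }

    // Assumption: an attribute is nominal exactly when its slot holds a mapper.
    static boolean isNominal(Map<String, Integer>[] mappers, int index) {
        return mappers[index] != null;
    }

    public static void main(String[] args) {
        // Hypothetical attributes: 0 = numeric, 1 = nominal, 2 = numeric.
        Map<String, Integer>[] mappers =
                buildMapperArray(new boolean[] {false, true, false});
        for (int i = 0; i < mappers.length; i++) {
            System.out.println("attribute " + i + " nominal: " + isNominal(mappers, i));
        }
    }
}
```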

InMemoryTable

public InMemoryTable(InMemoryTable tableTemplate)
Creates an empty table from a given template table. The new table receives the template's mappers, the flags indicating whether to consider certain attributes, the minimum number of rows per tree node, and the nominal attribute indices array.

Parameters:
tableTemplate - the table that is used as a template to create this new table
Method Detail

considerAttribute

public boolean considerAttribute(int attributeIndex)
Returns true if the given attribute should be considered during learning, false if not.

Parameters:
attributeIndex - the index of the attribute to get the considering information for
Returns:
true if the given attribute should be considered during learning, false if not

setConsiderAttribute

public void setConsiderAttribute(int attributeIndex,
                                 boolean consider)
Sets whether an attribute should be considered during learning. NOTE: this is just a hint for the algorithm (i.e. just a flag).

Parameters:
attributeIndex - the index of the attribute to set the considering information for
consider - true if the attribute should be considered during learning, false if it should not be considered

getAttributeName

public String getAttributeName(int index)
Returns the name of the attribute specified by the given index.

Parameters:
index - the index of the attribute to get the name for
Returns:
the name of the specified attribute

isNominal

public boolean isNominal(int index)
Whether the attribute at the given index position is nominal or not.

Parameters:
index - the attribute index position
Returns:
true if the attribute at the given index position is nominal, false otherwise (i.e. the attribute is numeric)

freeUnderlyingDataRows

public void freeUnderlyingDataRows()
Frees the underlying data rows. Can be used to reduce the memory requirements in case the data itself is not needed any more.


iterator

public Iterator<DataRowWeighted> iterator()

Specified by:
iterator in interface Iterable<DataRowWeighted>

addRow

public void addRow(DataRowWeighted row)
Adds a DataRowWeighted.

Parameters:
row - the row to add

getMajorityClassCount

public double getMajorityClassCount()
Returns the frequency of the majority class.

Returns:
the frequency of the majority class

getMajorityClass

public int getMajorityClass()
Returns the mapping value of the majority class.

Returns:
the mapping value of the majority class

getMajorityClassAsCell

public DataCell getMajorityClassAsCell()
Returns the majority class value as DataCell.

Returns:
the majority class value as DataCell
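
The relationship between the three majority-class getters can be illustrated with a small standalone sketch. It assumes that the majority class is the index with the largest entry in the class frequency array and that ties go to the lowest index; neither detail is stated in this documentation. The class `MajorityClassSketch` and the `String` class values are hypothetical stand-ins.

```java
class MajorityClassSketch {

    // Returns the index (mapping value) of the largest frequency.
    // Assumption: ties are resolved in favor of the lowest index.
    static int majorityClass(double[] classFrequencies) {
        int best = 0;
        for (int i = 1; i < classFrequencies.length; i++) {
            if (classFrequencies[i] > classFrequencies[best]) {
                best = i;
            }
        }
        return best;
    }

    // Returns the frequency of the majority class; expects a non-empty array.
    static double majorityClassCount(double[] classFrequencies) {
        return classFrequencies[majorityClass(classFrequencies)];
    }

    public static void main(String[] args) {
        // Hypothetical class values listed in mapping order, with weighted counts.
        String[] classValues = {"no", "yes", "maybe"};
        double[] frequencies = {2.5, 4.0, 1.0};
        int majority = majorityClass(frequencies);
        System.out.println("mapping value: " + majority);
        System.out.println("class value:   " + classValues[majority]);
        System.out.println("frequency:     " + majorityClassCount(frequencies));
    }
}
```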

isPureEnough

public boolean isPureEnough()
Determines whether the class value distribution of this table is pure enough. The table is pure enough if it contains only rows of one class value, or if the number of rows (sum of weights) is below twice the threshold specified in the constructor.

Returns:
true, if the table is pure enough, false otherwise
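
The purity rule described above can be expressed as a short standalone check. This is a sketch under assumptions, not the class's actual implementation: it takes the class frequency array and the constructor threshold as plain arguments, and the name `PuritySketch` is hypothetical.

```java
class PuritySketch {

    // Returns true if at most one class has a non-zero frequency, or if the
    // sum of weights is below twice the minimum number of rows per node.
    static boolean isPureEnough(double[] classFrequencies, double minNumberRowsPerNode) {
        double sumOfWeights = 0.0;
        int nonEmptyClasses = 0;
        for (double frequency : classFrequencies) {
            sumOfWeights += frequency;
            if (frequency > 0.0) {
                nonEmptyClasses++;
            }
        }
        if (nonEmptyClasses <= 1) {
            return true; // only rows of a single class value (or no rows at all)
        }
        return sumOfWeights < 2.0 * minNumberRowsPerNode;
    }

    public static void main(String[] args) {
        System.out.println(isPureEnough(new double[] {10.0, 0.0}, 2.0));
        System.out.println(isPureEnough(new double[] {3.0, 4.0}, 2.0));
        System.out.println(isPureEnough(new double[] {1.0, 2.0}, 2.0));
    }
}
```

A plausible reading of the factor of two is that a split has to leave at least the minimum number of rows in each of two child nodes, although this documentation does not state the reason.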

getClassFrequencyArray

public double[] getClassFrequencyArray()
Returns the class frequency array representing the class distribution of this table.

Returns:
the class frequency array representing the class distribution of this table

getCopyOfClassFrequencyArray

public double[] getCopyOfClassFrequencyArray()
Returns a copy of the class frequency array representing the class distribution of this table. Use this method if the returned array is intended to be modified.

Returns:
a copy of the class frequency array representing the class distribution of this table
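
The difference between the two array getters can be illustrated with a stand-in class. The sketch assumes that `getClassFrequencyArray()` hands out the internal array itself, which the note above suggests but does not state; `FrequencyArrayCopySketch` is a hypothetical class, not part of KNIME.

```java
class FrequencyArrayCopySketch {

    // Internal class distribution of this hypothetical table.
    private final double[] classFrequencies = {3.0, 5.0};

    // Assumption: returns the internal array, so caller changes are visible here.
    double[] getClassFrequencyArray() {
        return classFrequencies;
    }

    // Returns an independent copy that is safe to modify.
    double[] getCopyOfClassFrequencyArray() {
        return classFrequencies.clone();
    }

    public static void main(String[] args) {
        FrequencyArrayCopySketch table = new FrequencyArrayCopySketch();

        double[] copy = table.getCopyOfClassFrequencyArray();
        copy[0] = 0.0; // does not affect the table
        System.out.println("after modifying the copy:     " + table.getClassFrequencyArray()[0]);

        double[] direct = table.getClassFrequencyArray();
        direct[0] = 0.0; // changes the table's own distribution
        System.out.println("after modifying the original: " + table.getClassFrequencyArray()[0]);
    }
}
```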

getClassFrequencies

public LinkedHashMap<DataCell,Double> getClassFrequencies()
Returns the class frequencies as a LinkedHashMap mapping class values (DataCell) to the frequency as doubles.

Returns:
the class frequencies as a LinkedHashMap mapping class values (DataCell) to the frequency as doubles
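
As a rough illustration of how such a map relates to the class frequency array, the sketch below pairs class values with their frequencies. It assumes that the map entries follow the integer mapping order of the class values, which is not confirmed here; `String` stands in for `DataCell`, and `ClassFrequencyMapSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;

class ClassFrequencyMapSketch {

    // Pairs each class value with the frequency stored at its mapping index.
    // A LinkedHashMap keeps the entries in insertion order.
    static LinkedHashMap<String, Double> toFrequencyMap(
            String[] classValuesInMappingOrder, double[] classFrequencies) {
        LinkedHashMap<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < classValuesInMappingOrder.length; i++) {
            result.put(classValuesInMappingOrder[i], classFrequencies[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical class values and weighted counts.
        LinkedHashMap<String, Double> frequencies =
                toFrequencyMap(new String[] {"no", "yes"}, new double[] {2.5, 4.0});
        System.out.println(frequencies);
    }
}
```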

getNumberDataRows

public int getNumberDataRows()
Returns the size of this table.

Returns:
the size of this table

getClassValueMapper

public ValueMapper<DataCell> getClassValueMapper()
Returns the class value mapper of this table.

Returns:
the class value mapper of this table

getNominalAttributeValueMapper

public ValueMapper<DataCell> getNominalAttributeValueMapper(int attributeIndex)
Returns the attribute value mapper of this table for the given attribute.

Parameters:
attributeIndex - the index for which to return the value mapper
Returns:
the attribute value mapper of this table for the given nominal attribute, null if the attribute is not nominal (i.e. numeric)

getNumAttributes

public int getNumAttributes()
Returns the number of attributes (excluding the class attribute).

Returns:
the number of attributes (excluding the class attribute)

pack

public void pack()
Sets the size of the underlying array to the number of elements in the list.


getSumOfWeights

public double getSumOfWeights()
Returns the sum of the weights of all rows.

Returns:
the sum of the weights of all rows

getNumNominalValues

public int getNumNominalValues(int attributeIndex)
Returns the number of nominal values for the given attribute.

Parameters:
attributeIndex - the nominal attribute index for which to get the number of nominal values
Returns:
the number of nominal values for the given attribute; -1 if the attribute is not nominal

getNominalValueHistogram

public NominalValueHistogram getNominalValueHistogram(int attributeIndex)
Returns the value histogram for the given attribute index. If the attribute is numeric, null is returned.

Parameters:
attributeIndex - the attribute index for which to return the histogram
Returns:
the value histogram for the given attribute index; if the attribute is numeric, null is returned

getNominalValuesInMappingOrder

public DataCell[] getNominalValuesInMappingOrder(int attributeIndex)
Returns the nominal values for the given attribute index. The value array is ordered according to the integer mapping, i.e. the DataCell mapped with integer 0 is placed first, and so on.

Parameters:
attributeIndex - the attribute index for which to return the nominal values
Returns:
the nominal values for the given attribute index, ordered according to the integer mapping; null if the attribute is not nominal
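
The idea of "mapping order" can be shown with a toy mapper. This sketch assumes that values receive consecutive integers starting at 0 in the order they are first seen; it does not reproduce KNIME's `ValueMapper`, and `MappingOrderSketch` with its methods is hypothetical. `String` stands in for `DataCell`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MappingOrderSketch {

    private final Map<String, Integer> toInt = new HashMap<>();
    private final List<String> toValue = new ArrayList<>();

    // Returns the integer mapped to the value, assigning the next free
    // integer if the value has not been seen before.
    int getIndex(String value) {
        Integer index = toInt.get(value);
        if (index == null) {
            index = toValue.size();
            toInt.put(value, index);
            toValue.add(value);
        }
        return index;
    }

    // Returns the values ordered by their mapped integer: position 0 holds
    // the value mapped to 0, position 1 the value mapped to 1, and so on.
    String[] getValuesInMappingOrder() {
        return toValue.toArray(new String[0]);
    }

    public static void main(String[] args) {
        MappingOrderSketch mapper = new MappingOrderSketch();
        mapper.getIndex("red");
        mapper.getIndex("green");
        mapper.getIndex("red"); // already mapped, keeps its integer
        System.out.println(String.join(", ", mapper.getValuesInMappingOrder()));
    }
}
```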

sortDataRows

public double[] sortDataRows(int attributeIndex)
Sorts the data rows of this table in ascending order on the given attribute index. The missing values are put at the end of the table.

Parameters:
attributeIndex - the index of the attribute on which to sort the data rows
Returns:
the sum of weights of the missing value rows for each class value; corresponds to the class frequency array but only for the missing values
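
The sorting contract above (ascending order, missing values last, per-class missing weights returned) can be sketched as follows. This is an illustration under assumptions: the `Row` type is a hypothetical stand-in for `DataRowWeighted`, a missing value is represented as `Double.NaN`, and only a single numeric attribute is modeled.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortWithMissingSketch {

    // Hypothetical weighted row: one attribute value (NaN marks a missing
    // value), the integer-mapped class value, and the row weight.
    static final class Row {
        final double value;
        final int classIndex;
        final double weight;

        Row(double value, int classIndex, double weight) {
            this.value = value;
            this.classIndex = classIndex;
            this.weight = weight;
        }
    }

    // Sorts the rows in place in ascending order of their value and returns,
    // per class, the summed weight of the rows whose value is missing.
    static double[] sortRows(Row[] rows, int numClasses) {
        // comparingDouble uses Double.compare, which orders NaN after all
        // other values, so missing rows end up at the end of the array.
        Arrays.sort(rows, Comparator.comparingDouble((Row r) -> r.value));

        double[] missingWeights = new double[numClasses];
        for (Row row : rows) {
            if (Double.isNaN(row.value)) {
                missingWeights[row.classIndex] += row.weight;
            }
        }
        return missingWeights;
    }

    public static void main(String[] args) {
        Row[] rows = {
            new Row(3.0, 0, 1.0),
            new Row(Double.NaN, 1, 0.5),
            new Row(1.0, 1, 2.0),
            new Row(Double.NaN, 0, 1.5)
        };
        double[] missing = sortRows(rows, 2);
        for (Row row : rows) {
            System.out.println(row.value + " (class " + row.classIndex + ")");
        }
        System.out.println("missing weight per class: " + Arrays.toString(missing));
    }
}
```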


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.