org.knime.base.node.mine.decisiontree2.learner
Class SplitQualityMeasure

java.lang.Object
  extended by org.knime.base.node.mine.decisiontree2.learner.SplitQualityMeasure
All Implemented Interfaces:
Cloneable
Direct Known Subclasses:
SplitQualityGainRatio, SplitQualityGini

public abstract class SplitQualityMeasure
extends Object
implements Cloneable

Abstract base class for split quality measures such as the Gini index or the gain ratio.

Author:
Christoph Sieb, University of Konstanz

Constructor Summary
SplitQualityMeasure()
           
 
Method Summary
 Object clone()
          
abstract  double getWorstValue()
          Returns the worst value for this quality measure.
abstract  void initQualityMeasure(double[] classFrequencies, double allOverRecords)
          Some quality measures, like the information gain, calculate the quality of a new distribution by comparing it to a previous one.
abstract  boolean isBetter(double quality1, double quality2)
          Determines whether the first quality is better than the second quality.
abstract  boolean isBetterOrEqual(double quality1, double quality2)
          Determines whether the first quality is better than or equal to the second quality.
abstract  double measureQuality(double allOverRecords, double[] partitionFrequency, double[][] partitionClassFrequency, double numUnknownRecords)
          Calculates the quality for a given split.
abstract  double postProcessMeasure(double qualityMeasure, double allOverRecords, double[] partitionFrequency, double numUnknownRecords)
          Some quality measures need normalization when compared to other attributes.
abstract  String toString()
          
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SplitQualityMeasure

public SplitQualityMeasure()
Method Detail

measureQuality

public abstract double measureQuality(double allOverRecords,
                                      double[] partitionFrequency,
                                      double[][] partitionClassFrequency,
                                      double numUnknownRecords)
Calculates the quality for a given split.

Parameters:
allOverRecords - the overall number of records with known values in the partition to split; corresponds to N in the formula
partitionFrequency - the frequencies of the different partitions; corresponds to n_x in the formula
partitionClassFrequency - the class frequencies P_j (second dimension) for each partition T_x (first dimension)
numUnknownRecords - the number of records with unknown (missing) value of the relevant attribute; used to weight the quality measure
Returns:
the quality for a given split
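To illustrate the parameter layout, the following is a minimal sketch of a weighted Gini impurity computed from these arrays. It assumes the common definition (the impurity of each partition T_x, weighted by n_x / N, with lower values meaning purer partitions). GiniSketch is a hypothetical name; this is not the source of SplitQualityGini, and it ignores numUnknownRecords.

```java
/**
 * Hypothetical sketch (not the KNIME implementation): weighted Gini impurity
 * of a candidate split, using the measureQuality parameter layout.
 */
final class GiniSketch {

    static double measureQuality(double allOverRecords,
                                 double[] partitionFrequency,
                                 double[][] partitionClassFrequency,
                                 double numUnknownRecords) {
        // numUnknownRecords is accepted for signature parity but ignored here.
        double weightedImpurity = 0.0;
        for (int x = 0; x < partitionFrequency.length; x++) {
            double nx = partitionFrequency[x];
            if (nx <= 0) {
                continue; // an empty partition contributes nothing
            }
            double sumOfSquares = 0.0;
            for (double classCount : partitionClassFrequency[x]) {
                double p = classCount / nx; // relative class frequency in T_x
                sumOfSquares += p * p;
            }
            // Gini impurity of T_x, weighted by its share n_x / N
            weightedImpurity += (nx / allOverRecords) * (1.0 - sumOfSquares);
        }
        return weightedImpurity;
    }

    public static void main(String[] args) {
        // 10 known records, two partitions (4 and 6 records), two classes
        double[] partitionFrequency = {4, 6};
        double[][] partitionClassFrequency = {{4, 0}, {1, 5}};
        System.out.println(measureQuality(10, partitionFrequency, partitionClassFrequency, 0));
    }
}
```

In the example, the first partition is pure and contributes 0; the second contributes 0.6 × (1 − (1/6)² − (5/6)²) = 1/6.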

isBetterOrEqual

public abstract boolean isBetterOrEqual(double quality1,
                                        double quality2)
Determines whether the first quality is better than or equal to the second quality.

Parameters:
quality1 - first quality to compare
quality2 - second quality to compare
Returns:
true iff the first quality is better than or equal to the second quality

isBetter

public abstract boolean isBetter(double quality1,
                                 double quality2)
Determines whether the first quality is better than the second quality.

Parameters:
quality1 - first quality to compare
quality2 - second quality to compare
Returns:
true iff the first quality is better than the second quality
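Routing every comparison through isBetter and isBetterOrEqual lets the learner stay agnostic of whether a measure is maximized or minimized. Below is a minimal sketch of the two conventions, using hypothetical class names: an impurity-style measure would typically treat lower values as better, while a gain-style measure would treat higher values as better. Which convention each KNIME subclass uses is not stated on this page.

```java
/** Hypothetical stand-in for the comparison part of the contract. */
interface QualityOrder {
    boolean isBetter(double quality1, double quality2);
    boolean isBetterOrEqual(double quality1, double quality2);
}

/** Impurity-style measures: smaller values are better. */
final class LowerIsBetter implements QualityOrder {
    @Override public boolean isBetter(double q1, double q2) { return q1 < q2; }
    @Override public boolean isBetterOrEqual(double q1, double q2) { return q1 <= q2; }
}

/** Gain-style measures: larger values are better. */
final class HigherIsBetter implements QualityOrder {
    @Override public boolean isBetter(double q1, double q2) { return q1 > q2; }
    @Override public boolean isBetterOrEqual(double q1, double q2) { return q1 >= q2; }
}

final class QualityOrderDemo {
    public static void main(String[] args) {
        QualityOrder impurity = new LowerIsBetter();
        QualityOrder gain = new HigherIsBetter();
        System.out.println(impurity.isBetter(0.1, 0.3));    // true
        System.out.println(gain.isBetter(0.1, 0.3));        // false
        System.out.println(gain.isBetterOrEqual(0.2, 0.2)); // true
    }
}
```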

getWorstValue

public abstract double getWorstValue()
Returns the worst value for this quality measure.

Returns:
the worst value for this quality measure
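One natural use of getWorstValue is to seed a best-split search, so that any real candidate replaces the initial value. The sketch below assumes a lower-is-better measure whose worst value is Double.POSITIVE_INFINITY; BestSplitSearch and its methods are hypothetical and not part of the KNIME API.

```java
/** Hypothetical sketch: seeding a best-split search with the worst value. */
final class BestSplitSearch {

    // Assumed lower-is-better measure for this sketch
    static double getWorstValue() {
        return Double.POSITIVE_INFINITY;
    }

    static boolean isBetter(double quality1, double quality2) {
        return quality1 < quality2;
    }

    /** Returns the index of the best candidate, or -1 if there are no candidates. */
    static int findBest(double[] candidateQualities) {
        double best = getWorstValue();
        int bestIndex = -1;
        for (int i = 0; i < candidateQualities.length; i++) {
            if (isBetter(candidateQualities[i], best)) {
                best = candidateQualities[i];
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        System.out.println(findBest(new double[] {0.42, 0.17, 0.30})); // prints 1
    }
}
```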

initQualityMeasure

public abstract void initQualityMeasure(double[] classFrequencies,
                                        double allOverRecords)
Some quality measures, like the information gain, calculate the quality of a new distribution by comparing it to a previous one. This previous distribution can be reused, so an init method is provided that enables precalculations to improve performance.

Parameters:
classFrequencies - the class frequencies
allOverRecords - the overall count
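For the information gain mentioned above, the entropy of the previous distribution only has to be computed once in initQualityMeasure and can then be reused by each measureQuality call. Here is a minimal sketch under these assumptions: entropy is measured in bits, and the gain is scaled by the fraction of known records, N / (N + numUnknownRecords), in the style of C4.5. InfoGainSketch is a hypothetical name, not the KNIME class.

```java
/** Hypothetical sketch: information gain with the parent entropy cached per node. */
final class InfoGainSketch {

    private double parentEntropy; // set once per node by initQualityMeasure

    void initQualityMeasure(double[] classFrequencies, double allOverRecords) {
        parentEntropy = entropy(classFrequencies, allOverRecords);
    }

    double measureQuality(double allOverRecords, double[] partitionFrequency,
                          double[][] partitionClassFrequency, double numUnknownRecords) {
        double childEntropy = 0.0;
        for (int x = 0; x < partitionFrequency.length; x++) {
            double nx = partitionFrequency[x];
            if (nx > 0) {
                // entropy of partition T_x, weighted by n_x / N
                childEntropy += (nx / allOverRecords) * entropy(partitionClassFrequency[x], nx);
            }
        }
        double gain = parentEntropy - childEntropy;
        // Assumption: C4.5-style down-weighting by the share of known records
        return gain * allOverRecords / (allOverRecords + numUnknownRecords);
    }

    /** Shannon entropy in bits of the given class counts. */
    static double entropy(double[] counts, double total) {
        double h = 0.0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        InfoGainSketch measure = new InfoGainSketch();
        measure.initQualityMeasure(new double[] {5, 5}, 10); // computed once per node
        double[] nx = {5, 5};
        double[][] pjx = {{5, 0}, {0, 5}};
        System.out.println(measure.measureQuality(10, nx, pjx, 0));
        System.out.println(measure.measureQuality(10, nx, pjx, 10));
    }
}
```

Both calls reuse the cached parent entropy of 1 bit. The perfect split yields a gain of 1 bit, which is halved when there are as many missing records as known ones.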

toString

public abstract String toString()

Overrides:
toString in class Object

postProcessMeasure

public abstract double postProcessMeasure(double qualityMeasure,
                                          double allOverRecords,
                                          double[] partitionFrequency,
                                          double numUnknownRecords)
Some quality measures need normalization when the qualities of different attributes are compared. Because this normalization is not required when qualities are compared within a single attribute, this method performs the post-processing (normalization) separately, avoiding many unnecessary calculations.

Parameters:
qualityMeasure - the quality measure to post-process
allOverRecords - the overall number of known (non-missing) records
partitionFrequency - the frequencies of the potential split partitions
numUnknownRecords - the number of unknown (missing) records
Returns:
the post-processed quality measure
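The gain ratio is a typical case. The raw gain is sufficient for ranking splits of a single attribute, but comparing across attributes calls for dividing it by the split information of the partition sizes. Here is a minimal sketch, assuming base-2 logarithms and, as in C4.5, that records with missing values form one extra partition in the split information. GainRatioSketch is a hypothetical name, not the source of SplitQualityGainRatio.

```java
/** Hypothetical sketch: gain ratio as a post-processing step on a raw gain. */
final class GainRatioSketch {

    static double postProcessMeasure(double qualityMeasure, double allOverRecords,
                                     double[] partitionFrequency, double numUnknownRecords) {
        double total = allOverRecords + numUnknownRecords;
        double splitInfo = 0.0;
        for (double nx : partitionFrequency) {
            splitInfo -= plogp(nx, total);
        }
        // Assumption: missing-value records count as one extra partition
        splitInfo -= plogp(numUnknownRecords, total);
        // A single non-empty partition has zero split information; return 0 instead of dividing by 0
        return splitInfo > 0 ? qualityMeasure / splitInfo : 0.0;
    }

    /** Returns p * log2(p) for p = n / total, or 0 when n is 0. */
    private static double plogp(double n, double total) {
        if (n <= 0) {
            return 0.0;
        }
        double p = n / total;
        return p * Math.log(p) / Math.log(2);
    }

    public static void main(String[] args) {
        // Raw gain 0.75 from 8 known records split 4/4, plus 8 missing records
        System.out.println(postProcessMeasure(0.75, 8, new double[] {4, 4}, 8));
    }
}
```

In the example, the split information is 0.5 + 0.5 + 0.5 = 1.5 bits, which gives a ratio of 0.5.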

clone

public Object clone()
             throws CloneNotSupportedException

Overrides:
clone in class Object
Throws:
CloneNotSupportedException
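Because initQualityMeasure allows a measure to keep precalculated state, a subclass with mutable fields would presumably override clone so that copies do not share that state. Below is a minimal sketch with a hypothetical cached array. Since Object.clone() copies fields shallowly, the array has to be copied explicitly. CachingMeasure and its field are illustrative assumptions, not KNIME code.

```java
/** Hypothetical stateful measure illustrating a clone override. */
class CachingMeasure implements Cloneable {

    private double[] cachedClassFrequencies = new double[0]; // per-node state (assumption)

    void initQualityMeasure(double[] classFrequencies, double allOverRecords) {
        cachedClassFrequencies = classFrequencies.clone();
    }

    double[] cached() {
        return cachedClassFrequencies;
    }

    @Override
    public CachingMeasure clone() throws CloneNotSupportedException {
        CachingMeasure copy = (CachingMeasure) super.clone();
        // Object.clone() is shallow, so give the copy its own array
        copy.cachedClassFrequencies = cachedClassFrequencies.clone();
        return copy;
    }

    public static void main(String[] args) throws CloneNotSupportedException {
        CachingMeasure original = new CachingMeasure();
        original.initQualityMeasure(new double[] {3, 7}, 10);
        CachingMeasure copy = original.clone();
        copy.cached()[1] = 0.0; // modifies only the copy's array
        System.out.println(original.cached()[1]); // 7.0
    }
}
```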


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.