org.knime.base.node.mine.decisiontree2.learner
Class SplitQualityGainRatio

java.lang.Object
  extended by org.knime.base.node.mine.decisiontree2.learner.SplitQualityMeasure
      extended by org.knime.base.node.mine.decisiontree2.learner.SplitQualityGainRatio
All Implemented Interfaces:
Cloneable

public class SplitQualityGainRatio
extends SplitQualityMeasure

Implements the gain ratio split quality measure.

Author:
Christoph Sieb, University of Konstanz

Constructor Summary
SplitQualityGainRatio()
           
 
Method Summary
 double getWorstValue()
          Returns the worst value for this quality measure.
 void initQualityMeasure(double[] classFrequencies, double allOverRecords)
          Calculates the entropy of the distribution before a split.
 boolean isBetter(double quality1, double quality2)
          A gain ratio index is better if it is larger than the other one.
 boolean isBetterOrEqual(double quality1, double quality2)
          A gain ratio index is better or equal if it is larger than or equal to the other one.
 double measureQuality(double allOverRecords, double[] partitionFrequency, double[][] partitionClassFrequency, double numUnknownRecords)
          Calculates the gain ratio split index.
 double postProcessMeasure(double qualityMeasure, double allOverRecords, double[] partitionFrequency, double numUnknownRecords)
          The post-processing of the gain ratio measure normalizes the info gain by the split info (see C4.5).
 String toString()
          
 
Methods inherited from class org.knime.base.node.mine.decisiontree2.learner.SplitQualityMeasure
clone
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SplitQualityGainRatio

public SplitQualityGainRatio()
Method Detail

isBetter

public boolean isBetter(double quality1,
                        double quality2)
A gain ratio index is better if it is larger than the other one. Determines whether the first quality passed is better than the second quality.

Specified by:
isBetter in class SplitQualityMeasure
Parameters:
quality1 - first quality to compare
quality2 - second quality to compare
Returns:
true, iff the first quality is better than the second quality

isBetterOrEqual

public boolean isBetterOrEqual(double quality1,
                               double quality2)
A gain ratio index is better or equal if it is larger than or equal to the other one. Determines whether the first quality passed is better than or equal to the second quality.

Specified by:
isBetterOrEqual in class SplitQualityMeasure
Parameters:
quality1 - first quality to compare
quality2 - second quality to compare
Returns:
true, iff the first quality is better or equal to the second quality

measureQuality

public double measureQuality(double allOverRecords,
                             double[] partitionFrequency,
                             double[][] partitionClassFrequency,
                             double numUnknownRecords)
Calculates the gain ratio split index.

For a dataset T the gain ratio index is:

gainRatio(T) = gain(T) / splitInfo(T)

Specified by:
measureQuality in class SplitQualityMeasure
Parameters:
allOverRecords - the overall number of records with known values in the partition to split; corresponds to N in the formula
partitionFrequency - the frequencies of the different partitions; corresponds to nx in the formula
partitionClassFrequency - all class frequencies Pj (second dimension) for all partitions Tx (first dimension)
numUnknownRecords - the number of records with an unknown (missing) value of the relevant attribute; used to weight the quality measure
Returns:
the gain ratio split index
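
The formula above can be illustrated with a small, self-contained sketch. It follows the textbook definition (gain = entropy before the split minus the weighted entropy after it, split info = entropy of the partition sizes) and is not taken from the KNIME sources. The class GainRatioSketch and its methods are hypothetical names; only the parameter names mirror the ones documented here. The weighting by numUnknownRecords is left out because this page does not specify how it is applied.

```java
public final class GainRatioSketch {

    private GainRatioSketch() { }

    /** Shannon entropy (base 2) of a frequency vector whose entries sum to total. */
    static double entropy(double[] frequencies, double total) {
        double h = 0.0;
        for (double f : frequencies) {
            if (f > 0.0) {
                double p = f / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /**
     * Textbook gain ratio: gain(T) / splitInfo(T).
     * Returns 0 when the split info is 0 (only one non-empty partition);
     * that fallback is an assumption of this sketch.
     */
    static double gainRatio(double allOverRecords,
                            double[] partitionFrequency,
                            double[][] partitionClassFrequency) {
        // Recover the class distribution before the split by summing over partitions.
        int numClasses = partitionClassFrequency[0].length;
        double[] classTotals = new double[numClasses];
        for (double[] partition : partitionClassFrequency) {
            for (int j = 0; j < numClasses; j++) {
                classTotals[j] += partition[j];
            }
        }
        double entropyBefore = entropy(classTotals, allOverRecords);

        // Weighted entropy of the partitions after the split.
        double entropyAfter = 0.0;
        for (int x = 0; x < partitionFrequency.length; x++) {
            if (partitionFrequency[x] > 0.0) {
                entropyAfter += partitionFrequency[x] / allOverRecords
                        * entropy(partitionClassFrequency[x], partitionFrequency[x]);
            }
        }

        double gain = entropyBefore - entropyAfter;
        double splitInfo = entropy(partitionFrequency, allOverRecords);
        return splitInfo > 0.0 ? gain / splitInfo : 0.0;
    }
}
```

For example, splitting ten records with two equally frequent classes into two pure partitions of five records each gives a gain of 1 bit and a split info of 1 bit in this sketch, so the ratio is 1.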

getWorstValue

public double getWorstValue()
Returns the worst value for this quality measure.

Specified by:
getWorstValue in class SplitQualityMeasure
Returns:
the worst value for this quality measure
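
How a worst value and the comparison methods might work together during a split search can be sketched as follows. This is a minimal illustration, not KNIME's code: the class BestSplitSketch, the constant WORST_VALUE (assumed here to be negative infinity) and the helper indexOfBest are hypothetical. The value actually returned by getWorstValue() is not stated on this page.

```java
public final class BestSplitSketch {

    private BestSplitSketch() { }

    /** Assumed worst value: any real gain ratio compares as better than this. */
    static final double WORST_VALUE = Double.NEGATIVE_INFINITY;

    /** Larger is better, as documented for isBetter. */
    static boolean isBetter(double quality1, double quality2) {
        return quality1 > quality2;
    }

    /** Larger or equal is better or equal, as documented for isBetterOrEqual. */
    static boolean isBetterOrEqual(double quality1, double quality2) {
        return quality1 >= quality2;
    }

    /**
     * Returns the index of the best quality, or -1 for an empty array.
     * Starting from the worst value means the first candidate always wins
     * the first comparison; ties keep the earlier candidate.
     */
    static int indexOfBest(double[] qualities) {
        int bestIndex = -1;
        double bestQuality = WORST_VALUE;
        for (int i = 0; i < qualities.length; i++) {
            if (isBetter(qualities[i], bestQuality)) {
                bestQuality = qualities[i];
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

Using the strict isBetter in the loop keeps the first of several equally good candidates; switching to isBetterOrEqual would keep the last one instead.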

initQualityMeasure

public void initQualityMeasure(double[] classFrequencies,
                               double allOverRecords)
Calculates the entropy of the class distribution before a split so that it can be reused across several calculations. Some quality measures, such as the information gain, compare the distribution before a split with the distribution after it. Because the prior distribution is the same for every candidate split, this init method allows it to be precomputed once to improve performance.

Specified by:
initQualityMeasure in class SplitQualityMeasure
Parameters:
classFrequencies - the class frequencies
allOverRecords - the overall count
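
The precomputation idea can be sketched with a small class that caches the prior entropy once and reuses it for every candidate split. All names here (EntropyCacheSketch, init, infoGain, getPriorEntropy) are hypothetical, and the sketch only assumes the standard base-2 entropy; it is not the KNIME implementation.

```java
public final class EntropyCacheSketch {

    /** Entropy of the class distribution before any split; set by init. */
    private double priorEntropy;

    /** Computes the prior entropy once so later calls can reuse it. */
    public void init(double[] classFrequencies, double allOverRecords) {
        priorEntropy = entropy(classFrequencies, allOverRecords);
    }

    public double getPriorEntropy() {
        return priorEntropy;
    }

    /** Information gain of one candidate split, reusing the cached prior entropy. */
    public double infoGain(double allOverRecords,
                           double[] partitionFrequency,
                           double[][] partitionClassFrequency) {
        double entropyAfter = 0.0;
        for (int x = 0; x < partitionFrequency.length; x++) {
            if (partitionFrequency[x] > 0.0) {
                entropyAfter += partitionFrequency[x] / allOverRecords
                        * entropy(partitionClassFrequency[x], partitionFrequency[x]);
            }
        }
        return priorEntropy - entropyAfter;
    }

    /** Shannon entropy (base 2) of a frequency vector whose entries sum to total. */
    static double entropy(double[] frequencies, double total) {
        double h = 0.0;
        for (double f : frequencies) {
            if (f > 0.0) {
                double p = f / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }
}
```

After a single call to init, infoGain can be evaluated for many candidate splits of the same records without recomputing the prior entropy.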

toString

public String toString()

Specified by:
toString in class SplitQualityMeasure

postProcessMeasure

public double postProcessMeasure(double qualityMeasure,
                                 double allOverRecords,
                                 double[] partitionFrequency,
                                 double numUnknownRecords)
The post-processing of the gain ratio measure normalizes the info gain by the split info (see C4.5). Some quality measures need normalization before they can be compared across attributes. Because this normalization is not required when qualities are compared within a single attribute, this method performs the post-processing (normalization) separately, which avoids many unnecessary calculations.

Specified by:
postProcessMeasure in class SplitQualityMeasure
Parameters:
qualityMeasure - the quality measure to post process
allOverRecords - the overall number of known (non-missing) records
partitionFrequency - the frequencies of the potential split partitions
numUnknownRecords - the number of unknown (missing) records
Returns:
the post processed quality measure
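
The normalization step can be sketched as a division of an already computed info gain by the split info. The class PostProcessSketch and its methods are hypothetical. Counting the unknown records as one additional group in the split info follows the convention described for C4.5 and is an assumption of this sketch; this page does not say how KNIME treats numUnknownRecords here.

```java
public final class PostProcessSketch {

    private PostProcessSketch() { }

    private static double log2(double value) {
        return Math.log(value) / Math.log(2.0);
    }

    /**
     * Split info: entropy of the partition sizes.
     * Assumption: records with unknown values form one extra group.
     */
    static double splitInfo(double allOverRecords,
                            double[] partitionFrequency,
                            double numUnknownRecords) {
        double total = allOverRecords + numUnknownRecords;
        double info = 0.0;
        for (double f : partitionFrequency) {
            if (f > 0.0) {
                double p = f / total;
                info -= p * log2(p);
            }
        }
        if (numUnknownRecords > 0.0) {
            double p = numUnknownRecords / total;
            info -= p * log2(p);
        }
        return info;
    }

    /** Normalizes an info gain by the split info; returns 0 if the split info is 0. */
    static double postProcess(double qualityMeasure,
                              double allOverRecords,
                              double[] partitionFrequency,
                              double numUnknownRecords) {
        double info = splitInfo(allOverRecords, partitionFrequency, numUnknownRecords);
        return info > 0.0 ? qualityMeasure / info : 0.0;
    }
}
```

Deferring this division until the best split within an attribute is known means it runs once per attribute rather than once per candidate split, which is the saving the description above refers to.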


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.