java.lang.Object
  org.knime.base.node.mine.decisiontree2.learner.SplitQualityMeasure
    org.knime.base.node.mine.decisiontree2.learner.SplitQualityGini
public class SplitQualityGini
extends SplitQualityMeasure
Implements the Gini index split quality measure. The Gini index is subtracted from 1 (the worst value), so, as with the gain ratio, a larger value indicates a better split.
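The following worked example illustrates the "larger is better" convention. It uses four made-up records of classes a and b and applies the Gini formulas documented under measureQuality. It also assumes, as the class description suggests, that the reported quality is 1 - giniSplit(T):

$$
\begin{aligned}
\text{Split A: } & T_1 = \{a, a\},\ T_2 = \{b, b\}: \quad \mathrm{giniSplit}(T) = \tfrac{2}{4}\bigl(1 - 1^2\bigr) + \tfrac{2}{4}\bigl(1 - 1^2\bigr) = 0, \quad 1 - \mathrm{giniSplit}(T) = 1\\
\text{Split B: } & T_1 = \{a, b\},\ T_2 = \{a, b\}: \quad \mathrm{giniSplit}(T) = \tfrac{2}{4}\bigl(1 - 0.5^2 - 0.5^2\bigr) + \tfrac{2}{4}\bigl(1 - 0.5^2 - 0.5^2\bigr) = 0.5, \quad 1 - \mathrm{giniSplit}(T) = 0.5
\end{aligned}
$$

Split A separates the classes perfectly. It therefore receives the larger value and would be preferred under isBetter.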
Constructor Summary | |
---|---|
SplitQualityGini() | |
Method Summary | | |
---|---|---|
double | getWorstValue() | Returns the worst value for this quality measure. |
void | initQualityMeasure(double[] classFrequencies, double allOverRecords) | Some quality measures, such as the information gain, compute the quality of a previous distribution relative to a new one. |
boolean | isBetter(double quality1, double quality2) | A Gini index is better if it is larger than the other one. |
boolean | isBetterOrEqual(double quality1, double quality2) | A Gini index is better than or equal to another one if it is larger than or equal to it. |
double | measureQuality(double allOverRecords, double[] partitionFrequency, double[][] partitionClassFrequency, double numUnknownRecords) | Calculates the Gini split index. |
double | postProcessMeasure(double qualityMeasure, double allOverRecords, double[] partitionFrequency, double numUnknownRecords) | The Gini index does not need to post-process the measure. |
String | toString() | |
Methods inherited from class org.knime.base.node.mine.decisiontree2.learner.SplitQualityMeasure |
---|
clone |
Methods inherited from class java.lang.Object |
---|
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public SplitQualityGini()
Method Detail |
---|
public boolean isBetter(double quality1, double quality2)
isBetter in class SplitQualityMeasure
Parameters:
quality1 - first quality to compare
quality2 - second quality to compare
public boolean isBetterOrEqual(double quality1, double quality2)
isBetterOrEqual in class SplitQualityMeasure
Parameters:
quality1 - first quality to compare
quality2 - second quality to compare
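The following is a minimal sketch of the comparison convention. It assumes that isBetter and isBetterOrEqual reduce to `>` and `>=` on the quality values, as the method summaries describe. The class GiniComparisonSketch and its bestIndex helper are hypothetical and not part of the KNIME API:

```java
/**
 * Hypothetical sketch of the "larger quality is better" convention described
 * for SplitQualityGini; not the KNIME implementation.
 */
public class GiniComparisonSketch {

    // Assumed semantics of isBetter: strictly larger quality wins.
    static boolean isBetter(double quality1, double quality2) {
        return quality1 > quality2;
    }

    // Assumed semantics of isBetterOrEqual: larger or equal quality is at least as good.
    static boolean isBetterOrEqual(double quality1, double quality2) {
        return quality1 >= quality2;
    }

    /**
     * Hypothetical helper: index of the best candidate split quality.
     * Requires at least one candidate; ties keep the earliest candidate
     * because isBetter is strict.
     */
    static int bestIndex(double[] qualities) {
        int best = 0;
        for (int i = 1; i < qualities.length; i++) {
            if (isBetter(qualities[i], qualities[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] candidates = {0.5, 1.0, 1.0};
        System.out.println(bestIndex(candidates));          // prints 1
        System.out.println(isBetter(1.0, 1.0));             // prints false
        System.out.println(isBetterOrEqual(1.0, 1.0));      // prints true
    }
}
```

Because the comparison is strict, a later split with an equal quality does not replace an earlier one. A caller that wants to prefer later ties would use isBetterOrEqual instead.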
public double measureQuality(double allOverRecords, double[] partitionFrequency, double[][] partitionClassFrequency, double numUnknownRecords)
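For a data set T, the Gini index is
$$\mathrm{gini}(T) = 1 - \sum_j p_j^2, \qquad p_j = \frac{P_j}{|T|},$$
summed over all relative class frequencies $p_j$. Here $P_j$ is the absolute frequency of class $j$ and $|T|$ is the number of records in the data set (for a partition $T_x$, $|T_x| = n_x$).
The Gini index of the split is
$$\mathrm{giniSplit}(T) = \sum_x \frac{n_x}{N}\,\mathrm{gini}(T_x),$$
summed over all partitions $T_x$, where $n_x/N$ is the relative frequency of partition $T_x$.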
measureQuality in class SplitQualityMeasure
Parameters:
allOverRecords - the overall number of records with known values in the partition to split; corresponds to N in the formula
partitionFrequency - the frequencies of the different partitions; corresponds to nx in the formula
partitionClassFrequency - all class frequencies Pj (second dimension) for all partitions Tx (first dimension)
numUnknownRecords - the number of records with an unknown (missing) value for the relevant attribute; used to weight the quality measure
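The following is a minimal Java sketch of the two formulas above. It is not the KNIME implementation. It assumes that the reported quality is 1 - giniSplit(T), as the class description suggests. It also omits the numUnknownRecords weighting, whose exact form is not documented here. GiniSplitSketch and its methods are hypothetical names, although the parameters mirror those of measureQuality:

```java
/**
 * Hypothetical sketch of the Gini split computation; the weighting by
 * numUnknownRecords used by the real class is deliberately omitted.
 */
public class GiniSplitSketch {

    /** gini(Tx) = 1 - SUM(pj^2) with pj = Pj / |Tx|. */
    static double gini(double[] classFrequency, double total) {
        if (total <= 0) {
            // Empty partition: avoid 0/0; its weight nx/N is zero anyway.
            return 0.0;
        }
        double sumSquares = 0.0;
        for (double count : classFrequency) {
            double p = count / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    /** giniSplit(T) = SUM(nx / N * gini(Tx)) over all partitions Tx. */
    static double giniSplit(double allOverRecords, double[] partitionFrequency,
                            double[][] partitionClassFrequency) {
        double split = 0.0;
        for (int x = 0; x < partitionFrequency.length; x++) {
            split += partitionFrequency[x] / allOverRecords
                    * gini(partitionClassFrequency[x], partitionFrequency[x]);
        }
        return split;
    }

    /** Assumed quality: 1 - giniSplit(T), so larger is better. */
    static double quality(double allOverRecords, double[] partitionFrequency,
                          double[][] partitionClassFrequency) {
        return 1.0 - giniSplit(allOverRecords, partitionFrequency, partitionClassFrequency);
    }

    public static void main(String[] args) {
        // Two pure partitions of two records each: giniSplit = 0
        System.out.println(quality(4, new double[] {2, 2},
                new double[][] {{2, 0}, {0, 2}}));   // prints 1.0
        // Two 50/50 partitions of two records each: giniSplit = 0.5
        System.out.println(quality(4, new double[] {2, 2},
                new double[][] {{1, 1}, {1, 1}}));   // prints 0.5
    }
}
```

The array shapes follow the parameter descriptions. The first index of partitionClassFrequency selects the partition Tx, and the second selects the class j.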
public double getWorstValue()
getWorstValue in class SplitQualityMeasure
public void initQualityMeasure(double[] classFrequencies, double allOverRecords)
initQualityMeasure in class SplitQualityMeasure
Parameters:
classFrequencies - the class frequencies
allOverRecords - the overall count
public String toString()
toString in class SplitQualityMeasure
public double postProcessMeasure(double qualityMeasure, double allOverRecords, double[] partitionFrequency, double numUnknownRecords)
postProcessMeasure in class SplitQualityMeasure
Parameters:
qualityMeasure - the quality measure to post-process
allOverRecords - the overall number of known (non-missing) records
partitionFrequency - the frequencies of the potential split partitions
numUnknownRecords - the number of unknown (missing) records