java.lang.Object
  org.knime.base.node.mine.scorer.entrop.EntropyCalculator
public final class EntropyCalculator
Utility class for calculating entropy and quality values for clustering results, given a reference clustering.
Constructor Summary | |
---|---|
EntropyCalculator(DataTable reference, DataTable clustering, int referenceCol, int clusteringCol, ExecutionMonitor exec) | Creates new instance. |
EntropyCalculator(Map<RowKey,RowKey> referenceMap, Map<RowKey,Set<RowKey>> clusteringMap) | Creates new instance given the maps of clustering and reference. |
Method Summary | |
---|---|
static double | entropy(Map<RowKey,RowKey> ref, Set<RowKey> pats) - Get entropy for one single cluster. |
Map<RowKey,Set<RowKey>> | getClusteringMap() - Map of Cluster name -> cluster members (in a set) as given in the clustering to score. |
double | getEntropy() |
static double | getEntropy(Map<RowKey,RowKey> reference, Map<RowKey,Set<RowKey>> clusterMap) - Get entropy according to reference clustering. The entropy value is not normalized, i.e. the result is in the range [0, log2(|cluster|)]. |
int | getNrClusters() |
int | getNrReference() |
int | getPatternsInClusters() |
int | getPatternsInReference() |
double | getQuality() |
static double | getQuality(Map<RowKey,RowKey> reference, Map<RowKey,Set<RowKey>> clusterMap) - Get quality measure of current cluster result (in 0-1). |
DataTable | getScoreTable() |
static DataTableSpec | getScoreTableSpec() |
static EntropyCalculator | load(File dir, ExecutionMonitor exec) - Factory method to restore this object given a directory in which the content is saved. |
void | save(File dir, ExecutionMonitor exec) - Saves the structure of this object to the target directory. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public EntropyCalculator(DataTable reference, DataTable clustering, int referenceCol, int clusteringCol, ExecutionMonitor exec) throws CanceledExecutionException
Creates new instance.
Parameters:
reference - the reference table, i.e. the clusters that should be found
clustering - the table containing the clustering to judge
referenceCol - the column index in reference that contains the cluster membership
clusteringCol - the column index in clustering that contains the cluster membership
exec - the execution monitor for canceling and progress
Throws:
CanceledExecutionException - if canceled
public EntropyCalculator(Map<RowKey,RowKey> referenceMap, Map<RowKey,Set<RowKey>> clusteringMap)
Creates new instance given the maps of clustering and reference.
Parameters:
referenceMap - the reference clustering, mapping ID -> cluster name
clusteringMap - the clustering to score, cluster name -> cluster members in a set (may not necessarily be unique)
Method Detail |
---|
public double getEntropy()
public double getQuality()
public int getNrClusters()
public int getNrReference()
public int getPatternsInClusters()
public int getPatternsInReference()
public DataTable getScoreTable()
public static DataTableSpec getScoreTableSpec()
See getScoreTable().
public Map<RowKey,Set<RowKey>> getClusteringMap()
Map of Cluster name -> cluster members (in a set) as given in the clustering to score.
public void save(File dir, ExecutionMonitor exec) throws IOException, CanceledExecutionException
Saves the structure of this object to the target directory.
Parameters:
dir - to save to
exec - for progress/cancel
Throws:
IOException - if that fails
CanceledExecutionException - if canceled
public static EntropyCalculator load(File dir, ExecutionMonitor exec) throws IOException, InvalidSettingsException, CanceledExecutionException
Factory method to restore this object given a directory in which the content is saved.
Parameters:
dir - the dir to read from
exec - for cancellation.
Throws:
IOException - if that fails
InvalidSettingsException - if the internals don't match
CanceledExecutionException - if canceled
public static double getQuality(Map<RowKey,RowKey> reference, Map<RowKey,Set<RowKey>> clusterMap)
Get quality measure of current cluster result (in 0-1). The quality value is the sum over all clusters of (current_cluster_size / patterns_count) * (1 - entropy(current_cluster wrt. reference)).
For further details see Bernd Wiswedel, Michael R. Berthold, Fuzzy Clustering in Parallel Universes, International Journal of Approximate Reasoning, 2006.
Parameters:
reference - the reference clustering, maps patterns to cluster ID. The reference map is supposed to contain all data (if there are noise objects, they should be contained and have their own cluster ID). The quality value is normalized over the size of this set.
clusterMap - the map containing the clusters that have been found, i.e. cluster ID (as above) as key and the set of all contained patterns as value
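The weighted sum described for getQuality can be illustrated with a small, self-contained sketch. It is not the KNIME implementation: the class QualitySketch, the method quality and its parameters are hypothetical names, plain arrays stand in for the RowKey maps, and the per-cluster entropies are assumed to be already scaled to [0, 1] (how the library normalizes them is not stated here).

```java
final class QualitySketch {

    /**
     * Sums (clusterSize / patternCount) * (1 - entropy) over all clusters.
     *
     * @param clusterSizes        number of patterns in each found cluster
     * @param normalizedEntropies entropy of each cluster wrt. the reference,
     *                            ASSUMED to be scaled to [0, 1] beforehand
     * @param patternCount        number of patterns in the reference set
     */
    static double quality(int[] clusterSizes, double[] normalizedEntropies, int patternCount) {
        if (clusterSizes.length != normalizedEntropies.length) {
            throw new IllegalArgumentException("one entropy value per cluster is required");
        }
        if (patternCount <= 0) {
            throw new IllegalArgumentException("patternCount must be positive");
        }
        double sum = 0.0;
        for (int i = 0; i < clusterSizes.length; i++) {
            // Larger clusters contribute proportionally more to the result.
            double weight = (double) clusterSizes[i] / patternCount;
            // A pure cluster (entropy 0) contributes its full weight.
            sum += weight * (1.0 - normalizedEntropies[i]);
        }
        return sum;
    }
}
```

With two pure clusters that together cover all patterns the sketch yields 1; a cluster whose normalized entropy is 1 contributes nothing, regardless of its size.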
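public static double getEntropy(Map<RowKey,RowKey> reference, Map<RowKey,Set<RowKey>> clusterMap)
Get entropy according to reference clustering. The entropy value is not normalized, i.e. the result is in the range [0, log2(|cluster|)].
Parameters:
reference - the reference clustering to compare to
clusterMap - the clustering to judge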
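The stated range can be motivated as follows, assuming the per-cluster entropy is the base-2 Shannon entropy of the reference labels inside a cluster C. The symbols k (number of distinct reference clusters occurring in C), n_j (number of members of C that belong to reference cluster j) and p_j are introduced here for illustration only.

```latex
H(C) = -\sum_{j=1}^{k} p_j \log_2 p_j,
\qquad p_j = \frac{n_j}{|C|},
\qquad 0 \le H(C) \le \log_2 k \le \log_2 |C|
```

The lower bound is reached when all members of C share one reference cluster (k = 1). The upper bound log2(k) is reached when the k labels are equally frequent, and k cannot exceed |C| because every distinct label needs at least one member.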
public static double entropy(Map<RowKey,RowKey> ref, Set<RowKey> pats)
Get entropy for one single cluster.
Parameters:
ref - the reference clustering
pats - the single cluster to score
Returns:
the entropy of pats wrt. ref
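A minimal sketch of how the entropy of one cluster with respect to a reference clustering might be computed is given below. It is an illustration under assumptions, not the library code: String keys stand in for RowKey, the class name ClusterEntropySketch is hypothetical, the entropy is taken to be the base-2 Shannon entropy of the reference labels, and patterns missing from the reference map are simply skipped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class ClusterEntropySketch {

    /** Base-2 Shannon entropy of the reference labels found among the cluster members. */
    static double entropy(Map<String, String> ref, Set<String> pats) {
        // Count how often each reference cluster name occurs in this cluster.
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (String pattern : pats) {
            String label = ref.get(pattern);
            if (label == null) {
                continue; // assumption: patterns unknown to the reference are ignored
            }
            counts.merge(label, 1, Integer::sum);
            total++;
        }
        // Accumulate -p * log2(p) over the observed labels.
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / total;
            h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }

    public static void main(String[] args) {
        Map<String, String> ref = Map.of("r1", "A", "r2", "A", "r3", "B", "r4", "B");
        System.out.println(entropy(ref, Set.of("r1", "r2"))); // cluster drawn from a single reference cluster
        System.out.println(entropy(ref, Set.of("r1", "r3"))); // cluster mixing two reference clusters evenly
    }
}
```

A cluster whose members all come from one reference cluster scores 0, and the score grows as the members spread more evenly over several reference clusters.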