org.knime.base.node.mine.scorer.entrop
Class EntropyCalculator

java.lang.Object
  extended by org.knime.base.node.mine.scorer.entrop.EntropyCalculator

public final class EntropyCalculator
extends Object

Utility class for calculating entropy and quality values of a clustering result with respect to a reference clustering.

Author:
Bernd Wiswedel, University of Konstanz

Constructor Summary
EntropyCalculator(DataTable reference, DataTable clustering, int referenceCol, int clusteringCol, ExecutionMonitor exec)
          Creates new instance.
EntropyCalculator(Map<RowKey,RowKey> referenceMap, Map<RowKey,Set<RowKey>> clusteringMap)
          Creates new instance given the maps of clustering and reference.
 
Method Summary
static double entropy(Map<RowKey,RowKey> ref, Set<RowKey> pats)
          Get entropy for one single cluster.
 Map<RowKey,Set<RowKey>> getClusteringMap()
          Map of Cluster name -> cluster members (in a set) as given in the clustering to score.
 double getEntropy()
           
static double getEntropy(Map<RowKey,RowKey> reference, Map<RowKey,Set<RowKey>> clusterMap)
          Get entropy according to reference clustering; the entropy value is not normalized.
 int getNrClusters()
           
 int getNrReference()
           
 int getPatternsInClusters()
           
 int getPatternsInReference()
           
 double getQuality()
           
static double getQuality(Map<RowKey,RowKey> reference, Map<RowKey,Set<RowKey>> clusterMap)
          Get quality measure of current cluster result (in 0-1).
 DataTable getScoreTable()
           
static DataTableSpec getScoreTableSpec()
           
static EntropyCalculator load(File dir, ExecutionMonitor exec)
          Factory method to restore this object given a directory in which the content is saved.
 void save(File dir, ExecutionMonitor exec)
          Saves the structure of this object to the target directory.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

EntropyCalculator

public EntropyCalculator(DataTable reference,
                         DataTable clustering,
                         int referenceCol,
                         int clusteringCol,
                         ExecutionMonitor exec)
                  throws CanceledExecutionException
Creates new instance.

Parameters:
reference - the reference table, i.e. the clusters that should be found
clustering - the table containing the clustering to judge
referenceCol - the column index in reference that contains the cluster membership
clusteringCol - the column index in clustering that contains the cluster membership
exec - the execution monitor for canceling and progress
Throws:
CanceledExecutionException - if canceled

EntropyCalculator

public EntropyCalculator(Map<RowKey,RowKey> referenceMap,
                         Map<RowKey,Set<RowKey>> clusteringMap)
Creates new instance given the maps of clustering and reference.

Parameters:
referenceMap - the reference clustering, mapping ID -> cluster name
clusteringMap - the clustering to score, cluster name -> cluster members in a set (may not necessarily be unique)
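
The two map shapes expected by this constructor can be illustrated with a small, self-contained sketch. It uses plain String keys as a stand-in for RowKey so that it runs without KNIME on the classpath; the class ClusterMapSketch and the helper toClusteringMap are hypothetical names introduced only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of KNIME. String stands in for RowKey.
class ClusterMapSketch {

    /** Inverts "pattern ID -> cluster name" into "cluster name -> set of member IDs". */
    static Map<String, Set<String>> toClusteringMap(Map<String, String> assignment) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            // Create the member set on first sight of a cluster name, then add the pattern ID.
            result.computeIfAbsent(e.getValue(), k -> new LinkedHashSet<>()).add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Same shape as referenceMap: pattern ID -> cluster name.
        Map<String, String> assignment = new LinkedHashMap<>();
        assignment.put("row1", "cluster_0");
        assignment.put("row2", "cluster_0");
        assignment.put("row3", "cluster_1");

        // Same shape as clusteringMap: cluster name -> cluster members.
        System.out.println(toClusteringMap(assignment));
    }
}
```

With real KNIME types, the same shapes would be Map<RowKey,RowKey> for the reference and Map<RowKey,Set<RowKey>> for the clustering, as given in the constructor signature.
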
Method Detail

getEntropy

public double getEntropy()
Returns:
the entropy

getQuality

public double getQuality()
Returns:
the quality

getNrClusters

public int getNrClusters()
Returns:
the nrClusters

getNrReference

public int getNrReference()
Returns:
the nrReference

getPatternsInClusters

public int getPatternsInClusters()
Returns:
the patternsInClusters

getPatternsInReference

public int getPatternsInReference()
Returns:
the patternsInReference

getScoreTable

public DataTable getScoreTable()
Returns:
the scoreTable

getScoreTableSpec

public static DataTableSpec getScoreTableSpec()
Returns:
the table spec of the table returned by getScoreTable()

getClusteringMap

public Map<RowKey,Set<RowKey>> getClusteringMap()
Map of Cluster name -> cluster members (in a set) as given in the clustering to score.

Returns:
the clusteringMap

save

public void save(File dir,
                 ExecutionMonitor exec)
          throws IOException,
                 CanceledExecutionException
Saves the structure of this object to the target directory.

Parameters:
dir - to save to
exec - for progress/cancel
Throws:
IOException - if that fails
CanceledExecutionException - if canceled

load

public static EntropyCalculator load(File dir,
                                     ExecutionMonitor exec)
                              throws IOException,
                                     InvalidSettingsException,
                                     CanceledExecutionException
Factory method to restore this object given a directory in which the content is saved.

Parameters:
dir - the dir to read from
exec - for cancellation.
Returns:
a new object as read from dir
Throws:
IOException - if that fails
InvalidSettingsException - if the internals don't match
CanceledExecutionException - if canceled

getQuality

public static double getQuality(Map<RowKey,RowKey> reference,
                                Map<RowKey,Set<RowKey>> clusterMap)
Get quality measure of current cluster result (in 0-1). The quality value is defined as

sum over all clusters (current_cluster_size / patterns_count * (1 - entropy(current_cluster wrt. reference))).

For further details see Bernd Wiswedel, Michael R. Berthold, Fuzzy Clustering in Parallel Universes, International Journal of Approximate Reasoning, 2006.

Parameters:
reference - the reference clustering, which maps patterns to cluster IDs. The reference map is supposed to contain all data (if there are noise objects, they should be contained and have their own cluster ID). The quality value is normalized over the size of this set.
clusterMap - the map containing the clusters that have been found, i.e. clusterID (as above) as key and the set of all contained patterns as value
Returns:
quality value in [0,1]
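
The quality formula above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the KNIME implementation: String stands in for RowKey, the per-cluster entropy is taken to be base-2 Shannon entropy, it is divided by log2 of the number of reference clusters so that the result stays in [0,1] (this normalization is an assumption), and patterns_count is read as the size of the reference map. The class name ClusterQualitySketch is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the KNIME implementation. String stands in for RowKey.
class ClusterQualitySketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Base-2 Shannon entropy of the reference labels among the members of one cluster. */
    static double entropy(Map<String, String> ref, Set<String> pats) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (String pat : pats) {
            String label = ref.get(pat);
            if (label != null) { // assumption: patterns missing from the reference are skipped
                counts.merge(label, 1, Integer::sum);
                total++;
            }
        }
        double e = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / total;
            e -= p * log2(p);
        }
        return e;
    }

    /** Sum over clusters of size / patternsCount * (1 - normalized entropy). */
    static double quality(Map<String, String> ref, Map<String, Set<String>> clusters) {
        int patternsCount = ref.size();
        if (patternsCount == 0) {
            return 0.0; // assumption: an empty reference yields quality 0
        }
        int refClusters = new HashSet<>(ref.values()).size();
        // Assumed normalization: the maximum entropy for refClusters distinct labels.
        double norm = refClusters > 1 ? log2(refClusters) : 1.0;
        double q = 0.0;
        for (Set<String> members : clusters.values()) {
            double weight = (double) members.size() / patternsCount;
            q += weight * (1.0 - entropy(ref, members) / norm);
        }
        return q;
    }

    public static void main(String[] args) {
        Map<String, String> ref = Map.of("r1", "A", "r2", "A", "r3", "B", "r4", "B");
        // Clustering that reproduces the reference exactly.
        System.out.println(quality(ref, Map.of("c1", Set.of("r1", "r2"), "c2", Set.of("r3", "r4"))));
        // Clustering that merges everything into one cluster.
        System.out.println(quality(ref, Map.of("c1", Set.of("r1", "r2", "r3", "r4"))));
    }
}
```

Under these assumptions, a clustering that reproduces the reference scores at the top of the range, and a single cluster mixing both reference clusters evenly scores at the bottom.
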

getEntropy

public static double getEntropy(Map<RowKey,RowKey> reference,
                                Map<RowKey,Set<RowKey>> clusterMap)
Get entropy according to reference clustering. The entropy value is not normalized, i.e. the result is in the range [0, log2(|cluster|)].

Parameters:
reference - the reference clustering to compare to
clusterMap - the clustering to judge
Returns:
entropy value

entropy

public static double entropy(Map<RowKey,RowKey> ref,
                             Set<RowKey> pats)
Get entropy for one single cluster.

Parameters:
ref - the reference clustering
pats - the single cluster to score
Returns:
the (not-normalized) entropy of pats wrt. ref
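
As an illustration of what a non-normalized, per-cluster entropy with respect to a reference could look like, here is a minimal sketch. It assumes base-2 Shannon entropy over the distribution of reference labels among the cluster members, which matches the log2 range mentioned for getEntropy but is not confirmed by this page. String stands in for RowKey, and the class name ClusterEntropySketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the KNIME implementation. String stands in for RowKey.
class ClusterEntropySketch {

    /** Base-2 Shannon entropy of the reference labels found among the members of one cluster. */
    static double entropy(Map<String, String> ref, Set<String> pats) {
        // Count how many cluster members carry each reference label.
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (String pat : pats) {
            String label = ref.get(pat);
            if (label == null) {
                continue; // assumption: patterns missing from the reference are skipped
            }
            counts.merge(label, 1, Integer::sum);
            total++;
        }
        // Accumulate -p * log2(p) over the observed label frequencies.
        double e = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / total;
            e -= p * Math.log(p) / Math.log(2.0);
        }
        return e;
    }

    public static void main(String[] args) {
        Map<String, String> ref = Map.of("r1", "A", "r2", "A", "r3", "B", "r4", "B");
        // Cluster whose members all share one reference label.
        System.out.println(entropy(ref, Set.of("r1", "r2")));
        // Cluster split evenly between two reference labels.
        System.out.println(entropy(ref, Set.of("r1", "r3")));
    }
}
```

A cluster whose members all share one reference label has the lowest possible entropy, and the value grows as the members spread over more reference clusters.
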


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.