org.knime.base.node.mine.smote
Class Smoter

java.lang.Object
  extended by org.knime.base.node.mine.smote.Smoter

 class Smoter
extends Object

Implementation of the SMOTE algorithm. More precisely, this class acts as a controller for the algorithm.

The algorithm is described in:

  Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P. (2002) 
  "SMOTE: Synthetic Minority Over-sampling Technique",
  Journal of Artificial Intelligence Research, Volume 16, pages 321-357.
 

Author:
Bernd Wiswedel, University of Konstanz

Constructor Summary
Smoter(BufferedDataTable in, String colName, ExecutionContext exec, Random rand)
          Creates a new instance given the input table in and the target column colName.
 
Method Summary
 void close()
          Closes this controller.
(package private) static DataTableSpec createFinalSpec(DataTableSpec inSpec)
          Creates the output spec that results from smoting a table with spec inSpec.
 Iterator<DataCell> getClassValues()
          Get iterator of all classes that occur in the target column.
 int getCount(DataCell name)
          Get frequency of a class name in the input table.
 DataCell getMajorityClass()
          Get name of the majority class, i.e. the class that occurs most often.
 DataTable getSmotedTable()
          Get the final output table, which contains the original input table and the smoted rows.
 void smote(DataCell name, int count, int kNN, ExecutionMonitor exec)
          Oversample the class name such that count new rows are inserted.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Smoter

public Smoter(BufferedDataTable in,
              String colName,
              ExecutionContext exec,
              Random rand)
       throws CanceledExecutionException
Creates a new instance given the input table in and the target column colName.

Parameters:
in - the input table
colName - the target column with class information
exec - monitor to get canceled status from (may be null)
rand - the random generator (may be null)
Throws:
CanceledExecutionException - if execution is canceled
Method Detail

getClassValues

public Iterator<DataCell> getClassValues()
Get iterator of all classes that occur in the target column.

Returns:
all available classes

getCount

public int getCount(DataCell name)
Get frequency of a class name in the input table. The argument must be an entry of the iterator that is returned by getClassValues().

Parameters:
name - the class name
Returns:
the frequency

getMajorityClass

public DataCell getMajorityClass()
Get name of the majority class, i.e. the class that occurs most often.

Returns:
name of majority class.
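
To make the relationship between getClassValues(), getCount(DataCell) and getMajorityClass() concrete, here is a minimal standalone sketch. It is not the KNIME implementation: the class ClassCountSketch, its methods, and the use of plain String labels in place of DataCell are assumptions made for illustration. The helper rowsToBalance shows one possible way to derive a count argument for smote(...), namely the number of rows a class is missing compared with the majority class.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of class frequency bookkeeping; not the KNIME Smoter code. */
public class ClassCountSketch {

    /** Counts how often each class label occurs, keeping first-seen order. */
    public static Map<String, Integer> countClasses(List<String> labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the label with the highest count, or null if there are no labels. */
    public static String majorityClass(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    /** Number of rows the given class lacks compared with the majority class. */
    public static int rowsToBalance(Map<String, Integer> counts, String label) {
        int majorityCount = counts.get(majorityClass(counts));
        return majorityCount - counts.get(label);
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("yes", "no", "no", "no", "yes");
        Map<String, Integer> counts = countClasses(labels);
        System.out.println("majority class: " + majorityClass(counts));
        System.out.println("rows to add for 'yes': " + rowsToBalance(counts, "yes"));
    }
}
```

As with getCount(DataCell), rowsToBalance expects a label that actually occurs in the counted data.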

smote

public void smote(DataCell name,
                  int count,
                  int kNN,
                  ExecutionMonitor exec)
           throws CanceledExecutionException
Oversample the class name such that count new rows are inserted. The kNN nearest neighbors are chosen as reference.

Parameters:
name - the class name
count - the number of new rows to add
kNN - the number of nearest neighbors to use as reference
exec - monitor to get canceled status from (may be null)
Throws:
CanceledExecutionException - if execution is canceled
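
The core idea of SMOTE, as described in the paper cited above, is to create each synthetic row on the line segment between a minority-class row and one of its k nearest neighbors of the same class. The following standalone sketch illustrates that idea on plain double arrays. It is not the code of this class: SmoteSketch and all of its methods are hypothetical, the base row is picked at random rather than by iterating over all rows, and cancellation and KNIME table handling are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of SMOTE-style interpolation; not the KNIME Smoter code. */
public class SmoteSketch {

    /** Squared Euclidean distance between two points of equal length. */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Indices of the (at most) k rows closest to rows[index], excluding the row itself. */
    public static int[] nearestNeighbors(double[][] rows, int index, int k) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            if (i != index) {
                candidates.add(i);
            }
        }
        candidates.sort(Comparator.comparingDouble(i -> squaredDistance(rows[index], rows[i])));
        int n = Math.min(k, candidates.size());
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = candidates.get(i);
        }
        return result;
    }

    /** Creates count synthetic rows by interpolating between minority rows and their neighbors. */
    public static double[][] oversample(double[][] minority, int count, int k, Random rand) {
        if (minority.length < 2 || k < 1) {
            throw new IllegalArgumentException("need at least two rows and k >= 1");
        }
        double[][] synthetic = new double[count][];
        for (int c = 0; c < count; c++) {
            int base = rand.nextInt(minority.length);           // simplification: random base row
            int[] neighbors = nearestNeighbors(minority, base, k);
            double[] neighbor = minority[neighbors[rand.nextInt(neighbors.length)]];
            double gap = rand.nextDouble();                     // position on the segment, in [0, 1)
            double[] row = new double[minority[base].length];
            for (int j = 0; j < row.length; j++) {
                row[j] = minority[base][j] + gap * (neighbor[j] - minority[base][j]);
            }
            synthetic[c] = row;
        }
        return synthetic;
    }

    public static void main(String[] args) {
        double[][] minority = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
        for (double[] row : oversample(minority, 3, 2, new Random(42))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Because every synthetic row is a convex combination of two existing rows, it always stays within the range spanned by the minority class in each dimension.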

close

public void close()
Closes this controller. After this call, the table can be retrieved by invoking getSmotedTable(). Subsequent calls to smote(DataCell, int, int, ExecutionMonitor) will fail.


getSmotedTable

public DataTable getSmotedTable()
Get the final output table, which contains the original input table and the smoted rows.

Returns:
the new output table

createFinalSpec

static DataTableSpec createFinalSpec(DataTableSpec inSpec)
Creates the output spec that results from smoting a table with spec inSpec. It replaces the data types of all DoubleValue-compatible columns with DoubleCell.TYPE.

Parameters:
inSpec - the table spec of the input table
Returns:
the output table spec


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.