org.knime.base.node.mine.decisiontree2.learner
Class DecisionTreeLearnerNodeModel

java.lang.Object
  extended by org.knime.core.node.NodeModel
      extended by org.knime.base.node.mine.decisiontree2.learner.DecisionTreeLearnerNodeModel

public class DecisionTreeLearnerNodeModel
extends NodeModel

Implements a decision tree induction algorithm based on C4.5 and SPRINT.

Author:
Christoph Sieb, University of Konstanz
See Also:
DecisionTreeLearnerNodeFactory

Field Summary
static int DATA_INPORT
          Index of input data port.
static boolean DEFAULT_BINARY_NOMINAL_SPLIT_MODE
          The default binary split mode (off).
static int DEFAULT_MAX_BIN_NOMINAL_SPLIT_COMPUTATION
          The default for the maximum number of nominal values for which all subsets are calculated (results in the optimal binary split); this parameter is only used if binaryNominalSplits is true; if the number of nominal values is higher, a heuristic is applied.
static boolean DEFAULT_MEMORY_OPTION
          The default build option (memory or on disk).
static int DEFAULT_MIN_NUM_RECORDS_PER_NODE
          The default minimum number of records expected per node.
static int DEFAULT_NUM_PROCESSORS
          The default number of processors to use.
static int DEFAULT_NUMBER_RECORDS_FOR_VIEW
          The default number of records stored for the view.
static double DEFAULT_PRUNING_CONFIDENCE_THRESHOLD
          The default confidence threshold for pruning.
static String DEFAULT_PRUNING_METHOD
          The default pruning method.
static boolean DEFAULT_SPLIT_AVERAGE
          The default for whether to use the average as the split point (false).
static String DEFAULT_SPLIT_QUALITY_MEASURE
          The default split quality measure.
static String KEY_BINARY_NOMINAL_SPLIT_MODE
          Key to store whether to use the binary nominal split mode.
static String KEY_CLASSIFYCOLUMN
          Key to store the classification column in the settings.
static String KEY_MAX_NUM_NOMINAL_VALUES
          Key to store the max number of nominal values for which to compute all subsets.
static String KEY_MEMORY_OPTION
          Key to store the memory option (memory build or on disk).
static String KEY_MIN_NUMBER_RECORDS_PER_NODE
          Key to store the minimum number of records per node.
static String KEY_NUM_PROCESSORS
          Key to store the number of processors to use.
static String KEY_NUMBER_VIEW_RECORDS
          Key to store the number of records stored for the view.
static String KEY_PRUNING_CONFIDENCE_THRESHOLD
          Key to store the confidence threshold for tree pruning in the settings.
static String KEY_PRUNING_METHOD
          Key to store the pruning method in the settings.
static String KEY_SPLIT_AVERAGE
          Key to store the split average in the settings.
static String KEY_SPLIT_QUALITY_MEASURE
          Key to store the split quality measure in the settings.
static int MAX_NUM_PROCESSORS
          The maximum number of processors to use.
static int MODEL_OUTPORT
          Index of model out port.
static String PRUNING_ESTIMATED_ERROR
          The constant for estimated error pruning.
static String PRUNING_MDL
          The constant for MDL pruning.
static String PRUNING_NO
          The constant for no pruning.
static String SPLIT_QUALITY_GAIN_RATIO
          The constant for the gain ratio split quality measure.
static String SPLIT_QUALITY_GINI
          The constant for the Gini index split quality measure.
 
Constructor Summary
DecisionTreeLearnerNodeModel()
          Initializes a new decision tree learner model with one data input port and one model output port.
 
Method Summary
(package private) static void checkMemory()
          Checks the memory footprint.
protected  PortObjectSpec[] configure(PortObjectSpec[] inSpecs)
          The number of the class column must be > 0 and < number of input columns.
static boolean criticalMemoryFootprint()
          Returns whether the memory footprint is critical.
protected  PortObject[] execute(PortObject[] data, ExecutionContext exec)
          Start of decision tree induction.
 DecisionTree getDecisionTree()
          Returns the decision tree model.
protected  void loadInternals(File nodeInternDir, ExecutionMonitor exec)
          Load internals into the derived NodeModel.
protected  void loadValidatedSettingsFrom(NodeSettingsRO settings)
          Loads the class column and the classification value into the model.
protected  void reset()
          Resets all internal data.
protected  void saveInternals(File nodeInternDir, ExecutionMonitor exec)
          Save internals of the derived NodeModel.
protected  void saveSettingsTo(NodeSettingsWO settings)
          Saves the class column and the classification value in the settings.
protected  void validateSettings(NodeSettingsRO settings)
          This method validates the settings.
 
Methods inherited from class org.knime.core.node.NodeModel
addWarningListener, configure, continueLoop, execute, executeModel, getInHiLiteHandler, getLoopEndNode, getLoopStartNode, getNrInPorts, getNrOutPorts, getOutHiLiteHandler, getWarningMessage, notifyViews, notifyWarningListeners, peekFlowVariableDouble, peekFlowVariableInt, peekFlowVariableString, peekScopeVariableDouble, peekScopeVariableInt, peekScopeVariableString, pushFlowVariableDouble, pushFlowVariableInt, pushFlowVariableString, pushScopeVariableDouble, pushScopeVariableInt, pushScopeVariableString, removeWarningListener, setInHiLiteHandler, setWarningMessage, stateChanged
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

KEY_CLASSIFYCOLUMN

public static final String KEY_CLASSIFYCOLUMN
Key to store the classification column in the settings.

See Also:
Constant Field Values

KEY_PRUNING_CONFIDENCE_THRESHOLD

public static final String KEY_PRUNING_CONFIDENCE_THRESHOLD
Key to store the confidence threshold for tree pruning in the settings.

See Also:
Constant Field Values

KEY_PRUNING_METHOD

public static final String KEY_PRUNING_METHOD
Key to store the pruning method in the settings.

See Also:
Constant Field Values

KEY_SPLIT_QUALITY_MEASURE

public static final String KEY_SPLIT_QUALITY_MEASURE
Key to store the split quality measure in the settings.

See Also:
Constant Field Values

KEY_MEMORY_OPTION

public static final String KEY_MEMORY_OPTION
Key to store the memory option (memory build or on disk).

See Also:
Constant Field Values

KEY_SPLIT_AVERAGE

public static final String KEY_SPLIT_AVERAGE
Key to store the split average in the settings.

See Also:
Constant Field Values

KEY_NUMBER_VIEW_RECORDS

public static final String KEY_NUMBER_VIEW_RECORDS
Key to store the number of records stored for the view.

See Also:
Constant Field Values

KEY_MIN_NUMBER_RECORDS_PER_NODE

public static final String KEY_MIN_NUMBER_RECORDS_PER_NODE
Key to store the minimum number of records per node.

See Also:
Constant Field Values

KEY_BINARY_NOMINAL_SPLIT_MODE

public static final String KEY_BINARY_NOMINAL_SPLIT_MODE
Key to store whether to use the binary nominal split mode.

See Also:
Constant Field Values

KEY_NUM_PROCESSORS

public static final String KEY_NUM_PROCESSORS
Key to store the number of processors to use.

See Also:
Constant Field Values

KEY_MAX_NUM_NOMINAL_VALUES

public static final String KEY_MAX_NUM_NOMINAL_VALUES
Key to store the max number of nominal values for which to compute all subsets.

See Also:
Constant Field Values

DATA_INPORT

public static final int DATA_INPORT
Index of input data port.

See Also:
Constant Field Values

MODEL_OUTPORT

public static final int MODEL_OUTPORT
Index of model out port.

See Also:
Constant Field Values

DEFAULT_MIN_NUM_RECORDS_PER_NODE

public static final int DEFAULT_MIN_NUM_RECORDS_PER_NODE
The default minimum number of records expected per node.

See Also:
Constant Field Values

DEFAULT_SPLIT_AVERAGE

public static final boolean DEFAULT_SPLIT_AVERAGE
The default for whether to use the average as the split point (false).

See Also:
Constant Field Values
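
The page does not say how the split point is derived when averaging is on or off. The following is a minimal sketch under the assumption that, with averaging enabled, the midpoint of the two adjacent attribute values is used, and otherwise the lower of the two values is used (a C4.5-style choice). SplitPointSketch and splitPoint are hypothetical names, not part of this class.

```java
final class SplitPointSketch {

    /**
     * Returns a split point between two adjacent values of a sorted
     * numeric attribute.
     * Assumption: averaging picks the midpoint; without averaging the
     * lower value is kept as the split point.
     */
    static double splitPoint(double lower, double upper, boolean useAverage) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower must not exceed upper");
        }
        // Written as lower + half the distance to avoid overflow on large values.
        return useAverage ? lower + (upper - lower) / 2.0 : lower;
    }
}
```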

PRUNING_MDL

public static final String PRUNING_MDL
The constant for MDL pruning.

See Also:
Constant Field Values

PRUNING_ESTIMATED_ERROR

public static final String PRUNING_ESTIMATED_ERROR
The constant for estimated error pruning.

See Also:
Constant Field Values

PRUNING_NO

public static final String PRUNING_NO
The constant for no pruning.

See Also:
Constant Field Values

SPLIT_QUALITY_GINI

public static final String SPLIT_QUALITY_GINI
The constant for the Gini index split quality measure.

See Also:
Constant Field Values

SPLIT_QUALITY_GAIN_RATIO

public static final String SPLIT_QUALITY_GAIN_RATIO
The constant for the gain ratio split quality measure.

See Also:
Constant Field Values
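
For orientation, the two split quality measures named by these constants can be sketched with their textbook definitions. This is not taken from the learner's source; SplitQualitySketch and its methods are hypothetical names, and the actual implementation may differ in details such as handling of missing values.

```java
final class SplitQualitySketch {

    /** Gini impurity: 1 - sum(p_i^2) over the class proportions. */
    static double gini(int[] classCounts) {
        long total = 0;
        for (int c : classCounts) total += c;
        if (total == 0) return 0.0;
        double sumSquares = 0.0;
        for (int c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    /** Shannon entropy in bits; empty classes contribute nothing. */
    static double entropy(int[] counts) {
        long total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /**
     * Gain ratio = information gain / split information.
     * partitionCounts[i] holds the class counts of the i-th partition.
     * Returns 0 if the split information is 0 (all rows in one partition).
     */
    static double gainRatio(int[] parentCounts, int[][] partitionCounts) {
        long total = 0;
        for (int c : parentCounts) total += c;
        if (total == 0) return 0.0;
        int[] sizes = new int[partitionCounts.length];
        double weighted = 0.0;
        for (int i = 0; i < partitionCounts.length; i++) {
            for (int c : partitionCounts[i]) sizes[i] += c;
            weighted += ((double) sizes[i] / total) * entropy(partitionCounts[i]);
        }
        double gain = entropy(parentCounts) - weighted;
        double splitInfo = entropy(sizes);
        return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
    }
}
```

A perfectly separating two-way split of a balanced two-class node has a gain ratio of 1, while the node itself has a Gini impurity of 0.5.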

DEFAULT_PRUNING_METHOD

public static final String DEFAULT_PRUNING_METHOD
The default pruning method.

See Also:
Constant Field Values

DEFAULT_SPLIT_QUALITY_MEASURE

public static final String DEFAULT_SPLIT_QUALITY_MEASURE
The default split quality measure.

See Also:
Constant Field Values

DEFAULT_PRUNING_CONFIDENCE_THRESHOLD

public static final double DEFAULT_PRUNING_CONFIDENCE_THRESHOLD
The default confidence threshold for pruning.

See Also:
Constant Field Values

DEFAULT_MEMORY_OPTION

public static final boolean DEFAULT_MEMORY_OPTION
The default build option (memory or on disk).

See Also:
Constant Field Values

DEFAULT_NUMBER_RECORDS_FOR_VIEW

public static final int DEFAULT_NUMBER_RECORDS_FOR_VIEW
The default number of records stored for the view.

See Also:
Constant Field Values

DEFAULT_BINARY_NOMINAL_SPLIT_MODE

public static final boolean DEFAULT_BINARY_NOMINAL_SPLIT_MODE
The default binary split mode (off).

See Also:
Constant Field Values

DEFAULT_MAX_BIN_NOMINAL_SPLIT_COMPUTATION

public static final int DEFAULT_MAX_BIN_NOMINAL_SPLIT_COMPUTATION
The default for the maximum number of nominal values for which all subsets are calculated (results in the optimal binary split); this parameter is only used if binaryNominalSplits is true; if the number of nominal values is higher, a heuristic is applied.

See Also:
Constant Field Values
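
To illustrate why such a limit is useful: n nominal values admit 2^(n-1) - 1 distinct binary partitions, so exhaustive evaluation grows exponentially. The sketch below enumerates the candidate subsets with a bitmask and shows a threshold check. NominalSubsetSketch and its methods are hypothetical names, and the heuristic applied above the limit is not shown because this page does not describe it.

```java
import java.util.ArrayList;
import java.util.List;

final class NominalSubsetSketch {

    /** Number of ways to split n values into two non-empty groups: 2^(n-1) - 1. */
    static long countBinaryPartitions(int n) {
        if (n < 2) return 0;
        if (n > 63) throw new IllegalArgumentException("n too large for a long");
        return (1L << (n - 1)) - 1;
    }

    /**
     * Enumerates the left-hand side of every binary partition. The last value
     * always stays on the right side, which avoids mirrored duplicates.
     */
    static List<List<String>> leftSubsets(List<String> values) {
        int n = values.size();
        List<List<String>> result = new ArrayList<>();
        if (n < 2) return result;
        if (n > 20) throw new IllegalArgumentException("too many values to enumerate");
        int limit = 1 << (n - 1);
        for (int mask = 1; mask < limit; mask++) {
            List<String> left = new ArrayList<>();
            for (int i = 0; i < n - 1; i++) {
                if ((mask & (1 << i)) != 0) left.add(values.get(i));
            }
            result.add(left);
        }
        return result;
    }

    /** Assumed decision rule: search exhaustively only up to the configured maximum. */
    static boolean useExhaustiveSearch(int numValues, int maxValues) {
        return numValues <= maxValues;
    }
}
```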

MAX_NUM_PROCESSORS

public static final int MAX_NUM_PROCESSORS
The maximum number of processors to use.


DEFAULT_NUM_PROCESSORS

public static final int DEFAULT_NUM_PROCESSORS
The default number of processors to use.
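
This page does not state how MAX_NUM_PROCESSORS and DEFAULT_NUM_PROCESSORS are initialized. A minimal sketch, assuming the limit comes from the JVM's reported processor count and that a configured value is clamped into the range 1 to that limit, could look like this. ProcessorCountSketch and clampProcessors are hypothetical names.

```java
final class ProcessorCountSketch {

    /** Number of processors the JVM reports as available (standard Java API). */
    static int availableProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Clamps a requested processor count into the range [1, available]. */
    static int clampProcessors(int requested, int available) {
        if (available < 1) {
            throw new IllegalArgumentException("available must be at least 1");
        }
        return Math.max(1, Math.min(requested, available));
    }
}
```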

Constructor Detail

DecisionTreeLearnerNodeModel

public DecisionTreeLearnerNodeModel()
Initializes a new decision tree learner model with one data input port and one model output port.

Method Detail

execute

protected PortObject[] execute(PortObject[] data,
                               ExecutionContext exec)
                        throws Exception
Start of decision tree induction.

Overrides:
execute in class NodeModel
Parameters:
exec - the execution context for this run
data - the input data to build the decision tree from
Returns:
an empty data table array, as just a model is provided
Throws:
Exception - any type of exception, e.g. on cancellation or invalid input
See Also:
NodeModel.execute(BufferedDataTable[],ExecutionContext)

reset

protected void reset()
Resets all internal data.

Specified by:
reset in class NodeModel

configure

protected PortObjectSpec[] configure(PortObjectSpec[] inSpecs)
                              throws InvalidSettingsException
The number of the class column must be > 0 and < number of input columns.

Overrides:
configure in class NodeModel
Parameters:
inSpecs - the table specs on the input port to use for configuration
Returns:
the table specs for the output ports
Throws:
InvalidSettingsException - thrown if the configuration is not correct
See Also:
NodeModel.configure(DataTableSpec[])

loadValidatedSettingsFrom

protected void loadValidatedSettingsFrom(NodeSettingsRO settings)
                                  throws InvalidSettingsException
Loads the class column and the classification value into the model.

Specified by:
loadValidatedSettingsFrom in class NodeModel
Parameters:
settings - the settings object from which the settings are loaded
Throws:
InvalidSettingsException - if errors occur while loading the settings
See Also:
NodeModel.loadValidatedSettingsFrom(NodeSettingsRO)

saveSettingsTo

protected void saveSettingsTo(NodeSettingsWO settings)
Saves the class column and the classification value in the settings. Adds to the given NodeSettings the model specific settings. The settings don't need to be complete or consistent. If, right after startup, no valid settings are available this method can write either nothing or invalid settings.

This method is called by the Node if the current settings need to be saved or transferred to the node's dialog.

Specified by:
saveSettingsTo in class NodeModel
Parameters:
settings - The object to write settings into.
See Also:
NodeModel.loadValidatedSettingsFrom(NodeSettingsRO), NodeModel.validateSettings(NodeSettingsRO)

validateSettings

protected void validateSettings(NodeSettingsRO settings)
                         throws InvalidSettingsException
Validates the settings in the passed NodeSettingsRO object. The settings should be checked for completeness and consistency. It must be possible to load a settings object validated here without any exception in the loadValidatedSettingsFrom(NodeSettingsRO) method. The method must not change the current settings in the model; it is only supposed to check them. If some settings are missing, invalid, or inconsistent, throw an exception with a message useful to the user.

Specified by:
validateSettings in class NodeModel
Parameters:
settings - The settings to validate.
Throws:
InvalidSettingsException - If the validation of the settings failed.
See Also:
NodeModel.validateSettings(NodeSettingsRO)
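
The contract between validateSettings, loadValidatedSettingsFrom and saveSettingsTo can be sketched without the KNIME classes. In the example below a plain Map stands in for NodeSettingsRO/NodeSettingsWO and IllegalArgumentException stands in for InvalidSettingsException. The class name, the key string and the default value are hypothetical and chosen only for illustration.

```java
import java.util.Map;

final class SettingsContractSketch {

    static final String KEY_MIN_RECORDS = "minRecords"; // hypothetical key
    private int minRecords = 2;                         // hypothetical default

    /** Checks the settings only; never modifies the state of this object. */
    void validate(Map<String, Object> settings) {
        Object value = settings.get(KEY_MIN_RECORDS);
        if (!(value instanceof Integer)) {
            throw new IllegalArgumentException(
                "Missing or non-integer setting: " + KEY_MIN_RECORDS);
        }
        if ((Integer) value < 1) {
            throw new IllegalArgumentException(
                "Minimum number of records per node must be at least 1");
        }
    }

    /** Loads settings that have already passed validate(). */
    void load(Map<String, Object> settings) {
        minRecords = (Integer) settings.get(KEY_MIN_RECORDS);
    }

    /** Writes the current settings into the given container. */
    void save(Map<String, Object> settings) {
        settings.put(KEY_MIN_RECORDS, minRecords);
    }

    int getMinRecords() {
        return minRecords;
    }
}
```

Keeping validation free of side effects means a rejected dialog configuration leaves the previously loaded settings untouched.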

loadInternals

protected void loadInternals(File nodeInternDir,
                             ExecutionMonitor exec)
                      throws IOException,
                             CanceledExecutionException
Load internals into the derived NodeModel. This method is only called if the Node was executed. Read all your internal structures from the given file directory to create your internal data structure which is necessary to provide all node functionalities after the workflow is loaded, e.g. view content and/or hilite mapping.

Specified by:
loadInternals in class NodeModel
Parameters:
nodeInternDir - The directory to read from.
exec - Used to report progress and to cancel the load process.
Throws:
IOException - If an error occurs during reading from this dir.
CanceledExecutionException - If the loading has been canceled.
See Also:
NodeModel.saveInternals(File,ExecutionMonitor)

saveInternals

protected void saveInternals(File nodeInternDir,
                             ExecutionMonitor exec)
                      throws IOException,
                             CanceledExecutionException
Save internals of the derived NodeModel. This method is only called if the Node is executed. Write all your internal structures into the given file directory which are necessary to recreate this model when the workflow is loaded, e.g. view content and/or hilite mapping.

Specified by:
saveInternals in class NodeModel
Parameters:
nodeInternDir - The directory to write into.
exec - Used to report progress and to cancel the save process.
Throws:
IOException - If an error occurs during writing to this dir.
CanceledExecutionException - If the saving has been canceled.
See Also:
NodeModel.loadInternals(File,ExecutionMonitor)
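
The saveInternals/loadInternals pair amounts to a round trip through the node's internals directory. The following sketch shows that pattern with standard Java I/O only. The file name and the string payload are hypothetical, progress reporting and cancellation via ExecutionMonitor are omitted, and the format this class actually writes is not described on this page.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class InternalsRoundTripSketch {

    private static final String FILE_NAME = "internals.txt"; // hypothetical file name

    /** Writes the internal state into the given directory. */
    static void save(File nodeInternDir, String content) throws IOException {
        File target = new File(nodeInternDir, FILE_NAME);
        Files.write(target.toPath(), content.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads back what save() wrote into the same directory. */
    static String load(File nodeInternDir) throws IOException {
        File source = new File(nodeInternDir, FILE_NAME);
        return new String(Files.readAllBytes(source.toPath()), StandardCharsets.UTF_8);
    }
}
```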

getDecisionTree

public DecisionTree getDecisionTree()
Returns the decision tree model.

Returns:
the decision tree model

checkMemory

static void checkMemory()
Checks the memory footprint. If too little memory is available, an exception with a descriptive message is thrown.


criticalMemoryFootprint

public static boolean criticalMemoryFootprint()
Returns whether the memory footprint is critical.

Returns:
whether the memory footprint is critical
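
How the footprint is judged to be critical is not documented here. One common approach, shown below purely as an assumption, compares the heap currently in use against the JVM's maximum heap and flags the footprint once a chosen fraction is exceeded. MemoryFootprintSketch, its methods and the threshold parameter are hypothetical.

```java
final class MemoryFootprintSketch {

    /** Returns true if the used share of the maximum memory exceeds the threshold. */
    static boolean isCritical(long usedBytes, long maxBytes, double threshold) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return (double) usedBytes / maxBytes > threshold;
    }

    /** Applies the check to the running JVM using the standard Runtime API. */
    static boolean isCriticalNow(double threshold) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return isCritical(used, rt.maxMemory(), threshold);
    }
}
```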


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.