org.knime.base.node.mine.pca
Class PCANodeModel

java.lang.Object
  extended by org.knime.core.node.NodeModel
      extended by org.knime.base.node.mine.pca.PCANodeModel

public class PCANodeModel
extends NodeModel

The model class that implements the PCA on the input table.

Author:
Uwe Nagel, University of Konstanz

Field Summary
static int DATA_INPORT
          Index of input data port.
static int DATA_OUTPORT
          Index of output data port.
static String DIMENSIONS_SELECTION
          config String for selecting whether the number of dimensions or the minimum quality is configured.
(package private) static String FAIL_MISSING
          String used for fail on missing config.
static int INFO_OUTPORT
          Index of decomposition output port.
(package private) static String INPUT_COLUMNS
           
static int MATRIX_OUTPORT
          Index of covariance matrix output port.
(package private) static String PCA_COL_PREFIX
          description String for dimension.
(package private) static String REMOVE_COLUMNS
          config String for remove columns.
 
Constructor Summary
PCANodeModel()
          One input, one output table.
 
Method Summary
protected  DataTableSpec[] configure(DataTableSpec[] inSpecs)
          All IntCell columns are converted to DoubleCell columns.
protected static DataCell[] convertInputRow(Jama.Matrix eigenvectors, DataRow row, double[] means, int[] inputColumnIndices, int resultDimensions, boolean failOnMissing)
          reduce a single input row to the principal components.
static DataColumnSpec[] createAddTableSpec(DataTableSpec inSpecs, int resultDimensions)
          create part of table spec to be added to the input table.
static DataTableSpec createCovarianceMatrixSpec(String[] inputColumnNames)
           
static BufferedDataTable createCovarianceTable(ExecutionContext exec, double[][] m, String[] inputColumnNames)
          create data table from covariance matrix.
static BufferedDataTable createDecompositionOutputTable(ExecutionContext exec, double[] evs, Jama.Matrix eigenvectors)
          create a table containing the given spectral decomposition.
static DataTableSpec createDecompositionTableSpec(int dimensions)
          create table spec for output of spectral decomposition.
protected  PortObject[] execute(PortObject[] inData, ExecutionContext exec)
          Performs the PCA.
(package private) static int getCovarianceMatrix(ExecutionContext exec, BufferedDataTable dataTable, int[] numericIndices, double[] means, double[][] dataMatrix)
          Converts a DataTable to the 2D-double array representing its covariance matrix.
(package private) static int[] getDefaultColumns(DataTableSpec dataTableSpec)
          get column indices for all double compatible columns.
(package private) static double[] getMeanVector(DataTable dataTable, int[] numericIndices, boolean failOnMissingValues, ExecutionContext exec)
          calculate means of all columns.
protected  void loadInternals(File nodeInternDir, ExecutionMonitor exec)
          Load internals into the derived NodeModel.
protected  void loadValidatedSettingsFrom(NodeSettingsRO settings)
          Sets new settings from the passed object in the model.
protected  void reset()
          Override this function in the derived model and reset your NodeModel.
protected  void saveInternals(File nodeInternDir, ExecutionMonitor exec)
          Save internals of the derived NodeModel.
protected  void saveSettingsTo(NodeSettingsWO settings)
          Adds to the given NodeSettings the model specific settings.
protected  void validateSettings(NodeSettingsRO settings)
          Validates the settings in the passed NodeSettings object.
 
Methods inherited from class org.knime.core.node.NodeModel
addWarningListener, configure, continueLoop, execute, executeModel, getInHiLiteHandler, getLoopEndNode, getLoopStartNode, getNrInPorts, getNrOutPorts, getOutHiLiteHandler, getWarningMessage, notifyViews, notifyWarningListeners, peekFlowVariableDouble, peekFlowVariableInt, peekFlowVariableString, peekScopeVariableDouble, peekScopeVariableInt, peekScopeVariableString, pushFlowVariableDouble, pushFlowVariableInt, pushFlowVariableString, pushScopeVariableDouble, pushScopeVariableInt, pushScopeVariableString, removeWarningListener, setInHiLiteHandler, setWarningMessage, stateChanged
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

FAIL_MISSING

static final String FAIL_MISSING
String used for fail on missing config.

See Also:
Constant Field Values

INPUT_COLUMNS

static final String INPUT_COLUMNS
See Also:
Constant Field Values

DATA_INPORT

public static final int DATA_INPORT
Index of input data port.

See Also:
Constant Field Values

DATA_OUTPORT

public static final int DATA_OUTPORT
Index of output data port.

See Also:
Constant Field Values

INFO_OUTPORT

public static final int INFO_OUTPORT
Index of decomposition output port.

See Also:
Constant Field Values

MATRIX_OUTPORT

public static final int MATRIX_OUTPORT
Index of covariance matrix output port.

See Also:
Constant Field Values

PCA_COL_PREFIX

static final String PCA_COL_PREFIX
description String for dimension.

See Also:
Constant Field Values

REMOVE_COLUMNS

static final String REMOVE_COLUMNS
config String for remove columns.

See Also:
Constant Field Values

DIMENSIONS_SELECTION

public static final String DIMENSIONS_SELECTION
config String for selecting whether the number of dimensions or the minimum quality is configured.

See Also:
Constant Field Values
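
The choice between a fixed number of dimensions and a minimum quality can be illustrated with a small sketch. It assumes that "quality" means the fraction of total variance (the sum of all eigenvalues) preserved by the largest eigenvalues; that interpretation, the class name and the method name are assumptions for illustration and are not taken from PCANodeModel.

```java
import java.util.Arrays;

class QualityDimensionsSketch {

    /**
     * Smallest number of dimensions whose eigenvalues cover at least
     * minQuality of the eigenvalue sum (assumed meaning of "quality").
     */
    static int dimensionsForQuality(double[] eigenvalues, double minQuality) {
        double[] sorted = eigenvalues.clone();
        Arrays.sort(sorted); // ascending, so walk from the end
        double total = 0;
        for (double ev : sorted) {
            total += ev;
        }
        if (total <= 0) {
            return sorted.length; // degenerate input: keep everything
        }
        double accumulated = 0;
        for (int k = 1; k <= sorted.length; k++) {
            accumulated += sorted[sorted.length - k];
            if (accumulated / total >= minQuality) {
                return k;
            }
        }
        return sorted.length;
    }
}
```

With eigenvalues 1, 6 and 3, a minimum quality of 0.5 is already met by the largest eigenvalue alone, while 0.95 requires all three dimensions.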
Constructor Detail

PCANodeModel

PCANodeModel()
One input, one output table.

Method Detail

configure

protected DataTableSpec[] configure(DataTableSpec[] inSpecs)
                             throws InvalidSettingsException
All IntCell columns are converted to DoubleCell columns. This function is called whenever the derived model should re-configure its output DataTableSpecs. Based on the given input data table spec(s) and the current model's settings, the derived model has to calculate the output data table specs and return them.

The passed DataTableSpec elements are never null but can be empty. The model may return null data table spec(s) for the outputs and still be in an executable state. Note that after the model has been executed this function is no longer called, because the output DataTableSpecs are then pulled from the output DataTables. A derived NodeModel that cannot provide any DataTableSpecs at its outputs before execution (because the table structure is unknown at this point) can return an array containing just null elements.

Implementation note: This method is called from the NodeModel.configure(PortObjectSpec[]) method unless that method is overridden.

Overrides:
configure in class NodeModel
Parameters:
inSpecs - An array of DataTableSpecs (as many as this model has inputs). Do NOT modify the contents of this array. None of the DataTableSpecs in the array can be null, but they can be empty, for example if the predecessor node is not yet connected or doesn't provide a DataTableSpec at its output port.
Returns:
An array of DataTableSpecs (as many as this model has outputs). They will be propagated to connected successor nodes. null DataTableSpec elements are changed to empty ones.
Throws:
InvalidSettingsException - if the #configure() failed, that is, the settings are inconsistent with given DataTableSpec elements.

createDecompositionTableSpec

public static DataTableSpec createDecompositionTableSpec(int dimensions)
create table spec for output of spectral decomposition.

Parameters:
dimensions - number of dimensions of the input
Returns:
table spec (first col for eigenvalues, others for components of eigenvectors)

getDefaultColumns

static int[] getDefaultColumns(DataTableSpec dataTableSpec)
get column indices for all double compatible columns.

Parameters:
dataTableSpec - table spec
Returns:
array of indices

createAddTableSpec

public static DataColumnSpec[] createAddTableSpec(DataTableSpec inSpecs,
                                                  int resultDimensions)
create part of table spec to be added to the input table.

Parameters:
inSpecs - input specs (for unique column names)
resultDimensions - number of dimensions in output
Returns:
part of table spec to be added to input table
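
The note "for unique column names" means the added columns must not collide with names already present in the input table. A minimal sketch of one way to do this follows, using plain strings instead of DataColumnSpec. The prefix value, the suffix format and all names in the code are hypothetical; the actual value of PCA_COL_PREFIX and the actual uniquifying strategy are not given on this page.

```java
import java.util.HashSet;
import java.util.Set;

class UniqueColumnNamesSketch {

    /** Builds resultDimensions names that do not clash with existing ones. */
    static String[] uniqueNames(Set<String> existing, String prefix,
            int resultDimensions) {
        Set<String> taken = new HashSet<>(existing); // do not modify the input
        String[] names = new String[resultDimensions];
        for (int i = 0; i < resultDimensions; i++) {
            String base = prefix + i;
            String candidate = base;
            int suffix = 1;
            // Set.add returns false while the candidate is already taken
            while (!taken.add(candidate)) {
                candidate = base + " (#" + suffix++ + ")";
            }
            names[i] = candidate;
        }
        return names;
    }
}
```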

execute

protected PortObject[] execute(PortObject[] inData,
                               ExecutionContext exec)
                        throws Exception
Performs the PCA. Execute method for general port types. The argument objects represent the input objects and are guaranteed to be subclasses of the PortObject classes that are defined through the PortTypes given in the constructor. Similarly, the returned output objects need to comply with their port types object class (otherwise an error is reported by the framework).

For a general description of the execute method refer to the description of the specialized NodeModel.execute(BufferedDataTable[], ExecutionContext) methods as it addresses more use cases.

Overrides:
execute in class NodeModel
Parameters:
inData - The input objects.
exec - For BufferedDataTable creation and progress.
Returns:
The output objects.
Throws:
Exception - If the node execution fails for any reason.

createDecompositionOutputTable

public static BufferedDataTable createDecompositionOutputTable(ExecutionContext exec,
                                                               double[] evs,
                                                               Jama.Matrix eigenvectors)
                                                        throws CanceledExecutionException
create a table containing the given spectral decomposition.

Parameters:
exec - execution context for table creation
evs - eigenvalues
eigenvectors - matrix of eigenvectors (each column contains one eigenvector)
Returns:
the created table
Throws:
CanceledExecutionException
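
The layout described for the decomposition table (first column for the eigenvalue, remaining columns for the eigenvector components) can be sketched with plain arrays. The sketch assumes one table row per eigenvector and uses a double[][] in place of Jama.Matrix and BufferedDataTable; the class and method names are illustrative only.

```java
class DecompositionRowsSketch {

    /**
     * eigenvectors[j][i] is component j of eigenvector i, i.e. each column
     * holds one eigenvector. Row i of the result is
     * [evs[i], component 0, ..., component d-1] (assumed row layout).
     */
    static double[][] decompositionRows(double[] evs, double[][] eigenvectors) {
        int dims = eigenvectors.length;
        double[][] rows = new double[evs.length][dims + 1];
        for (int i = 0; i < evs.length; i++) {
            rows[i][0] = evs[i];
            for (int j = 0; j < dims; j++) {
                rows[i][j + 1] = eigenvectors[j][i]; // read down column i
            }
        }
        return rows;
    }
}
```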

convertInputRow

protected static DataCell[] convertInputRow(Jama.Matrix eigenvectors,
                                            DataRow row,
                                            double[] means,
                                            int[] inputColumnIndices,
                                            int resultDimensions,
                                            boolean failOnMissing)
reduce a single input row to the principal components.

Parameters:
eigenvectors - transposed matrix of eigenvectors (eigenvectors in rows, number of eigenvectors corresponds to dimensions to be projected to)
row - the row to convert
means - mean values of the columns
inputColumnIndices - indices of the input columns
resultDimensions - number of dimensions to project to
failOnMissing - throw exception if missing values are encountered
Returns:
array of data cells to be added to the row
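
Reducing a row to its principal components amounts to centering the selected values with the column means and multiplying by the transposed eigenvector matrix. The following is a minimal sketch of that arithmetic with plain arrays standing in for Jama.Matrix, DataRow and DataCell. Double.NaN represents a missing value, and returning NaN for every output when failOnMissing is false is an assumption, not documented behavior.

```java
import java.util.Arrays;

class ProjectRowSketch {

    /** eigenvectorsT[k] is the k-th eigenvector (eigenvectors in rows). */
    static double[] projectRow(double[][] eigenvectorsT, double[] row,
            double[] means, int[] inputColumnIndices, int resultDimensions,
            boolean failOnMissing) {
        double[] result = new double[resultDimensions];
        for (int idx : inputColumnIndices) {
            if (Double.isNaN(row[idx])) {
                if (failOnMissing) {
                    throw new IllegalArgumentException("missing value in row");
                }
                Arrays.fill(result, Double.NaN); // assumption: all outputs missing
                return result;
            }
        }
        for (int k = 0; k < resultDimensions; k++) {
            double sum = 0;
            for (int j = 0; j < inputColumnIndices.length; j++) {
                // center the value, then weight it by the eigenvector component
                sum += eigenvectorsT[k][j] * (row[inputColumnIndices[j]] - means[j]);
            }
            result[k] = sum;
        }
        return result;
    }
}
```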

getCovarianceMatrix

static int getCovarianceMatrix(ExecutionContext exec,
                               BufferedDataTable dataTable,
                               int[] numericIndices,
                               double[] means,
                               double[][] dataMatrix)
                        throws CanceledExecutionException
Converts a DataTable to the 2D-double array representing its covariance matrix. Only numeric attributes are included.

Parameters:
exec - the execution context for progress report (a subcontext)
dataTable - the DataTable to convert
numericIndices - indices of input columns
means - mean values of columns
dataMatrix - matrix to write covariances to
Returns:
number of ignored rows (containing missing values)
Throws:
CanceledExecutionException - if execution is canceled
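
The contract above (fill a caller-supplied matrix, return the number of ignored rows) can be sketched as follows. Plain arrays replace BufferedDataTable, Double.NaN marks a missing value, and progress reporting through the execution context is omitted. Dividing by n - 1 (sample covariance) is an assumption; this page does not state which normalization the node uses.

```java
import java.util.Arrays;

class CovarianceSketch {

    /** Writes the covariance matrix into dataMatrix; returns skipped rows. */
    static int covarianceMatrix(double[][] rows, int[] numericIndices,
            double[] means, double[][] dataMatrix) {
        int d = numericIndices.length;
        for (double[] matrixRow : dataMatrix) {
            Arrays.fill(matrixRow, 0.0);
        }
        int usedRows = 0;
        int ignoredRows = 0;
        for (double[] row : rows) {
            if (hasMissing(row, numericIndices)) {
                ignoredRows++;
                continue;
            }
            for (int i = 0; i < d; i++) {
                double di = row[numericIndices[i]] - means[i];
                for (int j = 0; j < d; j++) {
                    dataMatrix[i][j] += di * (row[numericIndices[j]] - means[j]);
                }
            }
            usedRows++;
        }
        int divisor = Math.max(usedRows - 1, 1); // assumption: n - 1 normalization
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                dataMatrix[i][j] /= divisor;
            }
        }
        return ignoredRows;
    }

    static boolean hasMissing(double[] row, int[] indices) {
        for (int idx : indices) {
            if (Double.isNaN(row[idx])) {
                return true;
            }
        }
        return false;
    }
}
```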

getMeanVector

static double[] getMeanVector(DataTable dataTable,
                              int[] numericIndices,
                              boolean failOnMissingValues,
                              ExecutionContext exec)
                       throws CanceledExecutionException
calculate means of all columns.

Parameters:
dataTable - input table
numericIndices - indices of columns to use
failOnMissingValues - if true, throw exception if missing values are encountered
exec - execution context
Returns:
vector of column mean values
Throws:
CanceledExecutionException
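
A minimal sketch of the mean computation with the failOnMissingValues switch follows. It uses a double[][] instead of DataTable and Double.NaN for missing values. Skipping an incomplete row as a whole when failOnMissingValues is false is an assumption, chosen to match the "ignored rows" count reported by getCovarianceMatrix.

```java
class MeanVectorSketch {

    /** Column means over the selected columns of all complete rows. */
    static double[] meanVector(double[][] rows, int[] numericIndices,
            boolean failOnMissingValues) {
        double[] means = new double[numericIndices.length];
        int usedRows = 0;
        for (double[] row : rows) {
            if (hasMissing(row, numericIndices)) {
                if (failOnMissingValues) {
                    throw new IllegalArgumentException("missing value in input row");
                }
                continue; // assumption: incomplete rows are skipped entirely
            }
            for (int j = 0; j < numericIndices.length; j++) {
                means[j] += row[numericIndices[j]];
            }
            usedRows++;
        }
        if (usedRows == 0) {
            throw new IllegalArgumentException("no complete rows to average");
        }
        for (int j = 0; j < means.length; j++) {
            means[j] /= usedRows;
        }
        return means;
    }

    static boolean hasMissing(double[] row, int[] indices) {
        for (int idx : indices) {
            if (Double.isNaN(row[idx])) {
                return true;
            }
        }
        return false;
    }
}
```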

loadInternals

protected void loadInternals(File nodeInternDir,
                             ExecutionMonitor exec)
                      throws IOException,
                             CanceledExecutionException
Load internals into the derived NodeModel. This method is only called if the Node was executed. Read all your internal structures from the given file directory to create your internal data structure which is necessary to provide all node functionalities after the workflow is loaded, e.g. view content and/or hilite mapping.

Specified by:
loadInternals in class NodeModel
Parameters:
nodeInternDir - The directory to read from.
exec - Used to report progress and to cancel the load process.
Throws:
IOException - If an error occurs during reading from this dir.
CanceledExecutionException - If the loading has been canceled.
See Also:
NodeModel.saveInternals(File,ExecutionMonitor)

saveInternals

protected void saveInternals(File nodeInternDir,
                             ExecutionMonitor exec)
                      throws IOException,
                             CanceledExecutionException
Save internals of the derived NodeModel. This method is only called if the Node is executed. Write all your internal structures into the given file directory which are necessary to recreate this model when the workflow is loaded, e.g. view content and/or hilite mapping.

Specified by:
saveInternals in class NodeModel
Parameters:
nodeInternDir - The directory to write into.
exec - Used to report progress and to cancel the save process.
Throws:
IOException - If an error occurs during writing to this dir.
CanceledExecutionException - If the saving has been canceled.
See Also:
NodeModel.loadInternals(File,ExecutionMonitor)

loadValidatedSettingsFrom

protected void loadValidatedSettingsFrom(NodeSettingsRO settings)
                                  throws InvalidSettingsException
Sets new settings from the passed object in the model. You can safely assume that the object passed has been successfully validated by the validateSettings(NodeSettingsRO) method. The model must set its internal configuration according to the settings object passed.

Specified by:
loadValidatedSettingsFrom in class NodeModel
Parameters:
settings - The settings to read.
Throws:
InvalidSettingsException - If a property is not available.
See Also:
NodeModel.saveSettingsTo(NodeSettingsWO), NodeModel.validateSettings(NodeSettingsRO)

reset

protected void reset()
Override this function in the derived model and reset your NodeModel. All components should unregister themselves from any observables (at least from the hilite handler right now). All internally stored data structures should be released. User settings should not be deleted/reset though.

Specified by:
reset in class NodeModel

saveSettingsTo

protected void saveSettingsTo(NodeSettingsWO settings)
Adds to the given NodeSettings the model specific settings. The settings don't need to be complete or consistent. If, right after startup, no valid settings are available this method can write either nothing or invalid settings.

This method is called by the Node if the current settings need to be saved or transferred to the node's dialog.

Specified by:
saveSettingsTo in class NodeModel
Parameters:
settings - The object to write settings into.
See Also:
NodeModel.loadValidatedSettingsFrom(NodeSettingsRO), NodeModel.validateSettings(NodeSettingsRO)

validateSettings

protected void validateSettings(NodeSettingsRO settings)
                         throws InvalidSettingsException
Validates the settings in the passed NodeSettings object. The specified settings should be checked for completeness and consistency. It must be possible to load a settings object validated here without any exception in the loadValidatedSettingsFrom(NodeSettingsRO) method. The method must not change the current settings in the model; it is only supposed to check them. If some settings are missing, invalid, inconsistent, or just not right, throw an exception with a message useful to the user.

Specified by:
validateSettings in class NodeModel
Parameters:
settings - The settings to validate.
Throws:
InvalidSettingsException - If the validation of the settings failed.
See Also:
NodeModel.saveSettingsTo(NodeSettingsWO), NodeModel.loadValidatedSettingsFrom(NodeSettingsRO)
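
The interplay of saveSettingsTo, validateSettings and loadValidatedSettingsFrom can be sketched without the KNIME classes. In the sketch below a Map stands in for NodeSettingsRO/NodeSettingsWO and IllegalArgumentException stands in for InvalidSettingsException; the keys, fields and default values are hypothetical and do not reflect the actual PCANodeModel settings.

```java
import java.util.Map;

class SettingsLifecycleSketch {

    static final String KEY_DIMENSIONS = "dimensions";      // hypothetical key
    static final String KEY_FAIL_MISSING = "failOnMissing"; // hypothetical key

    private int dimensions = 2;
    private boolean failOnMissing = false;

    /** Checks only; must not touch the model's current settings. */
    void validateSettings(Map<String, Object> settings) {
        Object dims = settings.get(KEY_DIMENSIONS);
        if (!(dims instanceof Integer) || ((Integer) dims) < 1) {
            throw new IllegalArgumentException("dimensions must be an integer >= 1");
        }
        if (!(settings.get(KEY_FAIL_MISSING) instanceof Boolean)) {
            throw new IllegalArgumentException("failOnMissing must be a boolean");
        }
    }

    /** May assume validateSettings succeeded on the same object. */
    void loadValidatedSettingsFrom(Map<String, Object> settings) {
        dimensions = (Integer) settings.get(KEY_DIMENSIONS);
        failOnMissing = (Boolean) settings.get(KEY_FAIL_MISSING);
    }

    /** Writes the current settings, e.g. for the dialog or for saving. */
    void saveSettingsTo(Map<String, Object> settings) {
        settings.put(KEY_DIMENSIONS, dimensions);
        settings.put(KEY_FAIL_MISSING, failOnMissing);
    }

    int getDimensions() {
        return dimensions;
    }

    boolean isFailOnMissing() {
        return failOnMissing;
    }
}
```

Keeping validation free of side effects means a rejected settings object leaves the model in its previous, consistent state.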

createCovarianceTable

public static BufferedDataTable createCovarianceTable(ExecutionContext exec,
                                                      double[][] m,
                                                      String[] inputColumnNames)
create data table from covariance matrix.

Parameters:
exec - execution context
m - covariance matrix
inputColumnNames - names of input columns the matrix was created from
Returns:
table

createCovarianceMatrixSpec

public static DataTableSpec createCovarianceMatrixSpec(String[] inputColumnNames)


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.