org.knime.base.node.viz.plotter.box
Class BoxPlotNodeModel

java.lang.Object
  extended by org.knime.core.node.NodeModel
      extended by org.knime.base.node.viz.plotter.box.BoxPlotNodeModel
All Implemented Interfaces:
BoxPlotDataProvider, DataProvider

public class BoxPlotNodeModel
extends NodeModel
implements BoxPlotDataProvider

The input data is sorted for each numeric column and the necessary parameters are determined: minimum, lower whisker (in case of outliers it is the first non-outlier), lower quartile, median, upper quartile, upper whisker and maximum. Each column is then associated with a double array of these parameters, which are passed to the BoxPlotter. To do so, the BoxPlotNodeModel implements a new interface, the BoxPlotDataProvider, which passes the statistical parameters and the mild and extreme outliers.
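As a rough illustration of the parameters described above, the following standalone sketch computes one statistics array for a single column. It is not the KNIME implementation: the array positions are hypothetical stand-ins for the constants of this class (their real values are not stated on this page), the quartiles are assumed to be the medians of the lower and upper half of the sorted values, and the whiskers are assumed to be the outermost values within 1.5 * IQR of the quartiles.

```java
import java.util.Arrays;

public class BoxPlotStatsSketch {
    // Hypothetical array positions for this sketch only; the real values of
    // BoxPlotNodeModel.MIN, MEDIAN, etc. are not stated on this page.
    static final int MIN = 0, LOWER_WHISKER = 1, LOWER_QUARTILE = 2, MEDIAN = 3,
            UPPER_QUARTILE = 4, UPPER_WHISKER = 5, MAX = 6, SIZE = 7;

    /** Median of the sorted range sorted[from..to). */
    static double median(double[] sorted, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        return n % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Sorts a copy of the values and fills the statistics array. */
    static double[] statistics(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double[] s = values.clone();
        Arrays.sort(s);
        int n = s.length;
        double[] stats = new double[SIZE];
        stats[MIN] = s[0];
        stats[MAX] = s[n - 1];
        stats[MEDIAN] = median(s, 0, n);
        // Assumption: quartiles are the medians of the lower and upper half.
        stats[LOWER_QUARTILE] = median(s, 0, n / 2);
        stats[UPPER_QUARTILE] = median(s, (n + 1) / 2, n);
        double iqr = stats[UPPER_QUARTILE] - stats[LOWER_QUARTILE];
        double lowFence = stats[LOWER_QUARTILE] - 1.5 * iqr;
        double highFence = stats[UPPER_QUARTILE] + 1.5 * iqr;
        // Whiskers: first and last value that is not an outlier.
        for (int i = 0; i < n; i++) {
            if (s[i] >= lowFence) { stats[LOWER_WHISKER] = s[i]; break; }
        }
        for (int i = n - 1; i >= 0; i--) {
            if (s[i] <= highFence) { stats[UPPER_WHISKER] = s[i]; break; }
        }
        return stats;
    }

    public static void main(String[] args) {
        double[] stats = statistics(new double[] {7, 1, 3, 5, 9, 11, 100});
        System.out.println(Arrays.toString(stats));
    }
}
```

For the sample input, the value 100 lies above the upper fence, so under these assumptions the upper whisker is 11 while the maximum remains 100.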

Author:
Fabian Dill, University of Konstanz

Field Summary
static int LOWER_QUARTILE
          Constant for the lower quartile position in the statistics array.
static int LOWER_WHISKER
          Constant for the lower whisker position in the statistics array.
static int MAX
          Constant for the maximum position in the statistics array.
static int MEDIAN
          Constant for the median position in the statistics array.
static int MIN
          Constant for the minimum position in the statistics array.
static int SIZE
          Constant for the size of the statistics array.
static int UPPER_QUARTILE
          Constant for the upper quartile position in the statistics array.
static int UPPER_WHISKER
          Constant for the upper whisker position in the statistics array.
 
Fields inherited from interface org.knime.base.node.viz.plotter.DataProvider
END, START
 
Constructor Summary
BoxPlotNodeModel()
          One input for the data, one output for the parameters (median, quartiles and inter-quartile range (IQR)).
 
Method Summary
protected  DataTableSpec[] configure(DataTableSpec[] inSpecs)
          This function is called whenever the derived model should re-configure its output DataTableSpecs.
 void detectOutliers(DataTable table, double iqr, double[] q, Map<Double,Set<RowKey>> mild, Map<Double,Set<RowKey>> extreme, double[] whiskers, int colIdx)
          Detects mild (between 1.5 * IQR and 3 * IQR beyond a quartile) and extreme (more than 3 * IQR beyond a quartile) outliers.
protected  BufferedDataTable[] execute(BufferedDataTable[] inData, ExecutionContext exec)
          This function is invoked by the Node#executeNode() method of the node (through the #executeModel(BufferedDataTable[],ExecutionMonitor) method) only after all predecessor nodes have been successfully executed and all data is therefore available at the input ports.
 DataArray getDataArray(int index)
          Provides the data that should be visualized.
 Map<String,Map<Double,Set<RowKey>>> getExtremeOutliers()
          Extreme outliers are values < q1 - 3 * iqr or > q3 + 3 * iqr.
 Map<String,Map<Double,Set<RowKey>>> getMildOutliers()
          Mild outliers are values > q1 - 3 * iqr and < q1 - 1.5 * iqr, or values > q3 + 1.5 * iqr and < q3 + 3 * iqr.
protected  HiLiteHandler getOutHiLiteHandler(int outIndex)
          Returns the HiLiteHandler for the given output index.
 Map<DataColumnSpec,double[]> getStatistics()
          
protected  void loadInternals(File nodeInternDir, ExecutionMonitor exec)
          Load internals into the derived NodeModel.
protected  void loadValidatedSettingsFrom(NodeSettingsRO settings)
          Sets new settings from the passed object in the model.
protected  void reset()
          Override this function in the derived model and reset your NodeModel.
protected  void saveInternals(File nodeInternDir, ExecutionMonitor exec)
          Save internals of the derived NodeModel.
protected  void saveSettingsTo(NodeSettingsWO settings)
          Adds to the given NodeSettings the model specific settings.
protected  void validateSettings(NodeSettingsRO settings)
          Validates the settings in the passed NodeSettings object.
 
Methods inherited from class org.knime.core.node.NodeModel
addWarningListener, configure, continueLoop, execute, executeModel, getInHiLiteHandler, getLoopEndNode, getLoopStartNode, getNrInPorts, getNrOutPorts, getWarningMessage, notifyViews, notifyWarningListeners, peekFlowVariableDouble, peekFlowVariableInt, peekFlowVariableString, peekScopeVariableDouble, peekScopeVariableInt, peekScopeVariableString, pushFlowVariableDouble, pushFlowVariableInt, pushFlowVariableString, pushScopeVariableDouble, pushScopeVariableInt, pushScopeVariableString, removeWarningListener, setInHiLiteHandler, setWarningMessage, stateChanged
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MIN

public static final int MIN
Constant for the minimum position in the statistics array.

See Also:
Constant Field Values

LOWER_WHISKER

public static final int LOWER_WHISKER
Constant for the lower whisker position in the statistics array.

See Also:
Constant Field Values

LOWER_QUARTILE

public static final int LOWER_QUARTILE
Constant for the lower quartile position in the statistics array.

See Also:
Constant Field Values

MEDIAN

public static final int MEDIAN
Constant for the median position in the statistics array.

See Also:
Constant Field Values

UPPER_QUARTILE

public static final int UPPER_QUARTILE
Constant for the upper quartile position in the statistics array.

See Also:
Constant Field Values

UPPER_WHISKER

public static final int UPPER_WHISKER
Constant for the upper whisker position in the statistics array.

See Also:
Constant Field Values

MAX

public static final int MAX
Constant for the maximum position in the statistics array.

See Also:
Constant Field Values

SIZE

public static final int SIZE
Constant for the size of the statistics array.

See Also:
Constant Field Values

Constructor Detail

BoxPlotNodeModel

public BoxPlotNodeModel()
One input for the data, one output for the parameters (median, quartiles and inter-quartile range (IQR)).

Method Detail

configure

protected DataTableSpec[] configure(DataTableSpec[] inSpecs)
                             throws InvalidSettingsException
This function is called whenever the derived model should re-configure its output DataTableSpecs. Based on the given input data table spec(s) and the current model's settings, the derived model has to calculate the output data table spec(s) and return them.

The passed DataTableSpec elements are never null but can be empty. The model may return null data table spec(s) for the outputs. But still, the model may be in an executable state. Note, after the model has been executed this function will not be called anymore, as the output DataTableSpecs are then being pulled from the output DataTables. A derived NodeModel that cannot provide any DataTableSpecs at its outputs before execution (because the table structure is unknown at this point) can return an array containing just null elements.

Implementation note: This method is called from the NodeModel.configure(PortObjectSpec[]) method unless that method is overridden.

Overrides:
configure in class NodeModel
Parameters:
inSpecs - An array of DataTableSpecs (as many as this model has inputs). Do NOT modify the contents of this array. None of the DataTableSpecs in the array can be null, but they can be empty if the predecessor node is not yet connected or doesn't provide a DataTableSpec at its output port.
Returns:
An array of DataTableSpecs (as many as this model has outputs). They will be propagated to connected successor nodes. null DataTableSpec elements are changed to empty ones.
Throws:
InvalidSettingsException - if the #configure() failed, that is, the settings are inconsistent with given DataTableSpec elements.

execute

protected BufferedDataTable[] execute(BufferedDataTable[] inData,
                                      ExecutionContext exec)
                               throws Exception
This function is invoked by the Node#executeNode() method of the node (through the #executeModel(BufferedDataTable[],ExecutionMonitor) method) only after all predecessor nodes have been successfully executed and all data is therefore available at the input ports. Implement this function with your task in the derived model.

The input data is available in the given array argument inData and is ensured to be neither null nor contain null elements.

In order to create output data, you need to create objects of class BufferedDataTable. Use the execution context argument to create BufferedDataTable.

Overrides:
execute in class NodeModel
Parameters:
inData - An array holding DataTable elements, one for each input.
exec - The execution monitor for this execute method. It provides the means to create new BufferedDataTables. Additionally, it should be asked frequently whether the execution should be interrupted; if so, it throws an exception. This exception may be caught and, after all data streams have been closed, thrown again. Also, if you can tell the progress of your task, just set it in this monitor.
Returns:
An array of non-null DataTable elements with the size of the number of outputs. The result of this execution.
Throws:
Exception - If you must fail the execution. Try to provide a meaningful error message in the exception as it will be displayed to the user. Please check the canceled status frequently by invoking ExecutionMonitor#checkCanceled, which will throw a CanceledExecutionException and abort the execution.
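
The cancellation advice above can be sketched in isolation as follows. The Monitor interface and CanceledException class are hypothetical stand-ins for the execution monitor and CanceledExecutionException, defined here only so the example runs without KNIME on the classpath; the summing task is likewise invented for illustration.

```java
public class CancelCheckSketch {
    /** Stand-in for CanceledExecutionException (not the KNIME class). */
    static class CanceledException extends Exception {
        CanceledException(String msg) { super(msg); }
    }

    /** Stand-in for the parts of the execution monitor used in this sketch. */
    interface Monitor {
        void checkCanceled() throws CanceledException;
        void setProgress(double progress);
    }

    /** Hypothetical long-running task: sums values, checking for cancel per row. */
    static double sum(double[] values, Monitor exec) throws CanceledException {
        double total = 0;
        try {
            for (int i = 0; i < values.length; i++) {
                exec.checkCanceled();                 // ask frequently
                total += values[i];
                exec.setProgress((i + 1) / (double) values.length);
            }
        } catch (CanceledException e) {
            // Release resources (for example, close open streams) here,
            // then pass the cancellation on to the caller.
            throw e;
        }
        return total;
    }

    /** Test helper: a monitor that cancels once more than 'allowed' checks were made. */
    static Monitor cancelAfter(final int allowed) {
        return new Monitor() {
            private int calls = 0;
            public void checkCanceled() throws CanceledException {
                if (++calls > allowed) {
                    throw new CanceledException("canceled by user");
                }
            }
            public void setProgress(double progress) { /* ignored in this sketch */ }
        };
    }

    public static void main(String[] args) {
        try {
            sum(new double[] {1, 2, 3}, cancelAfter(2));
        } catch (CanceledException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```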

detectOutliers

public void detectOutliers(DataTable table,
                           double iqr,
                           double[] q,
                           Map<Double,Set<RowKey>> mild,
                           Map<Double,Set<RowKey>> extreme,
                           double[] whiskers,
                           int colIdx)
Detects mild (between 1.5 * IQR and 3 * IQR beyond a quartile) and extreme (more than 3 * IQR beyond a quartile) outliers.

Parameters:
table - the table containing the values; it must already be sorted.
iqr - the interquartile range
mild - map to store mild outliers
extreme - map to store extreme outliers
colIdx - the index for the column of interest
q - the quartiles: the lower quartile at index 0, the upper quartile at index 1.
whiskers - array to store the lower and upper whisker bar
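
A minimal sketch of the classification described above, working on a plain sorted array instead of a DataTable. String stands in for RowKey, the keys array is a hypothetical helper that supplies the row key for each value, and the treatment of values lying exactly on a fence is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class OutlierSketch {
    /**
     * Classifies the sorted values of one column. keys[i] is the row key of
     * sorted[i]; q holds the lower quartile at 0 and the upper quartile at 1.
     */
    static void detectOutliers(double[] sorted, String[] keys, double iqr, double[] q,
            Map<Double, Set<String>> mild, Map<Double, Set<String>> extreme,
            double[] whiskers) {
        double q1 = q[0];
        double q3 = q[1];
        boolean lowerSet = false;
        for (int i = 0; i < sorted.length; i++) {
            double v = sorted[i];
            if (v < q1 - 3 * iqr || v > q3 + 3 * iqr) {
                extreme.computeIfAbsent(v, k -> new LinkedHashSet<>()).add(keys[i]);
            } else if (v < q1 - 1.5 * iqr || v > q3 + 1.5 * iqr) {
                mild.computeIfAbsent(v, k -> new LinkedHashSet<>()).add(keys[i]);
            } else {
                // Not an outlier: the first such value is the lower whisker,
                // the last one seen is the upper whisker.
                if (!lowerSet) {
                    whiskers[0] = v;
                    lowerSet = true;
                }
                whiskers[1] = v;
            }
        }
    }

    public static void main(String[] args) {
        double[] sorted = {-50, -12, 1, 3, 5, 7, 9, 11, 24, 100};
        String[] keys = {"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9"};
        Map<Double, Set<String>> mild = new TreeMap<>();
        Map<Double, Set<String>> extreme = new TreeMap<>();
        double[] whiskers = new double[2];
        detectOutliers(sorted, keys, 8.0, new double[] {3, 11}, mild, extreme, whiskers);
        System.out.println("mild=" + mild + " extreme=" + extreme);
        System.out.println("whiskers=" + whiskers[0] + ".." + whiskers[1]);
    }
}
```

Keying the maps by value and collecting row keys in a set lets several rows that share the same outlying value be drawn as one point while remaining individually addressable.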

getStatistics

public Map<DataColumnSpec,double[]> getStatistics()

Specified by:
getStatistics in interface BoxPlotDataProvider
Returns:
a map from each column to a double array containing the statistics for that column: the minimum, the lower whisker, the lower quartile, the median, the upper quartile, the upper whisker and the maximum value.

getMildOutliers

public Map<String,Map<Double,Set<RowKey>>> getMildOutliers()
Mild outliers are values > q1 - 3 * iqr and < q1 - 1.5 * iqr, or values > q3 + 1.5 * iqr and < q3 + 3 * iqr.

Specified by:
getMildOutliers in interface BoxPlotDataProvider
Returns:
a list of mild outliers for each column.

getExtremeOutliers

public Map<String,Map<Double,Set<RowKey>>> getExtremeOutliers()
Extreme outliers are values < q1 - 3 * iqr or > q3 + 3 * iqr.

Specified by:
getExtremeOutliers in interface BoxPlotDataProvider
Returns:
a list of extreme outliers for each column.
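
The nested return type of getMildOutliers() and getExtremeOutliers() can be traversed as in the sketch below. It assumes that the outer key is the column name, the inner key is the outlying value, and the set holds the keys of the rows with that value; String stands in for RowKey so the example runs without KNIME.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class OutlierMapSketch {
    /** Counts the outlier rows per column in a column -> value -> row keys map. */
    static Map<String, Integer> countPerColumn(
            Map<String, Map<Double, Set<String>>> outliers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Map<Double, Set<String>>> col : outliers.entrySet()) {
            int n = 0;
            for (Set<String> rows : col.getValue().values()) {
                n += rows.size();   // several rows may share one value
            }
            counts.put(col.getKey(), n);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Double, Set<String>> height = new TreeMap<>();
        height.put(2.5, new HashSet<>(Arrays.asList("Row3", "Row8")));
        height.put(0.1, new HashSet<>(Arrays.asList("Row5")));
        Map<String, Map<Double, Set<String>>> outliers = new TreeMap<>();
        outliers.put("height", height);
        outliers.put("weight", new TreeMap<>());
        System.out.println(countPerColumn(outliers));
    }
}
```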

getDataArray

public DataArray getDataArray(int index)
Provides the data that should be visualized. The index can be used if a NodeModel has two inports and the data of both should be visualized. The index then determines which DataArray should be returned.

Specified by:
getDataArray in interface DataProvider
Parameters:
index - if the data of more than one data table should be visualized.
Returns:
the data as a data array.

getOutHiLiteHandler

protected HiLiteHandler getOutHiLiteHandler(int outIndex)
Returns the HiLiteHandler for the given output index. This default implementation simply passes on the handler of input port 0 or generates a new one if this node has no inputs.

This method is intended to be overridden.

Overrides:
getOutHiLiteHandler in class NodeModel
Parameters:
outIndex - The output index.
Returns:
HiLiteHandler for the given output port.

loadValidatedSettingsFrom

protected void loadValidatedSettingsFrom(NodeSettingsRO settings)
                                  throws InvalidSettingsException
Sets new settings from the passed object in the model. You can safely assume that the object passed has been successfully validated by the #validateSettings(NodeSettings) method. The model must set its internal configuration according to the settings object passed.

Specified by:
loadValidatedSettingsFrom in class NodeModel
Parameters:
settings - The settings to read.
Throws:
InvalidSettingsException - If a property is not available.
See Also:
NodeModel.saveSettingsTo(NodeSettingsWO), NodeModel.validateSettings(NodeSettingsRO)

reset

protected void reset()
Override this function in the derived model and reset your NodeModel. All components should unregister themselves from any observables (at least from the hilite handler right now). All internally stored data structures should be released. User settings should not be deleted/reset though.

Specified by:
reset in class NodeModel

loadInternals

protected void loadInternals(File nodeInternDir,
                             ExecutionMonitor exec)
                      throws IOException,
                             CanceledExecutionException
Load internals into the derived NodeModel. This method is only called if the Node was executed. Read all your internal structures from the given file directory to create your internal data structure which is necessary to provide all node functionalities after the workflow is loaded, e.g. view content and/or hilite mapping.

Specified by:
loadInternals in class NodeModel
Parameters:
nodeInternDir - The directory to read from.
exec - Used to report progress and to cancel the load process.
Throws:
IOException - If an error occurs during reading from this dir.
CanceledExecutionException - If the loading has been canceled.
See Also:
NodeModel.saveInternals(File,ExecutionMonitor)

saveInternals

protected void saveInternals(File nodeInternDir,
                             ExecutionMonitor exec)
                      throws IOException,
                             CanceledExecutionException
Save internals of the derived NodeModel. This method is only called if the Node is executed. Write all your internal structures into the given file directory which are necessary to recreate this model when the workflow is loaded, e.g. view content and/or hilite mapping.

Specified by:
saveInternals in class NodeModel
Parameters:
nodeInternDir - The directory to write into.
exec - Used to report progress and to cancel the save process.
Throws:
IOException - If an error occurs during writing to this dir.
CanceledExecutionException - If the saving has been canceled.
See Also:
NodeModel.loadInternals(File,ExecutionMonitor)
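
To illustrate the save/load pairing described for saveInternals and loadInternals, here is a minimal round trip for one statistics array using only java.io. The file name and the binary layout are assumptions made for this sketch; the format actually used by this class is not documented on this page.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

public class InternalsSketch {
    // Hypothetical file name inside the node's internals directory.
    static final String FILE_NAME = "statistics.bin";

    /** Writes the array length followed by each value. */
    static void save(File nodeInternDir, double[] stats) throws IOException {
        File f = new File(nodeInternDir, FILE_NAME);
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)))) {
            out.writeInt(stats.length);
            for (double d : stats) {
                out.writeDouble(d);
            }
        }
    }

    /** Reads back what save(File, double[]) wrote. */
    static double[] load(File nodeInternDir) throws IOException {
        File f = new File(nodeInternDir, FILE_NAME);
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(f)))) {
            double[] stats = new double[in.readInt()];
            for (int i = 0; i < stats.length; i++) {
                stats[i] = in.readDouble();
            }
            return stats;
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("internals").toFile();
        save(dir, new double[] {1, 1, 3, 7, 11, 11, 100});
        System.out.println(Arrays.toString(load(dir)));
    }
}
```

A real implementation would also report progress and check for cancellation through the exec argument while reading or writing.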

saveSettingsTo

protected void saveSettingsTo(NodeSettingsWO settings)
Adds to the given NodeSettings the model specific settings. The settings don't need to be complete or consistent. If, right after startup, no valid settings are available this method can write either nothing or invalid settings.

This method is called by the Node if the current settings need to be saved or transferred to the node's dialog.

Specified by:
saveSettingsTo in class NodeModel
Parameters:
settings - The object to write settings into.
See Also:
NodeModel.loadValidatedSettingsFrom(NodeSettingsRO), NodeModel.validateSettings(NodeSettingsRO)

validateSettings

protected void validateSettings(NodeSettingsRO settings)
                         throws InvalidSettingsException
Validates the settings in the passed NodeSettings object. The specified settings should be checked for completeness and consistency. It must be possible to load a settings object validated here without any exception in the #loadValidatedSettings(NodeSettings) method. The method must not change the current settings in the model; it is supposed to just check them. If some settings are missing, invalid, inconsistent, or just not right, throw an exception with a message useful to the user.

Specified by:
validateSettings in class NodeModel
Parameters:
settings - The settings to validate.
Throws:
InvalidSettingsException - If the validation of the settings failed.
See Also:
NodeModel.saveSettingsTo(NodeSettingsWO), NodeModel.loadValidatedSettingsFrom(NodeSettingsRO)
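
The contract between validateSettings, loadValidatedSettingsFrom and saveSettingsTo can be sketched without KNIME as shown below. A plain Map stands in for NodeSettingsRO/NodeSettingsWO, InvalidSettings stands in for InvalidSettingsException, and the maxNrColumns setting is purely hypothetical; this page does not say which settings, if any, this node stores.

```java
import java.util.HashMap;
import java.util.Map;

public class SettingsLifecycleSketch {
    /** Stand-in for InvalidSettingsException (not the KNIME class). */
    static class InvalidSettings extends Exception {
        InvalidSettings(String msg) { super(msg); }
    }

    static final String KEY = "maxNrColumns"; // hypothetical settings key
    private int maxNrColumns = 10;            // hypothetical setting with a default

    /** Shared parsing so that validation and loading apply the same rules. */
    private static int parse(Map<String, String> settings) throws InvalidSettings {
        String raw = settings.get(KEY);
        if (raw == null) {
            throw new InvalidSettings("Missing setting: " + KEY);
        }
        try {
            int v = Integer.parseInt(raw);
            if (v < 1) {
                throw new InvalidSettings(KEY + " must be positive, got " + v);
            }
            return v;
        } catch (NumberFormatException e) {
            throw new InvalidSettings(KEY + " is not a number: " + raw);
        }
    }

    /** Only checks; never touches the model's current state. */
    void validateSettings(Map<String, String> settings) throws InvalidSettings {
        parse(settings);
    }

    /** Applies settings that have already passed validateSettings. */
    void loadValidatedSettingsFrom(Map<String, String> settings) throws InvalidSettings {
        maxNrColumns = parse(settings);
    }

    /** Writes the current state into the given settings object. */
    void saveSettingsTo(Map<String, String> settings) {
        settings.put(KEY, Integer.toString(maxNrColumns));
    }

    int getMaxNrColumns() {
        return maxNrColumns;
    }

    public static void main(String[] args) throws InvalidSettings {
        SettingsLifecycleSketch model = new SettingsLifecycleSketch();
        Map<String, String> incoming = new HashMap<>();
        incoming.put(KEY, "25");
        model.validateSettings(incoming);          // check first
        model.loadValidatedSettingsFrom(incoming); // then apply
        Map<String, String> saved = new HashMap<>();
        model.saveSettingsTo(saved);
        System.out.println(saved);
    }
}
```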


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.