org.knime.base.node.mine.bayes.naivebayes.datamodel
Class NaiveBayesModel

java.lang.Object
  extended by org.knime.base.node.mine.bayes.naivebayes.datamodel.NaiveBayesModel

public class NaiveBayesModel
extends Object

This class represents the learned Naive Bayes model. For each attribute, this basic model holds an AttributeModel, which provides the probability information for each class value.

Author:
Tobias Koetter, University of Konstanz

Field Summary
static NumberFormat HTML_VALUE_FORMATER
          The NumberFormat to use in the HTML views.
 
Constructor Summary
NaiveBayesModel(BufferedDataTable data, String classColName, ExecutionContext exec, int maxNoOfNominalVals, boolean skipMissingVals)
          Constructor which iterates through the DataTable to calculate the needed Bayes variables.
NaiveBayesModel(ConfigRO predParams)
          Constructor for class NaiveBayesModel.
 
Method Summary
 List<String> check4MissingCols(DataTableSpec tableSpec)
          Checks whether the model contains attributes that are not present in the given table specification; their absence could influence the prediction result.
 List<String> check4UnknownCols(DataTableSpec tableSpec)
          Checks if the given table specification contains columns which are not covered by the learned model.
 boolean containsSkippedAttributes()
           
 AttributeModel getAttributeModel(String attributeName)
           
 Collection<AttributeModel> getAttributeModels()
           
 List<String> getAttributesWithMissingVals()
           
 DataType getClassColumnDataType()
           
 String getClassColumnName()
           
 double getClassPriorProbability(String classValue)
           
 double[] getClassProbabilities(String[] attributeNames, DataRow row, List<String> classValues, boolean normalize, double laplaceCorrector)
           
 String getHTMLView()
           
 String getMostLikelyClass(String[] attrNames, DataRow row, double laplaceCorrector)
          Returns the name of the class with the highest probability for the given row.
 int getNoOfRecs()
           
 List<AttributeModel> getSkippedAttributes()
           
 String getSkippedAttributesString(int max2Show)
           
 List<String> getSortedClassValues()
           
 String getSummary()
           
 void savePredictorParams(ConfigWO predParams)
           
 String toString()
          
 void updateModel(DataRow row, DataTableSpec tableSpec, int classColIdx)
          Updates the current NaiveBayesModel with the values from the given DataRow.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

HTML_VALUE_FORMATER

public static final NumberFormat HTML_VALUE_FORMATER
The NumberFormat to use in the HTML views.

Constructor Detail

NaiveBayesModel

public NaiveBayesModel(BufferedDataTable data,
                       String classColName,
                       ExecutionContext exec,
                       int maxNoOfNominalVals,
                       boolean skipMissingVals)
                throws CanceledExecutionException,
                       InvalidSettingsException
Constructor which iterates through the DataTable to calculate the needed Bayes variables.

Parameters:
data - The BufferedDataTable with the data
classColName - The name of the column with the class
exec - the ExecutionContext to provide progress information and check for cancel
maxNoOfNominalVals - the maximum number of supported unique nominal attribute values
skipMissingVals - set to true if the missing values should be skipped during learning and prediction
Throws:
CanceledExecutionException - if the user presses the cancel button during model creation
InvalidSettingsException - if the input data contains no rows

NaiveBayesModel

public NaiveBayesModel(ConfigRO predParams)
                throws InvalidSettingsException
Constructor for class NaiveBayesModel.

Parameters:
predParams - the ConfigRO to read from
Throws:
InvalidSettingsException - if a mandatory key is not available
Method Detail

updateModel

public void updateModel(DataRow row,
                        DataTableSpec tableSpec,
                        int classColIdx)
                 throws InvalidSettingsException
Updates the current NaiveBayesModel with the values from the given DataRow.

Parameters:
row - DataRow with values for update
tableSpec - underlying DataTableSpec
classColIdx - the index of the class column
Throws:
InvalidSettingsException - if a missing value occurs in the class column or an attribute has too many values.
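
The per-row update described above can be pictured as incrementing co-occurrence counts. The following is a minimal sketch for a single nominal attribute, not the KNIME implementation: the class RowCountSketch, its methods, and the use of IllegalArgumentException in place of InvalidSettingsException are all hypothetical, and handling of missing attribute values is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-row count updates; not the KNIME implementation. */
final class RowCountSketch {
    // Assumed cap on distinct values, mirroring the maxNoOfNominalVals parameter.
    private final int maxNoOfNominalVals;
    // attribute value -> (class value -> number of rows seen)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private int noOfRecs;

    RowCountSketch(int maxNoOfNominalVals) {
        this.maxNoOfNominalVals = maxNoOfNominalVals;
    }

    /** Adds one training row; validates first so a rejected row leaves the counts unchanged. */
    void update(String attrValue, String classValue) {
        if (classValue == null) {
            throw new IllegalArgumentException("Missing value in class column");
        }
        if (!counts.containsKey(attrValue) && counts.size() >= maxNoOfNominalVals) {
            throw new IllegalArgumentException("Attribute has too many values");
        }
        counts.computeIfAbsent(attrValue, k -> new HashMap<>())
              .merge(classValue, 1, Integer::sum);
        noOfRecs++;
    }

    /** Number of rows seen with the given attribute value and class value. */
    int count(String attrValue, String classValue) {
        return counts.getOrDefault(attrValue, new HashMap<>()).getOrDefault(classValue, 0);
    }

    int noOfRecs() {
        return noOfRecs;
    }
}
```

Checking both conditions before touching the maps means a rejected row never leaves the counts half updated.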

savePredictorParams

public void savePredictorParams(ConfigWO predParams)
Parameters:
predParams - the ConfigWO to save the model to

containsSkippedAttributes

public boolean containsSkippedAttributes()
Returns:
true if the model contains skipped attributes

getSkippedAttributes

public List<AttributeModel> getSkippedAttributes()
Returns:
the skipped attributes or an empty list

getSkippedAttributesString

public String getSkippedAttributesString(int max2Show)
Parameters:
max2Show - the maximum number of skipped attributes to display
Returns:
a String that shows the skipped attributes
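
The page does not specify the format of the returned String. As a rough sketch of what a length-limited listing might look like, the hypothetical helper below joins at most max2Show names; the ", " separator and the "..." marker for omitted names are assumptions, not the documented KNIME output.

```java
import java.util.List;

/** Hypothetical sketch of a truncated name listing; the output format is assumed. */
final class SkippedNamesSketch {
    /** Joins at most max2Show names and marks any omitted names with "...". */
    static String summarize(List<String> names, int max2Show) {
        int shown = Math.max(0, Math.min(max2Show, names.size()));
        String joined = String.join(", ", names.subList(0, shown));
        if (shown == names.size()) {
            return joined; // nothing was cut off
        }
        return joined.isEmpty() ? "..." : joined + ", ...";
    }
}
```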

getSortedClassValues

public List<String> getSortedClassValues()
Returns:
all class values in natural order

getClassPriorProbability

public double getClassPriorProbability(String classValue)
Parameters:
classValue - the value of the class we want the probability for
Returns:
the prior probability for the given class
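
In the textbook Naive Bayes estimate, the prior of a class is its relative frequency among the training records. The sketch below assumes that definition; the page does not confirm how KNIME computes the value, and PriorSketch with its classCounts map is a hypothetical stand-in for the model's internal state.

```java
import java.util.Map;

/** Hypothetical sketch of a relative-frequency prior; not the KNIME implementation. */
final class PriorSketch {
    /** Returns count(classValue) / total records, or 0 for a class never seen in training. */
    static double prior(Map<String, Integer> classCounts, String classValue) {
        int total = 0;
        for (int n : classCounts.values()) {
            total += n;
        }
        if (total == 0) {
            throw new IllegalStateException("No training records");
        }
        return classCounts.getOrDefault(classValue, 0) / (double) total;
    }
}
```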

getClassProbabilities

public double[] getClassProbabilities(String[] attributeNames,
                                      DataRow row,
                                      List<String> classValues,
                                      boolean normalize,
                                      double laplaceCorrector)
Parameters:
attributeNames - the names of the attributes we want the normalized probability values for
row - the row with the values in the same order as the attribute names
classValues - the class values to calculate the probability for
normalize - set to true if the probability values should be normalized
laplaceCorrector - the Laplace corrector to use. A value greater than 0 tolerates zero counts (i.e. does not produce 0 probabilities)
Returns:
the probability values in the same order as the class values
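
To illustrate how the normalize and laplaceCorrector parameters interact, here is a minimal sketch for nominal attributes only. It assumes the common additive-smoothing form (count + corrector) / (classCount + corrector * distinctValues); the page does not state which formula KNIME uses. The class ClassProbabilitySketch and its array layout are hypothetical, and every class is assumed to have at least one training record.

```java
/** Hypothetical sketch of smoothed Naive Bayes scores; not the KNIME implementation. */
final class ClassProbabilitySketch {
    /**
     * @param classCounts  classCounts[c] = training rows with class c (assumed > 0)
     * @param valueCounts  valueCounts[a][c] = rows of class c whose attribute a
     *                     equals the value observed in the row being scored
     * @param distinctVals distinctVals[a] = number of distinct values of attribute a
     */
    static double[] classProbabilities(int[] classCounts, int[][] valueCounts,
            int[] distinctVals, boolean normalize, double laplaceCorrector) {
        int total = 0;
        for (int n : classCounts) {
            total += n;
        }
        double[] p = new double[classCounts.length];
        double sum = 0;
        for (int c = 0; c < p.length; c++) {
            p[c] = classCounts[c] / (double) total; // class prior
            for (int a = 0; a < valueCounts.length; a++) {
                // Smoothed conditional probability of the observed value given class c.
                p[c] *= (valueCounts[a][c] + laplaceCorrector)
                        / (classCounts[c] + laplaceCorrector * distinctVals[a]);
            }
            sum += p[c];
        }
        if (normalize && sum > 0) {
            for (int c = 0; c < p.length; c++) {
                p[c] /= sum; // scale so the values add up to 1
            }
        }
        return p;
    }
}
```

With a corrector of 0, a single zero count forces the whole product for that class to 0; any positive corrector keeps it above 0.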

getNoOfRecs

public int getNoOfRecs()
Returns:
the total number of training records

getClassColumnName

public String getClassColumnName()
Returns:
the name of the column with the class attribute.

getClassColumnDataType

public DataType getClassColumnDataType()
Returns:
the DataType of the column with the class attribute.

getSummary

public String getSummary()
Returns:
a summary of the model

getHTMLView

public String getHTMLView()
Returns:
an HTML representation of all attribute models

getAttributesWithMissingVals

public List<String> getAttributesWithMissingVals()
Returns:
the names of all attributes that had at least one missing value during learning, or an empty list

getAttributeModel

public AttributeModel getAttributeModel(String attributeName)
Parameters:
attributeName - the name of the attribute
Returns:
the model for the given attribute name or null if the attribute is not known

getAttributeModels

public Collection<AttributeModel> getAttributeModels()
Returns:
an unmodifiable Collection with all AttributeModel objects

getMostLikelyClass

public String getMostLikelyClass(String[] attrNames,
                                 DataRow row,
                                 double laplaceCorrector)
Returns the name of the class with the highest probability for the given row.

Parameters:
attrNames - the attribute names in the same order they appear in the given data row
row - the row with the attributes in the same order as in the training data table
laplaceCorrector - the Laplace corrector to use. A value greater than 0 overcomes zero counts
Returns:
the class value with the highest probability for the given attribute values.
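
Conceptually, picking the most likely class is an arg-max over per-class probabilities. The sketch below assumes the probabilities are already available in the same order as the class values; MostLikelySketch is a hypothetical helper, and breaking ties in favour of the first class is an assumption rather than documented behaviour.

```java
import java.util.List;

/** Hypothetical sketch of an arg-max over class probabilities; not the KNIME implementation. */
final class MostLikelySketch {
    /** Returns the class value whose probability is largest; the first one wins a tie. */
    static String mostLikely(List<String> classValues, double[] probabilities) {
        if (classValues.isEmpty() || classValues.size() != probabilities.length) {
            throw new IllegalArgumentException("Need one probability per class value");
        }
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return classValues.get(best);
    }
}
```

Normalization does not change the result, because dividing every value by the same positive sum preserves their order.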

toString

public String toString()

Overrides:
toString in class Object

check4UnknownCols

public List<String> check4UnknownCols(DataTableSpec tableSpec)
Checks if the given table specification contains columns which are not covered by the learned model, either because the name is not known or because the type is wrong.

Parameters:
tableSpec - the DataTableSpec to check for unknown columns
Returns:
the names of the unknown columns or an empty List

check4MissingCols

public List<String> check4MissingCols(DataTableSpec tableSpec)
Checks whether the model contains attributes that are not present in the given table specification; their absence could influence the prediction result.

Parameters:
tableSpec - the DataTableSpec to check for missing columns
Returns:
the names of the missing columns or an empty List
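
The two column checks, check4UnknownCols and check4MissingCols, compare the same two sets of columns in opposite directions. The following sketch shows that idea with plain maps from column name to a type label standing in for the model and the DataTableSpec. ColumnCheckSketch and its methods are hypothetical, and whether the missing-column check in KNIME also looks at types is not stated on this page; here it compares names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two column checks; not the KNIME implementation. */
final class ColumnCheckSketch {
    /** Table columns the model does not cover: unknown name or a different type. */
    static List<String> unknownCols(Map<String, String> modelCols,
            Map<String, String> tableCols) {
        List<String> unknown = new ArrayList<>();
        for (Map.Entry<String, String> col : tableCols.entrySet()) {
            String modelType = modelCols.get(col.getKey());
            if (modelType == null || !modelType.equals(col.getValue())) {
                unknown.add(col.getKey());
            }
        }
        return unknown;
    }

    /** Model attributes that have no column of the same name in the table. */
    static List<String> missingCols(Map<String, String> modelCols,
            Map<String, String> tableCols) {
        List<String> missing = new ArrayList<>();
        for (String name : modelCols.keySet()) {
            if (!tableCols.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Both methods return an empty list when nothing is found, matching the "or an empty List" contract documented above.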


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.