org.knime.base.node.io.filereader
Class FileReaderSettings

java.lang.Object
  extended by org.knime.core.util.tokenizer.TokenizerSettings
      extended by org.knime.base.node.io.filereader.FileReaderSettings
Direct Known Subclasses:
FileReaderNodeSettings

public class FileReaderSettings
extends TokenizerSettings

Contains all settings needed to read in an ASCII data file. This includes the location of the data file, the settings for the tokenizer (such as column delimiters and comment patterns), as well as the row headers and more. Combined with a DataTableSpec, this object can be used to create a FileTable, which then represents the data of the file as a DataTable.

Author:
ohl, University of Konstanz

Field Summary
static String CFGKEY_DATAURL
          Key used to store data file location in a config object.
static String DEF_ROWPREFIX
          This will be used if the file has no row headers and no row prefix is set.
 
Constructor Summary
FileReaderSettings()
          Creates a new object holding all settings needed to read the specified file.
FileReaderSettings(FileReaderSettings clonee)
          Creates a new object holding the same settings values as the one passed in.
FileReaderSettings(NodeSettingsRO cfg)
          Creates a new FileReaderSettings object initializing its settings from the passed config object.
 
Method Summary
 void addRowDelimiter(String rowDelimPattern, boolean skipEmptyRows)
          Will add a delimiter pattern that will terminate a row.
protected  void addStatusOfSettings(SettingsStatus status, boolean openDataFile, DataTableSpec tableSpec)
          Adds its status messages to a passed status object.
 boolean combinesMultipleRowDelimiters(String pattern)
          Returns true if the file reader combines multiple consecutive row delimiters with this pattern.
 BufferedFileReader createNewInputReader()
           
 String getCharsetName()
           
 int getColumnNumDeterminingLineNumber()
           
 URL getDataFileLocation()
           
 char getDecimalSeparator()
           
 boolean getFileHasColumnHeaders()
           
 boolean getFileHasRowHeaders()
           
 boolean getIgnoreEmtpyLines()
           
 long getMaximumNumberOfRowsToRead()
           
 String getMissingValueOfColumn(int colIdx)
          Returns the pattern that, if read in for the specified column, is treated as a placeholder for a missing value; the data table will then contain a missing cell instead of that value.
 String getMissValuePatternStrCols()
          Returns the pattern that, if read in, will be translated into a missing value (in string columns only).
 String getRowHeaderPrefix()
           
 SettingsStatus getStatusOfSettings()
          Method to check consistency and completeness of the current settings.
 SettingsStatus getStatusOfSettings(boolean openDataFile, DataTableSpec tableSpec)
          Method to check consistency and completeness of the current settings.
 boolean getSupportShortLines()
           
 String getTableName()
           
 char getThousandsSeparator()
           
 boolean ignoreEmptyTokensAtEndOfRow()
           
 boolean isRowDelimiter(String pattern)
           
 void removeAllDelimiters()
          Removes all (!) delimiters from the file reader settings.
 void removeAllRowDelimiters()
          Removes all defined row delimiters. After a call to this method no row delimiter will be defined (except null).
 Delimiter removeDelimiterPattern(String pattern)
          Removes the Delimiter object with the specified pattern from the list of defined delimiters.
 Delimiter removeRowDelimiter(String pattern)
          Removes the row delimiter with the specified pattern.
 void saveToConfiguration(NodeSettingsWO cfg)
          Saves all settings into a NodeSettingsWO object.
 void setCharsetName(String name)
          Set the new character set name that will be used the next time a new input reader is created (see createNewInputReader()).
 void setColumnNumDeterminingLineNumber(int lineNumber)
          Sets the line number in the file that determined the number of columns.
 void setDataFileLocationAndUpdateTableName(URL dataFileLocation)
          Sets the location of the file to read data from.
 void setDecimalSeparator(char sep)
          Sets the character that will be considered decimal separator in the data (token) read for double type columns.
 void setFileHasColumnHeaders(boolean flag)
          Tells whether the first line in the file should be considered column headers, or not.
 void setFileHasRowHeaders(boolean flag)
          Tells whether the first token in each line in the file should be considered row header, or not.
 void setIgnoreEmptyLines(boolean ignoreEm)
           
 void setIgnoreEmptyTokensAtEndOfRow(boolean ignoreThem)
          Sets whether additional empty tokens at the end of a row are ignored.
 void setMaximumNumberOfRowsToRead(long maxNum)
          Sets a new maximum for the number of rows to read.
 void setMissingValueForColumn(int colIdx, String pattern)
          Specifies a pattern that, if read in for the specified column, is treated as a placeholder for a missing value; the data table will then contain a missing cell instead of that value.
 void setMissValuePatternStrCols(String pattern)
          Sets a new pattern which is translated into a missing value if read from the data file in a string column.
 void setRowHeaderPrefix(String rowPrefix)
          Set a string that will be used as a prefix for each row header.
 void setSupportShortLines(boolean supportShortLines)
           
 void setTableName(String newName)
          Sets a new name for the table created by this node.
 void setThousandsSeparator(char thousandsSeparator)
           
 void setUniquifyRowIDs(boolean uniquify)
           
 String toString()
          
 boolean uniquifyRowIDs()
           
 
Methods inherited from class org.knime.core.util.tokenizer.TokenizerSettings
addBlockCommentPattern, addDelimiterPattern, addDelimiterPattern, addOrReplaceDelimiterPattern, addQuotePattern, addQuotePattern, addQuotePattern, addQuotePattern, addSingleLineCommentPattern, addStatusOfSettings, addWhiteSpaceCharacter, addWhiteSpaceCharacter, getAllComments, getAllDelimiters, getAllQuotes, getAllWhiteSpaces, getCombineMultipleDelimiters, getDelimiterPattern, getLineContinuationCharacter, printableStr, removeAllComments, removeAllQuotes, removeAllWhiteSpaces, removeQuotePattern, setCombineMultipleDelimiters, setLineContinuationCharacter, unescapeString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEF_ROWPREFIX

public static final String DEF_ROWPREFIX
This will be used if the file has no row headers and no row prefix is set.

See Also:
Constant Field Values

CFGKEY_DATAURL

public static final String CFGKEY_DATAURL
Key used to store data file location in a config object.

See Also:
Constant Field Values
Constructor Detail

FileReaderSettings

public FileReaderSettings()
Creates a new object holding all settings needed to read the specified file. The file must be an ASCII representation of the data to read. No default behavior is specified for the newly created object; you need to set all parameters before reading the file with these settings.


FileReaderSettings

public FileReaderSettings(FileReaderSettings clonee)
Creates a new object holding the same settings values as the one passed in.

Parameters:
clonee - the object to read the settings values from

FileReaderSettings

public FileReaderSettings(NodeSettingsRO cfg)
                   throws InvalidSettingsException
Creates a new FileReaderSettings object initializing its settings from the passed config object.

Parameters:
cfg - the config object containing all settings this object will be initialized with
Throws:
InvalidSettingsException - if the passed config object contains invalid or insufficient settings
Method Detail

saveToConfiguration

public void saveToConfiguration(NodeSettingsWO cfg)
Saves all settings into a NodeSettingsWO object. Using the cfg object to construct a new FileReaderSettings object should lead to an object identical to this.

Overrides:
saveToConfiguration in class TokenizerSettings
Parameters:
cfg - the config object the settings are stored into

setDataFileLocationAndUpdateTableName

public void setDataFileLocationAndUpdateTableName(URL dataFileLocation)
Sets the location of the file to read data from. The location is not checked for correctness.

Parameters:
dataFileLocation - the URL of the data file these settings are for

getDataFileLocation

public URL getDataFileLocation()
Returns:
the location of the file these settings are meant for

setCharsetName

public void setCharsetName(String name)
Set the new character set name that will be used the next time a new input reader is created (see createNewInputReader()).

Parameters:
name - any character set supported by Java, or null to use the VM default char set.
Throws:
IllegalArgumentException - if the specified name is not supported.
IllegalCharsetNameException - if the specified name is not supported.
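
The two exceptions listed above are also the ones raised by the JDK's own charset lookup. A minimal sketch of how a caller might check a name before handing it to setCharsetName(String) is shown here; the class CharsetNameCheck and its methods are hypothetical and not part of the KNIME API, and whether FileReaderSettings validates names in exactly this way is an assumption.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of the KNIME API.
final class CharsetNameCheck {

    /** Resolves a charset name; null selects the VM default charset. */
    static Charset resolve(String name) {
        if (name == null) {
            return Charset.defaultCharset();
        }
        // Charset.forName throws IllegalCharsetNameException for a syntactically
        // illegal name and UnsupportedCharsetException for an unknown one.
        // Both are subclasses of IllegalArgumentException.
        return Charset.forName(name);
    }

    /** Returns true if the name is null or can be resolved by this VM. */
    static boolean isUsable(String name) {
        try {
            resolve(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A caller could run isUsable(name) first and only then call setCharsetName(name), so that a bad name is reported before any reader is created.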

getCharsetName

public String getCharsetName()
Returns:
the charset name set, or null if the VM's default is used

createNewInputReader

public BufferedFileReader createNewInputReader()
                                        throws IOException
Returns:
a new reader to read from the data file location. It creates a buffered reader, and for zipped sources a GZIP one. If the data location is not set, an exception is thrown.
Throws:
NullPointerException - if the data location is not set
IOException - if an I/O error occurred when opening the stream
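
The behavior described above (buffered reader, GZIP handling for zipped sources, NullPointerException for a missing location) can be approximated with plain JDK classes. The following is a sketch under stated assumptions, not the KNIME implementation: the class InputReaderSketch is hypothetical, it returns a java.io.BufferedReader rather than KNIME's BufferedFileReader, and recognizing zipped sources by a ".gz" suffix is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;

// Hypothetical stand-in for illustration; not the KNIME BufferedFileReader.
final class InputReaderSketch {

    static BufferedReader open(URL location, String charsetName) throws IOException {
        if (location == null) {
            throw new NullPointerException("data location is not set");
        }
        // null means "use the VM default charset", as for setCharsetName.
        Charset cs = charsetName == null
                ? Charset.defaultCharset() : Charset.forName(charsetName);
        InputStream in = location.openStream();
        try {
            // Assumption: gzipped sources are recognized by their ".gz" suffix.
            if (location.getPath().toLowerCase().endsWith(".gz")) {
                in = new GZIPInputStream(in);
            }
            return new BufferedReader(new InputStreamReader(in, cs));
        } catch (IOException | RuntimeException e) {
            in.close(); // do not leak the stream if wrapping fails
            throw e;
        }
    }
}
```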

setTableName

public void setTableName(String newName)
Sets a new name for the table created by this node.

Parameters:
newName - the new name to set. Valid names are not null.

getTableName

public String getTableName()
Returns:
the currently set name of the table created by this node. Valid names are not null, but the method can return null if no name has been set yet.

setFileHasColumnHeaders

public void setFileHasColumnHeaders(boolean flag)
Tells whether the first line in the file should be considered column headers, or not.

Parameters:
flag - if true the first line in the file will not be considered data, but either ignored or used as column headers, depending on the column headers set (or not) in this object.

getFileHasColumnHeaders

public boolean getFileHasColumnHeaders()
Returns:
a flag telling if the first line in the file will not be considered data, but either ignored or used as column headers, depending on the column headers set (or not) in this object.

setFileHasRowHeaders

public void setFileHasRowHeaders(boolean flag)
Tells whether the first token in each line in the file should be considered row header, or not.

Parameters:
flag - if true the first item in each line in the file will not be considered data, but either ignored or used as row header, depending on the row header prefix set (or not) in this object.

getFileHasRowHeaders

public boolean getFileHasRowHeaders()
Returns:
a flag telling if the first item in each line in the file will not be considered data, but either ignored or used as row header, depending on the row header prefix set (or not) in this object.

setRowHeaderPrefix

public void setRowHeaderPrefix(String rowPrefix)
Set a string that will be used as a prefix for each row header. The header generated will have the row number added to the prefix. This prefix - if set - will be used, regardless of any row header read from the file - if there is any.

Parameters:
rowPrefix - the string that will be used to construct the header for each row. The actual row header will have the row number added. Specify null to clear the prefix.

getRowHeaderPrefix

public String getRowHeaderPrefix()
Returns:
the string that will be used to construct the header for each row. The actual row header will have the row number added. If this returns null, the row header from the file will be used - if any, otherwise the DEF_ROWPREFIX.
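
The precedence described above (a set prefix wins, then the header read from the file, then the default prefix) can be written down as a small sketch. The class RowHeaderSketch is hypothetical. The value "Row" only stands in for DEF_ROWPREFIX, whose actual value is not given here, and appending the row number to the default prefix is likewise an assumption.

```java
// Hypothetical illustration of the row header precedence; not KNIME code.
final class RowHeaderSketch {

    // Assumption: placeholder for FileReaderSettings.DEF_ROWPREFIX.
    static final String DEFAULT_PREFIX = "Row";

    static String rowHeader(String prefix, String headerFromFile, long rowNumber) {
        if (prefix != null) {
            // A set prefix is used regardless of any header read from the file.
            return prefix + rowNumber;
        }
        if (headerFromFile != null) {
            return headerFromFile;
        }
        // Neither prefix nor file header: fall back to the default prefix.
        return DEFAULT_PREFIX + rowNumber;
    }
}
```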

uniquifyRowIDs

public boolean uniquifyRowIDs()
Returns:
true if the reader should make rowIDs read from file unique.

setUniquifyRowIDs

public void setUniquifyRowIDs(boolean uniquify)
Parameters:
uniquify - the new value of the "uniquify row IDs from file" flag.

addRowDelimiter

public void addRowDelimiter(String rowDelimPattern,
                            boolean skipEmptyRows)
Adds a delimiter pattern that terminates a row. Row delimiters are always token (=column) delimiters, and the file reader always returns them as separate tokens. You can define a row delimiter that was previously defined as a token delimiter, but only if that delimiter was not set to be included in the token; otherwise an IllegalArgumentException is thrown.

Parameters:
rowDelimPattern - the row delimiter pattern. Row delimiters are always token delimiters and are always returned as a separate token.
skipEmptyRows - if true, multiple consecutive row delimiters are combined and returned as one
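
To illustrate the effect of skipEmptyRows, here is a simplified sketch that splits raw text into rows on a single delimiter pattern. It is not the KNIME tokenizer: the class RowDelimiterSketch is hypothetical, it works on whole rows rather than tokens, and it ignores quotes and comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of skipEmptyRows; not the KNIME tokenizer.
final class RowDelimiterSketch {

    static List<String> splitRows(String text, String rowDelim, boolean skipEmptyRows) {
        if (rowDelim.isEmpty()) {
            throw new IllegalArgumentException("row delimiter must not be empty");
        }
        List<String> rows = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = text.indexOf(rowDelim, start)) >= 0) {
            String row = text.substring(start, idx);
            // With skipEmptyRows, consecutive delimiters act as one delimiter,
            // so the empty rows between them are not emitted.
            if (!(skipEmptyRows && row.isEmpty())) {
                rows.add(row);
            }
            start = idx + rowDelim.length();
        }
        if (start < text.length()) {
            rows.add(text.substring(start)); // last row without trailing delimiter
        }
        return rows;
    }
}
```

For the input "a\n\n\nb\n" with "\n" as the row delimiter, the sketch yields the rows a and b when skipEmptyRows is true, and a, two empty rows, and b when it is false.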

removeRowDelimiter

public Delimiter removeRowDelimiter(String pattern)
Removes the row delimiter with the specified pattern. Even though addRowDelimiter changes an existing column delimiter into a row delimiter, this method deletes the row delimiter completely; it does not detect that the delimiter might have been a column delimiter before, and does not change it back into one.

Parameters:
pattern - the row delimiter to delete. Must not be null; null is always a row delimiter.
Returns:
a delimiter object specifying the deleted delimiter, or null if no row delimiter with the pattern existed

removeAllRowDelimiters

public void removeAllRowDelimiters()
Removes all defined row delimiters. After a call to this method no row delimiter will be defined (except null).


isRowDelimiter

public boolean isRowDelimiter(String pattern)
Parameters:
pattern - the pattern to test
Returns:
true if the pattern is a row delimiter. null is always a row delimiter.

removeAllDelimiters

public void removeAllDelimiters()
Removes all (!) delimiters from the file reader settings. Not a single delimiter will be defined after a call to this method.

Overrides:
removeAllDelimiters in class TokenizerSettings

removeDelimiterPattern

public Delimiter removeDelimiterPattern(String pattern)
Removes the Delimiter object with the specified pattern from the list of defined delimiters. Returns the removed delimiter object, if it existed or null if the pattern didn't exist.

Overrides:
removeDelimiterPattern in class TokenizerSettings
Parameters:
pattern - the delimiter to remove
Returns:
the delimiter object removed, or null if no delimiter with the specified pattern existed.

getIgnoreEmtpyLines

public boolean getIgnoreEmtpyLines()
Returns:
true if the file reader ignores empty lines

setIgnoreEmptyLines

public void setIgnoreEmptyLines(boolean ignoreEm)
Parameters:
ignoreEm - pass true to have the file reader not return empty lines from the data file

combinesMultipleRowDelimiters

public boolean combinesMultipleRowDelimiters(String pattern)
Returns true if the file reader combines multiple consecutive row delimiters with this pattern (i.e. it skips empty rows if it finds multiple of these (and only these) row delimiters). The method throws an IllegalArgumentException if the specified pattern is not a row delimiter.

Parameters:
pattern - the pattern to test for
Returns:
true if the filereader skips empty rows for this row delimiter

setMissingValueForColumn

public void setMissingValueForColumn(int colIdx,
                                     String pattern)
Specifies a pattern that, if read in for the specified column, is treated as a placeholder for a missing value; the data table will then contain a missing cell instead of that value.

Parameters:
colIdx - the index of the column this missing value is set for
pattern - the pattern specifying the missing value in the data file for the specified column. Can be null to delete a previously set pattern.

getMissingValueOfColumn

public String getMissingValueOfColumn(int colIdx)
Returns the pattern that, if read in for the specified column, is treated as a placeholder for a missing value; the data table will then contain a missing cell instead of that value.

Parameters:
colIdx - the index of the column the missing value is asked for
Returns:
the pattern that will be considered a placeholder for a missing value in the specified column, or null if no pattern is set for that column.

getDecimalSeparator

public char getDecimalSeparator()
Returns:
the character that is considered decimal separator in the data (token) for a double type column

setDecimalSeparator

public void setDecimalSeparator(char sep)
Sets the character that will be considered decimal separator in the data (token) read for double type columns.

Parameters:
sep - the new decimal character to set for doubles. Can't be the same character as the thousands separator.

getThousandsSeparator

public char getThousandsSeparator()
Returns:
the thousandsSeparator. If it is '\0' then it is not set.

setThousandsSeparator

public void setThousandsSeparator(char thousandsSeparator)
Parameters:
thousandsSeparator - the thousandsSeparator to set. If set to '\0' it will not be applied. Can't be the same as the decimal separator.
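
How a decimal and a thousands separator might be applied to a token of a double type column can be sketched as follows. This is an illustration only: the class NumberTokenSketch is hypothetical, the actual KNIME parsing may differ, and the sketch does not validate where thousands separators occur within the token.

```java
// Hypothetical illustration of separator handling; not KNIME code.
final class NumberTokenSketch {

    static double parse(String token, char decimalSep, char thousandsSep) {
        // '\0' means "no thousands separator set"; otherwise the two must differ.
        if (thousandsSep != '\0' && thousandsSep == decimalSep) {
            throw new IllegalArgumentException(
                    "decimal and thousands separator must differ");
        }
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (thousandsSep != '\0' && c == thousandsSep) {
                continue; // drop grouping characters
            }
            // Normalize the decimal separator to '.' for Double.parseDouble.
            sb.append(c == decimalSep ? '.' : c);
        }
        return Double.parseDouble(sb.toString());
    }
}
```

With ',' as the decimal separator and '.' as the thousands separator, the token "1.234,56" is normalized to "1234.56" before parsing.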

ignoreEmptyTokensAtEndOfRow

public boolean ignoreEmptyTokensAtEndOfRow()
Returns:
true if additional empty tokens should be ignored at the end of a row (if they are not needed to build the row)

setIgnoreEmptyTokensAtEndOfRow

public void setIgnoreEmptyTokensAtEndOfRow(boolean ignoreThem)
Sets whether additional empty tokens at the end of a row are ignored.

Parameters:
ignoreThem - if true, additional empty tokens will be ignored at the end of a row (if they are not needed to build the row)
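
The phrase "not needed to build the row" can be read as: only empty tokens beyond the expected number of columns are dropped. A minimal sketch of that reading follows; the class TrailingTokenSketch is hypothetical, and this interpretation is an assumption rather than documented KNIME behavior.

```java
import java.util.Arrays;

// Hypothetical illustration; one possible reading of the flag, not KNIME code.
final class TrailingTokenSketch {

    static String[] trimRow(String[] tokens, int numColumns) {
        int end = tokens.length;
        // Drop empty tokens at the end, but never go below the column count:
        // tokens within the first numColumns positions are needed for the row.
        while (end > numColumns && tokens[end - 1].isEmpty()) {
            end--;
        }
        return Arrays.copyOf(tokens, end);
    }
}
```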

setSupportShortLines

public void setSupportShortLines(boolean supportShortLines)
Parameters:
supportShortLines - if true, lines with too few data elements will be accepted and filled with missing values.

getSupportShortLines

public boolean getSupportShortLines()
Returns:
true if lines with too few data items are accepted (they will be filled with missing values, if read), or false if the reader fails when it comes across a short line (the default).
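
The short-line handling can be pictured with the following sketch, which pads a row to the expected width or fails. The class ShortLineSketch is hypothetical, null stands in for a missing cell, and the exception type used for the failure case is an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration of short-line handling; not KNIME code.
final class ShortLineSketch {

    static String[] toRow(String[] tokens, int numColumns, boolean supportShortLines) {
        if (tokens.length > numColumns) {
            // Lines that are too long are outside the scope of this sketch.
            throw new IllegalArgumentException("too many data elements");
        }
        if (tokens.length < numColumns && !supportShortLines) {
            throw new IllegalStateException("too few data elements");
        }
        // Arrays.copyOf pads with null, which stands in for a missing value.
        return Arrays.copyOf(tokens, numColumns);
    }
}
```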

setMissValuePatternStrCols

public void setMissValuePatternStrCols(String pattern)
Sets a new pattern which is translated into a missing value if read from the data file in a string column. It is used only for columns that don't have their own missing value pattern set (and that are of type string).

Parameters:
pattern - the new pattern to recognize missing values in string columns. Set to null to clear it.

getMissValuePatternStrCols

public String getMissValuePatternStrCols()
Returns the pattern that, if read in, will be translated into a missing value (in string columns only). It is overridden by the column specific missing value pattern. If it is not defined, null is returned.

Returns:
the pattern for missing values, for all string columns. Or null if not defined.
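
The interplay between the column-specific pattern and the pattern for string columns can be sketched as below. The class MissingPatternSketch is hypothetical, and comparing the token to the pattern by exact string equality is an assumption about what "pattern" means here.

```java
import java.util.Map;

// Hypothetical illustration of missing value pattern precedence; not KNIME code.
final class MissingPatternSketch {

    static boolean isMissing(String token, int colIdx, boolean isStringColumn,
            Map<Integer, String> columnPatterns, String stringColumnPattern) {
        // A column-specific pattern overrides the one for string columns.
        String pattern = columnPatterns.get(colIdx);
        if (pattern == null && isStringColumn) {
            pattern = stringColumnPattern;
        }
        // Assumption: a token matches only if it equals the pattern exactly.
        return pattern != null && pattern.equals(token);
    }
}
```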

getColumnNumDeterminingLineNumber

public int getColumnNumDeterminingLineNumber()
Returns:
the line number in the file that determined the number of columns. Or -1 if not set yet (or no file analysis took place).

setColumnNumDeterminingLineNumber

public void setColumnNumDeterminingLineNumber(int lineNumber)
Sets the line number in the file that determined the number of columns.

Parameters:
lineNumber - the line number in the file that determined the number of columns.

getMaximumNumberOfRowsToRead

public long getMaximumNumberOfRowsToRead()
Returns:
the maximum number of lines that should be read from the source. If -1 is returned all rows should be read.

setMaximumNumberOfRowsToRead

public void setMaximumNumberOfRowsToRead(long maxNum)
Sets a new maximum for the number of rows to read.

Parameters:
maxNum - the new maximum. If set to -1 all rows of the source will be read, otherwise no more than the specified number.
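
The meaning of the -1 value can be illustrated with a small reading loop. The class RowLimitSketch is hypothetical and treats each line as one row, which is a simplification compared to the delimiter-based rows of the file reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the row limit; not KNIME code.
final class RowLimitSketch {

    static List<String> readRows(BufferedReader reader, long maxNum) throws IOException {
        List<String> rows = new ArrayList<>();
        String line;
        // -1 means "no limit"; otherwise stop once maxNum rows were collected.
        while ((maxNum == -1 || rows.size() < maxNum)
                && (line = reader.readLine()) != null) {
            rows.add(line);
        }
        return rows;
    }
}
```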

getStatusOfSettings

public SettingsStatus getStatusOfSettings(boolean openDataFile,
                                          DataTableSpec tableSpec)
Checks the consistency and completeness of the current settings. It returns a SettingsStatus object containing info, warning and error messages, or null if the settings are all right.

Parameters:
openDataFile - tells whether or not this method should try to access the data file. This will - if set true - verify the accessibility of the data.
tableSpec - the spec of the DataTable these settings are for. If set null only a few checks will be performed - the ones that are possible without the knowledge of the structure of the table
Returns:
a SettingsStatus object containing info, warning and error messages, or null if no messages were generated (i.e. all settings are just fine).

getStatusOfSettings

public SettingsStatus getStatusOfSettings()
Checks the consistency and completeness of the current settings. It returns a SettingsStatus object containing info, warning and error messages if there is a problem with the settings.

Overrides:
getStatusOfSettings in class TokenizerSettings
Returns:
a SettingsStatus object containing info, warning and error messages, if any were generated.

addStatusOfSettings

protected void addStatusOfSettings(SettingsStatus status,
                                   boolean openDataFile,
                                   DataTableSpec tableSpec)
Adds its status messages to a passed status object.

Parameters:
status - the object to add messages to - if any
openDataFile - specifies whether the accessibility of the data file should be checked
tableSpec - the spec of the DataTable these settings are for. If set null only a few checks will be performed - the ones that are possible without the knowledge of the structure of the table

toString

public String toString()

Overrides:
toString in class TokenizerSettings


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.