org.knime.core.util.tokenizer
Class TokenizerSettings

java.lang.Object
  extended by org.knime.core.util.tokenizer.TokenizerSettings
Direct Known Subclasses:
FileReaderSettings

public class TokenizerSettings
extends Object

Defines the object holding the configuration for the FileTokenizer.
Use an instance of this class to set all parameters, then pass it to a FileTokenizer. The object serves as a transport vehicle: new user configurations are first applied to it and, only if that succeeds without any exception, transferred into the file tokenizer. The class is used in both directions: to read the current tokenizer settings and to set a new configuration in the tokenizer. The methods with default (package-private) access are used only by the file tokenizer to store its current settings in this object; code outside the package then retrieves them through the get-methods and supplies new user settings through the set-methods.
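The "try first, then transport" idea can be illustrated without any KNIME classes. The following is only a minimal sketch: SketchSettings and SketchTokenizer are hypothetical stand-ins invented for this example, not the real TokenizerSettings or tokenizer API. It assumes that validation happens while the settings object is modified, so a failed change never reaches the tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a settings object; not the KNIME class. */
final class SketchSettings {
    private final List<String> delimiters = new ArrayList<>();

    SketchSettings() { }

    /** Copy constructor, mirroring the idea of the "clone" constructor. */
    SketchSettings(SketchSettings other) {
        delimiters.addAll(other.delimiters);
    }

    /** Validates eagerly so that a bad value never reaches the tokenizer. */
    void addDelimiter(String delimiter) {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        if (delimiters.contains(delimiter)) {
            throw new IllegalArgumentException("delimiter already defined");
        }
        delimiters.add(delimiter);
    }

    /** Returns a copy, so callers cannot change the internal list. */
    List<String> getDelimiters() {
        return new ArrayList<>(delimiters);
    }
}

/** Hypothetical stand-in for the consumer of the settings. */
final class SketchTokenizer {
    private SketchSettings current = new SketchSettings();

    /** Hands out a copy of the current settings. */
    SketchSettings getSettings() {
        return new SketchSettings(current);
    }

    /** Takes over a fully validated settings object. */
    void setSettings(SketchSettings settings) {
        current = new SketchSettings(settings);
    }

    public static void main(String[] args) {
        SketchTokenizer tokenizer = new SketchTokenizer();
        SketchSettings draft = tokenizer.getSettings();
        draft.addDelimiter(",");          // may throw; tokenizer is untouched so far
        tokenizer.setSettings(draft);     // transport only after all changes succeeded
        System.out.println(tokenizer.getSettings().getDelimiters());
    }
}
```

Because the tokenizer only ever receives a copy of a settings object that accepted every change, an exception while editing the draft leaves the tokenizer's configuration as it was.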

Author:
ohl, University of Konstanz
See Also:
Tokenizer

Constructor Summary
TokenizerSettings()
          Creates a new settings object for the FileTokenizer with default settings.
TokenizerSettings(NodeSettingsRO settings)
          Creates a new TokenizerSettings object and sets its parameters from the config object.
TokenizerSettings(TokenizerSettings clonee)
          Creates a clone of the passed object.
 
Method Summary
 void addBlockCommentPattern(String commentBegin, String commentEnd, boolean returnAsSeparateToken, boolean includeInToken)
          Adds support for block comment to the tokenizer.
protected  void addDelimiterPattern(Delimiter delimiter)
          Adds a new delimiter pattern expecting a Delimiter object.
 void addDelimiterPattern(String delimiter, boolean combineConsecutiveDelims, boolean returnAsSeparateToken, boolean includeInToken)
          Adds a delimiter to the tokenizer.
 boolean addOrReplaceDelimiterPattern(String delimiter, boolean combineConsecutiveDelims, boolean returnAsSeparateToken, boolean includeInToken)
          Replaces the delimiter with the same delimiter pattern overriding the values for combineConsecutiveDelims, returnAsSeparateToken, and includeInToken.
 void addQuotePattern(String leftQuote, String rightQuote)
           
 void addQuotePattern(String leftQuote, String rightQuote, boolean dontRemoveQuotes)
           
 void addQuotePattern(String leftQuote, String rightQuote, char escapeChar)
          Adds support for the specified quote patterns and escape character.
 void addQuotePattern(String leftQuote, String rightQuote, char escapeChar, boolean dontRemoveQuotes)
           
 void addSingleLineCommentPattern(String commentBegin, boolean returnAsSeparateToken, boolean includeInToken)
          Adds support for single line comment to the tokenizer.
protected  void addStatusOfSettings(SettingsStatus status)
          Checks the completeness and consistency of all settings and adds informational messages, warnings, and errors, if something is suspicious.
 void addWhiteSpaceCharacter(char w)
          This is a convenience method.
 void addWhiteSpaceCharacter(String ws)
          Defines a new character to be handled as a whitespace character.
 Vector<Comment> getAllComments()
           
 Vector<Delimiter> getAllDelimiters()
           
 Vector<Quote> getAllQuotes()
           
 Vector<String> getAllWhiteSpaces()
           
 boolean getCombineMultipleDelimiters()
           
 Delimiter getDelimiterPattern(String delimPattern)
          Returns the Delimiter object stored for the delimiter with the pattern specified.
 String getLineContinuationCharacter()
          Returns a string with one character containing the line continuation character that is currently set - or null if none is set.
 SettingsStatus getStatusOfSettings()
          Method to check consistency and completeness of the current settings.
static String printableStr(String str)
           
 void removeAllComments()
          Removes all (!) comments from the tokenizer settings.
 void removeAllDelimiters()
          Removes all (!) delimiters from the file reader settings.
 void removeAllQuotes()
          Removes all (!) quotes from the file reader settings.
 void removeAllWhiteSpaces()
          Removes all user-defined whitespaces.
 Delimiter removeDelimiterPattern(String pattern)
          Removes the Delimiter object with the specified pattern from the list of defined delimiters.
 Quote removeQuotePattern(String begin, String end)
          Removes the Quote object with the specified patterns from the list of defined quotes.
 void saveToConfiguration(NodeSettingsWO cfg)
          Saves all settings into a NodeSettings object.
 void setCombineMultipleDelimiters(boolean value)
          If set to true, multiple different (but consecutive) delimiters are combined, that is, ignored (unless they are supposed to be returned).
(package private)  void setComments(Vector<Comment> comments)
          Sets comment objects to the settings structure.
(package private)  void setDelimiters(Vector<Delimiter> delimiters)
          Sets delimiter objects to the settings structure.
 void setLineContinuationCharacter(char c)
          Adds support for line continuation in tokens and quoted strings.
(package private)  void setQuotes(Vector<Quote> quotes)
          Sets quote objects to the settings structure.
(package private)  void setWhiteSpaces(Vector<String> whites)
          Sets whitespaces to the settings structure.
 String toString()
          
static String unescapeString(String str)
          Takes a string that may contain the sequences "\t", "\n", or "\\" and returns a corresponding string in which these sequences are replaced by the characters '\t', '\n', and '\'.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TokenizerSettings

public TokenizerSettings()
Creates a new settings object for the FileTokenizer with default settings.

See Also:
for description of default settings.

TokenizerSettings

public TokenizerSettings(TokenizerSettings clonee)
Creates a clone of the passed object.

Parameters:
clonee - the object to read the settings from.

TokenizerSettings

public TokenizerSettings(NodeSettingsRO settings)
                  throws InvalidSettingsException
Creates a new TokenizerSettings object and sets its parameters from the config object. If the config does not contain all necessary parameters or contains inconsistent settings, an InvalidSettingsException is thrown.

Parameters:
settings - an object the parameters are read from, if null default settings will be created.
Throws:
InvalidSettingsException - if the config is not valid
Method Detail

saveToConfiguration

public void saveToConfiguration(NodeSettingsWO cfg)
Saves all settings into a NodeSettings object. Using the cfg object to construct a new TokenizerSettings object should lead to an object identical to this one.

Parameters:
cfg - the config object the settings are stored into.

addQuotePattern

public void addQuotePattern(String leftQuote,
                            String rightQuote,
                            char escapeChar)
Adds support for the specified quote patterns and escape character. The tokenizer treats any string between the specified leftQuote and rightQuote as a quoted string: a token delimiter inside it does not end the token but is included in the string, and no comment is recognized inside a quoted string.

The escape character makes it possible to include special characters (such as a new line) or even the right quote pattern in the string. The escape character and the character immediately following it are translated into one new character: escapeChar + 't' becomes '\t' (tab), escapeChar + 'n' becomes '\n' (newline), and escapeChar + any other character becomes that other character. The escape character cannot be part of the end pattern.

A typical call adding support for single quotes is addQuotePattern("'", "'", '\\'); and for double quotes addQuotePattern("\"", "\"", '\\');. Both calls also add support for the escape character '\'. If you don't want an escape character, use the overloads without the escapeChar parameter. The quote patterns are removed from the token by default; the overloads that take a dontRemoveQuotes flag let them remain in the token.

Parameters:
leftQuote - A string containing the left quote pattern.
rightQuote - A string containing the right quote pattern.
escapeChar - The escape character.
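The escape translation described above can be sketched as a small standalone function. This is an illustration only, not the tokenizer's implementation: the class QuoteEscapeSketch and the method translate are hypothetical names, and the handling of a lone escape character at the very end of the text (kept unchanged here) is an assumption.

```java
/** Hypothetical sketch of the escape translation inside a quoted string. */
final class QuoteEscapeSketch {

    /**
     * Translates escape sequences in the body of a quoted string:
     * escapeChar + 't' becomes a tab, escapeChar + 'n' becomes a newline,
     * and escapeChar + any other character becomes that other character.
     */
    static String translate(String body, char escapeChar) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == escapeChar && i + 1 < body.length()) {
                char next = body.charAt(++i); // consume the escaped character
                if (next == 't') {
                    out.append('\t');
                } else if (next == 'n') {
                    out.append('\n');
                } else {
                    out.append(next);
                }
            } else {
                // Ordinary character, or a lone escape character at the end
                // (assumption: it is kept as-is).
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escaped right quote pattern stays in the text as a plain quote.
        System.out.println(translate("say \\\"hi\\\"", '\\'));
    }
}
```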

addQuotePattern

public void addQuotePattern(String leftQuote,
                            String rightQuote,
                            char escapeChar,
                            boolean dontRemoveQuotes)
Parameters:
leftQuote - the left quote pattern
rightQuote - the right quote pattern
escapeChar - the escape character inside a quoted text
dontRemoveQuotes - true if quote patterns should stay in the token, false, if they should be removed from the returned token.

addQuotePattern

public void addQuotePattern(String leftQuote,
                            String rightQuote)
Parameters:
leftQuote - The left quote pattern.
rightQuote - The right quote pattern.
See Also:
addQuotePattern(String, String, char)

addQuotePattern

public void addQuotePattern(String leftQuote,
                            String rightQuote,
                            boolean dontRemoveQuotes)
Parameters:
leftQuote - the left quote pattern
rightQuote - the right quote pattern
dontRemoveQuotes - true if quote patterns should stay in the token, false, if they should be removed from the returned token.

removeQuotePattern

public Quote removeQuotePattern(String begin,
                                String end)
Removes the Quote object with the specified patterns from the list of defined quotes. Returns the removed quote object, if it existed or null if the patterns didn't match any.

Parameters:
begin - the quote begin pattern to match
end - the quote end pattern to match
Returns:
the quote object removed, or null if no quote with the specified begin and end pattern existed.

removeAllQuotes

public void removeAllQuotes()
Removes all (!) quotes from the file reader settings. Not a single quote will be defined after a call to this method.


addDelimiterPattern

public void addDelimiterPattern(String delimiter,
                                boolean combineConsecutiveDelims,
                                boolean returnAsSeparateToken,
                                boolean includeInToken)
Adds a delimiter to the tokenizer. If a delimiter is read outside any comment block or quoted string, the characters read before it are returned as a token. Depending on the parameters, the delimiter just read is either appended to the current token (includeInToken set to true), returned as a separate token (returnAsSeparateToken set to true), or discarded (both set to false). Setting both parameters to true causes an IllegalArgumentException. The parameter combineConsecutiveDelims determines whether immediately following delimiters of the same kind are ignored (true) or cause empty tokens to be returned (false). The specified delimiter must not be a prefix of any existing delimiter, left quote, or comment begin pattern.

Parameters:
delimiter - A string containing the delimiter.
combineConsecutiveDelims - Pass in true, if you want multiple consecutive delimiters to be treated as one, or false if empty tokens should be returned between them.
returnAsSeparateToken - Set to true to get delimiters returned as tokens, or false if they should be discarded (or included in the tokens - see next parameter). Mutually exclusive with includeInToken.
includeInToken - Set to true if you want the delimiter returned at the end of the token. Otherwise it will be discarded (or returned as separate token, see parameter above). Mutually exclusive with returnAsSeparateToken.
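How the three flags interact can be shown with a small standalone splitter. This is a sketch under stated assumptions, not the tokenizer's algorithm: DelimiterSketch and split are hypothetical names, only one delimiter is supported, quotes and comments are ignored, and input ending in a delimiter yields a trailing empty token (a choice made for this sketch only).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-delimiter splitter illustrating the three flags. */
final class DelimiterSketch {

    static List<String> split(String input, String delim,
            boolean combineConsecutiveDelims,
            boolean returnAsSeparateToken, boolean includeInToken) {
        if (delim == null || delim.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        if (returnAsSeparateToken && includeInToken) {
            // The two flags are mutually exclusive.
            throw new IllegalArgumentException(
                    "returnAsSeparateToken and includeInToken are both true");
        }
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (true) {
            int hit = input.indexOf(delim, pos);
            if (hit < 0) {
                tokens.add(input.substring(pos)); // rest of the input
                break;
            }
            String token = input.substring(pos, hit);
            tokens.add(includeInToken ? token + delim : token);
            if (returnAsSeparateToken) {
                tokens.add(delim);
            }
            pos = hit + delim.length();
            if (combineConsecutiveDelims) {
                // Skip immediately following delimiters of the same kind.
                while (input.startsWith(delim, pos)) {
                    pos += delim.length();
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a,,b", ",", false, false, false));
        System.out.println(split("a,,b", ",", true, false, false));
    }
}
```

With combineConsecutiveDelims set to false, the input "a,,b" produces an empty token between the two commas; with true, the second comma is skipped.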

addDelimiterPattern

protected void addDelimiterPattern(Delimiter delimiter)
Adds a new delimiter pattern from a Delimiter object. Performs the consistency checks and throws an IllegalArgumentException if a check fails.

Parameters:
delimiter - the delimiter to add.

addOrReplaceDelimiterPattern

public boolean addOrReplaceDelimiterPattern(String delimiter,
                                            boolean combineConsecutiveDelims,
                                            boolean returnAsSeparateToken,
                                            boolean includeInToken)
Replaces the delimiter that has the same delimiter pattern, overriding its values for combineConsecutiveDelims, returnAsSeparateToken, and includeInToken. Returns true if a matching delimiter was found and replaced, or false if no matching delimiter existed and the delimiter was added instead.

Parameters:
delimiter - The pattern matching the delimiter to replace.
combineConsecutiveDelims - New value for this parameter.
returnAsSeparateToken - New value for this parameter.
includeInToken - New value for this parameter.
Returns:
true if it replaced the delimiter or false if it was added.
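The add-or-replace contract, together with lookup and removal by pattern, can be sketched with a map keyed by the delimiter pattern. This is a hypothetical illustration: DelimiterRegistrySketch and its nested Flags class are invented for this example and do not reflect how TokenizerSettings stores its Delimiter objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical registry illustrating add-or-replace, lookup and removal. */
final class DelimiterRegistrySketch {

    /** Hypothetical holder for the three per-delimiter flags. */
    static final class Flags {
        final boolean combine;
        final boolean separate;
        final boolean include;

        Flags(boolean combine, boolean separate, boolean include) {
            this.combine = combine;
            this.separate = separate;
            this.include = include;
        }
    }

    private final Map<String, Flags> byPattern = new LinkedHashMap<>();

    /** Returns true if an existing entry was replaced, false if it was added. */
    boolean addOrReplace(String pattern, boolean combine,
            boolean separate, boolean include) {
        if (separate && include) {
            throw new IllegalArgumentException("flags are mutually exclusive");
        }
        Flags previous = byPattern.put(pattern, new Flags(combine, separate, include));
        return previous != null;
    }

    /** Returns the stored flags, or null if the pattern is not defined. */
    Flags get(String pattern) {
        return byPattern.get(pattern);
    }

    /** Returns the removed flags, or null if the pattern did not exist. */
    Flags remove(String pattern) {
        return byPattern.remove(pattern);
    }

    public static void main(String[] args) {
        DelimiterRegistrySketch registry = new DelimiterRegistrySketch();
        System.out.println(registry.addOrReplace(";", false, false, false));
        System.out.println(registry.addOrReplace(";", true, false, false));
    }
}
```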

getDelimiterPattern

public Delimiter getDelimiterPattern(String delimPattern)
Returns the Delimiter object stored for the delimiter with the pattern specified.

Parameters:
delimPattern - the string pattern of the delimiter to look for.
Returns:
the Delimiter object of the specified delimiter pattern, if defined, otherwise null.

removeDelimiterPattern

public Delimiter removeDelimiterPattern(String pattern)
Removes the Delimiter object with the specified pattern from the list of defined delimiters. Returns the removed delimiter object, if it existed or null if the pattern didn't exist.

Parameters:
pattern - the delimiter to remove
Returns:
the delimiter object removed, or null if no delimiter with the specified pattern existed.

removeAllDelimiters

public void removeAllDelimiters()
Removes all (!) delimiters from the file reader settings. Not a single delimiter will be defined after a call to this method.


addBlockCommentPattern

public void addBlockCommentPattern(String commentBegin,
                                   String commentEnd,
                                   boolean returnAsSeparateToken,
                                   boolean includeInToken)
Adds support for block comment to the tokenizer. Everything between the comment begin pattern and the comment end pattern will be ignored, and either returned as separate token (if returnAsSeparateToken is set true), included in the token (if includeInToken is true), or discarded (if both parameters are set false). (If you specify both parameters true it will throw an IllegalArgumentException.)

Parameters:
commentBegin - The string containing a pattern that starts a comment.
commentEnd - The string containing the end pattern of the comment. (Must not be a LF ("\n"). Use addSingleLineCommentPattern for line comments.)
returnAsSeparateToken - Set to true if the comment should be returned in a separate token, or false if it should be discarded, or included in the token (see the following parameter).
includeInToken - Set to true if a comment should be returned within the token (at the place where it occurred in the stream), or false if it should be discarded or returned as separate token (depending on the parameter above).
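The "discarded" case (both flags false) can be sketched as a plain string filter. This is an illustration only: BlockCommentSketch and strip are hypothetical names, quoted strings are not considered, and dropping the rest of the input after an unterminated comment is an assumption of this sketch.

```java
/** Hypothetical filter that discards block comments from a string. */
final class BlockCommentSketch {

    static String strip(String input, String begin, String end) {
        if (begin.isEmpty() || end.isEmpty()) {
            throw new IllegalArgumentException("patterns must not be empty");
        }
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        while (true) {
            int start = input.indexOf(begin, pos);
            if (start < 0) {
                out.append(input, pos, input.length()); // no further comment
                break;
            }
            out.append(input, pos, start); // text in front of the comment
            int stop = input.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated comment: drop the rest (assumption)
            }
            pos = stop + end.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("a/* x */b", "/*", "*/"));
    }
}
```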

addSingleLineCommentPattern

public void addSingleLineCommentPattern(String commentBegin,
                                        boolean returnAsSeparateToken,
                                        boolean includeInToken)
Adds support for single line comment to the tokenizer. Everything between the comment begin pattern and the next line feed will be ignored, and either returned as separate token (if returnAsSeparateToken is set true), included in the token (if includeInToken is true), or discarded (if both parameters are set false). (If you specify both parameters true it will throw an IllegalArgumentException.)

Parameters:
commentBegin - The string containing a pattern that starts a single line comment.
returnAsSeparateToken - Set to true if the comment should be returned in a separate token, or false if it should be discarded, or included in the token (see the following parameter).
includeInToken - Set to true if a comment should be returned within the token (at the place where it occurred in the stream), or false if it should be discarded or returned as separate token (depending on the parameter above).
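The three outcomes selected by returnAsSeparateToken and includeInToken can be sketched for a single line of input. LineCommentSketch and apply are hypothetical names; the sketch assumes the line contains no line feed and no delimiters or quotes, so the text in front of the comment forms exactly one token.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the two comment flags on one input line. */
final class LineCommentSketch {

    static List<String> apply(String line, String begin,
            boolean returnAsSeparateToken, boolean includeInToken) {
        if (begin.isEmpty()) {
            throw new IllegalArgumentException("comment pattern must not be empty");
        }
        if (returnAsSeparateToken && includeInToken) {
            throw new IllegalArgumentException("flags are mutually exclusive");
        }
        List<String> tokens = new ArrayList<>();
        int start = line.indexOf(begin);
        if (start < 0) {
            tokens.add(line); // no comment in this line
            return tokens;
        }
        String before = line.substring(0, start);
        String comment = line.substring(start); // runs to the end of the line
        if (includeInToken) {
            tokens.add(before + comment);       // comment stays in the token
        } else {
            tokens.add(before);
            if (returnAsSeparateToken) {
                tokens.add(comment);            // comment as its own token
            }                                   // otherwise it is discarded
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(apply("x=1 # note", "#", true, false));
    }
}
```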

removeAllComments

public void removeAllComments()
Removes all (!) comments from the tokenizer settings. Not a single comment will be defined after a call to this method.


addWhiteSpaceCharacter

public void addWhiteSpaceCharacter(String ws)
Defines a new character to be handled as a whitespace character. Whitespaces will be ignored when they appear in the file (except when inside quotes or defined as delimiter/quote/comment pattern). Any other definition of the same character overrides the whitespace definition; for example, if the same character is defined as the line continuation character, it is treated as such and its whitespace definition is (silently) ignored.

Parameters:
ws - a one character string containing the new whitespace character

addWhiteSpaceCharacter

public void addWhiteSpaceCharacter(char w)
This is a convenience method. Whitespace characters are handled as one-character strings.

Parameters:
w - character containing the new whitespace character
See Also:
addWhiteSpaceCharacter(String)

removeAllWhiteSpaces

public void removeAllWhiteSpaces()
Removes all user-defined whitespaces. No whitespaces will be ignored after that.


setLineContinuationCharacter

public void setLineContinuationCharacter(char c)
Adds support for line continuation in tokens and quoted strings. The tokenizer ignores a new line, if the last character in a line was the specified character c (No trailing spaces allowed!) and it will also ignore any space or tab character at the beginning of the new line then.

The following two quoted strings are equivalent if '\' is set as line cont. char: "this is \
considered one line" and "this is considered one line".

Parameters:
c - The new line continuation character.
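The joining rule described above can be sketched as a standalone function. LineContinuationSketch and join are hypothetical names, and the sketch assumes lines end with a single line feed ('\n'); it is not the tokenizer's implementation.

```java
/** Hypothetical sketch of line continuation handling. */
final class LineContinuationSketch {

    /**
     * Drops the continuation character, the line feed directly after it,
     * and any space or tab at the beginning of the following line.
     */
    static String join(String input, char c) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            boolean continues = ch == c
                    && i + 1 < input.length()
                    && input.charAt(i + 1) == '\n';
            if (continues) {
                i += 2; // skip the continuation character and the line feed
                while (i < input.length()
                        && (input.charAt(i) == ' ' || input.charAt(i) == '\t')) {
                    i++; // skip leading spaces and tabs of the new line
                }
            } else {
                out.append(ch);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("this is \\\nconsidered one line", '\\'));
    }
}
```

A continuation character followed by a trailing space is not the last character in its line, so the sketch leaves such input unchanged.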

getLineContinuationCharacter

public String getLineContinuationCharacter()
Returns a string with one character containing the line continuation character that is currently set - or null if none is set.

Returns:
A one-char long string containing the line cont. char, or null if none is set.

setCombineMultipleDelimiters

public void setCombineMultipleDelimiters(boolean value)
If set to true, multiple different (but consecutive) delimiters are combined, that is, ignored (unless they are supposed to be returned).

Parameters:
value - set to true to combine multiple different consecutive delimiters, or false to handle each as a separate delimiter.

getCombineMultipleDelimiters

public boolean getCombineMultipleDelimiters()
Returns:
true if multiple consecutive (but different) delimiters are combined into one, false if each is treated as a separate delimiter.

getAllComments

public Vector<Comment> getAllComments()
Returns:
a new vector, with items of type Comment, containing all currently defined comment patterns. Could be empty, but never null. The vector is yours to change if you want.
See Also:
Comment

getAllQuotes

public Vector<Quote> getAllQuotes()
Returns:
a new vector, with items of type Quote, containing all currently defined quote patterns. Could be empty, but never null. The vector is yours to change if you want.
See Also:
Quote

getAllDelimiters

public Vector<Delimiter> getAllDelimiters()
Returns:
a new vector, with items of type Delimiter, containing all currently defined delimiter patterns. Could be empty, but never null. The vector is yours to change if you want.
See Also:
Delimiter

getAllWhiteSpaces

public Vector<String> getAllWhiteSpaces()
Returns:
a new vector of strings, all of length one, containing the characters handled and ignored as whitespaces. The vector is yours.
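The "new vector that is yours" contract of the getAll methods amounts to returning a copy. A minimal sketch, using the hypothetical class WhiteSpaceHolderSketch rather than the real settings class:

```java
import java.util.Vector;

/** Hypothetical holder showing a getter that returns a fresh copy. */
final class WhiteSpaceHolderSketch {
    private final Vector<String> whiteSpaces = new Vector<>();

    void add(String ws) {
        if (ws == null || ws.length() != 1) {
            throw new IllegalArgumentException("expected a one-character string");
        }
        whiteSpaces.add(ws);
    }

    /** Returns a new vector; changes to it do not affect this object. */
    Vector<String> getAll() {
        return new Vector<>(whiteSpaces);
    }

    public static void main(String[] args) {
        WhiteSpaceHolderSketch holder = new WhiteSpaceHolderSketch();
        holder.add(" ");
        Vector<String> copy = holder.getAll();
        copy.clear();                              // only the copy is emptied
        System.out.println(holder.getAll().size());
    }
}
```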

setComments

void setComments(Vector<Comment> comments)
Sets comment objects to the settings structure. No consistency checks will be performed.

Parameters:
comments - a Vector of Comment objects to add. Must not be null.

setQuotes

void setQuotes(Vector<Quote> quotes)
Sets quote objects to the settings structure. No consistency checks will be performed.

Parameters:
quotes - a vector of Quote objects to add. Must not be null.

setDelimiters

void setDelimiters(Vector<Delimiter> delimiters)
Sets delimiter objects to the settings structure. No consistency checks will be performed.

Parameters:
delimiters - a Vector of delimiter objects to add. Must not be null.

setWhiteSpaces

void setWhiteSpaces(Vector<String> whites)
Sets whitespaces to the settings structure. No consistency checks will be performed. Existing whitespaces will be cleared before.

Parameters:
whites - a Vector of one-character strings to set. Must not be null.

toString

public String toString()

Overrides:
toString in class Object

addStatusOfSettings

protected void addStatusOfSettings(SettingsStatus status)
Checks the completeness and consistency of all settings and adds informational messages, warnings, and errors, if something is suspicious.

Parameters:
status - the object this method adds its messages to.

getStatusOfSettings

public SettingsStatus getStatusOfSettings()
Method to check consistency and completeness of the current settings. It will return a SettingsStatus object which contains info, warning and error messages, if something is fishy with the settings.

Returns:
a SettingsStatus object containing info, warning and error messages, or no messages if all settings are good.

unescapeString

public static String unescapeString(String str)
Takes a string that may contain the sequences "\t", "\n", or "\\" and returns a corresponding string in which these sequences are replaced by the characters '\t', '\n', and '\'.

Parameters:
str - a string with escape sequences in
Returns:
a string with all sequences translated. If there are no esc sequences in the specified string the exact same reference will be returned.
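The translation and the "same reference" behaviour can be sketched as follows. UnescapeSketch is a hypothetical class, not the KNIME implementation; as an assumption of this sketch, a backslash followed by any other character is left unchanged, and in that case a new (equal) string is returned rather than the same reference.

```java
/** Hypothetical sketch of translating "\t", "\n" and "\\" sequences. */
final class UnescapeSketch {

    static String unescape(String str) {
        if (str.indexOf('\\') < 0) {
            return str; // nothing to translate: return the same reference
        }
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\\' && i + 1 < str.length()) {
                char next = str.charAt(i + 1);
                if (next == 't') {
                    out.append('\t');
                    i++;
                    continue;
                }
                if (next == 'n') {
                    out.append('\n');
                    i++;
                    continue;
                }
                if (next == '\\') {
                    out.append('\\');
                    i++;
                    continue;
                }
            }
            out.append(c); // ordinary character or unknown sequence
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String plain = "plain";
        System.out.println(unescape(plain) == plain);
    }
}
```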

printableStr

public static String printableStr(String str)
Parameters:
str - a string. Could be null.
Returns:
a printable string with all control chars replaced.


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.