There are various settings that specify the format of the data file, all stored in the {@link org.knime.base.node.io.filereader.FileReaderNodeSettings} object. They are set in the file reader's dialog, where the {@link org.knime.base.node.io.filereader.FileAnalyzer} guesses them by examining the first few thousand lines of the file.
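Guessing is easiest to picture for a single setting, the column delimiter. The following is a minimal sketch under our own assumptions, not the FileAnalyzer's actual algorithm: the class `DelimiterGuesser`, its candidate list and the "same count on every sampled line" rule are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of delimiter guessing (not the real FileAnalyzer).
 * Samples up to maxLines non-empty lines and picks the candidate that occurs
 * the same, non-zero number of times on every sampled line.
 */
final class DelimiterGuesser {
    private static final char[] CANDIDATES = {',', ';', '\t', '|'};

    static Character guess(BufferedReader in, int maxLines) {
        List<String> sample = new ArrayList<>();
        try {
            String line;
            while (sample.size() < maxLines && (line = in.readLine()) != null) {
                if (!line.isEmpty()) {
                    sample.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Character best = null;
        int bestCount = 0;
        for (char c : CANDIDATES) {
            int count = -1; // -1 means "no line inspected yet"
            boolean consistent = true;
            for (String s : sample) {
                int n = countChar(s, c);
                if (count == -1) {
                    count = n;
                } else if (n != count) {
                    consistent = false;
                    break;
                }
            }
            // Prefer the candidate that splits every line into the most columns.
            if (consistent && count > bestCount) {
                best = c;
                bestCount = count;
            }
        }
        return best; // null if no candidate splits every sampled line consistently
    }

    private static int countChar(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }
}
```

Requiring a consistent count on every sampled line is what makes a sample of a few thousand lines useful: a character that appears only occasionally, such as a comma inside free text, is rejected as a delimiter.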
The node provides a {@link org.knime.base.node.io.filereader.FileTable} at its output port. The actual job of reading in the file is done in the {@link org.knime.base.node.io.filereader.FileRowIterator}. It reads in the data as requested, line by line, as specified by the settings.
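On-demand, line-by-line reading, and the single validating pass the node makes during execution, can be sketched as follows. This is hypothetical: `NumericRowIterator` handles only numeric columns with a fixed delimiter and is not the real FileRowIterator, and `scanDomain` stands in for collecting domain information by recording only each column's minimum and maximum.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the real FileRowIterator): reads one line ahead,
 * parses a row only when next() is called, and fails on data it cannot parse.
 */
final class NumericRowIterator implements Iterator<double[]> {
    private final BufferedReader in;
    private final String delimiter;
    private String nextLine;
    private int lineNo; // 1-based number of nextLine

    NumericRowIterator(BufferedReader in, String delimiter) {
        this.in = in;
        this.delimiter = delimiter;
        advance();
    }

    private void advance() {
        try {
            nextLine = in.readLine();
            lineNo++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public double[] next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String[] cells = nextLine.split(Pattern.quote(delimiter), -1);
        double[] row = new double[cells.length];
        for (int i = 0; i < cells.length; i++) {
            try {
                row[i] = Double.parseDouble(cells[i].trim());
            } catch (NumberFormatException e) {
                // Unexpected data: report the line so the user can locate the problem.
                throw new IllegalStateException(
                        "Invalid number '" + cells[i] + "' in line " + lineNo, e);
            }
        }
        advance();
        return row;
    }

    /** Drains the rows once and returns {min, max} per column; null for an empty input. */
    static double[][] scanDomain(Iterator<double[]> rows) {
        double[][] domain = null;
        while (rows.hasNext()) {
            double[] row = rows.next();
            if (domain == null) {
                domain = new double[row.length][];
                for (int i = 0; i < row.length; i++) {
                    domain[i] = new double[] {row[i], row[i]};
                }
            } else if (row.length != domain.length) {
                throw new IllegalStateException("Unexpected number of columns: " + row.length);
            }
            for (int i = 0; i < row.length; i++) {
                domain[i][0] = Math.min(domain[i][0], row[i]);
                domain[i][1] = Math.max(domain[i][1], row[i]);
            }
        }
        return domain;
    }
}
```

Because `next()` throws as soon as it meets a value it cannot parse, draining the iterator once during execution surfaces the error at the file reader rather than in some later node, and the same pass yields the value ranges.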
During execution, the node reads through the entire file once. The reason is that the row iterator fails when it reads unexpected data, for example an invalid number or anything else it cannot handle. If the file reader did not traverse the entire file once, the row iterator would fail later, while a successor node is executing, and it would be very hard for the user to relate that failure to a problem during file reading. A useful side effect is that, after execution, the file reader can provide a {@link org.knime.core.data.DataTableSpec} with domain information (such as possible values or value ranges) filled in.
The FileTable requires an XML file that specifies the location and structure of the data to read (see the xml package and FileTableSpec.dtd). A valid URL of this XML file has to be passed to the FileTable constructor. Another constructor accepts a FileTableSpec instead.
A FileTableSpec (see FileTableSpec.java) contains a DataTableSpec (see the data package). The URL of an XML file must be provided to its constructor. The FileTableSpec object reads the XML file during construction and extracts the table structure from it, without reading from the actual data location.
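Reading the structure at construction time, without touching the data, can be illustrated with the JDK's DOM parser. The XML layout (a `filetable` root with a `url` attribute and `column` children) and the class `XmlTableSpec` are invented for this sketch and do not follow FileTableSpec.dtd; it also takes the XML as a string rather than a URL so that it stays self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical stand-in for FileTableSpec; the XML layout is invented for illustration. */
final class XmlTableSpec {
    final String dataLocation;
    final List<String> columnNames = new ArrayList<>();
    final List<String> columnTypes = new ArrayList<>();

    XmlTableSpec(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse table spec", e);
        }
        Element root = doc.getDocumentElement();
        // Only the spec is parsed; the data at dataLocation is never opened here.
        dataLocation = root.getAttribute("url");
        NodeList cols = root.getElementsByTagName("column");
        for (int i = 0; i < cols.getLength(); i++) {
            Element col = (Element) cols.item(i);
            columnNames.add(col.getAttribute("name"));
            columnTypes.add(col.getAttribute("type"));
        }
    }
}
```

Keeping the structure separate from the data means the table spec is available before any data is read, which is what lets a spec be built without a pass over the file.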
The actual job of reading the data from the source is done by the row iterator (see FileRowIterator.java). It uses the tokenizer (see FileTokenizer) to split the stream into columns; the behaviour of the tokenizer must be specified in the XML file passed to the FileTable constructor.
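Splitting a line into columns can be sketched with a small tokenizer, assuming a single-character delimiter and a quote character whose doubled form stands for a literal quote. `SimpleTokenizer` is hypothetical and much simpler than FileTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical tokenizer sketch (not FileTokenizer): splits one line into
 * columns using a configurable delimiter and quote character.
 */
final class SimpleTokenizer {
    private final char delimiter;
    private final char quote;

    SimpleTokenizer(char delimiter, char quote) {
        this.delimiter = delimiter;
        this.quote = quote;
    }

    List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote stands for a literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c); // delimiters inside quotes are kept as data
                }
            } else if (c == quote) {
                inQuotes = true;
            } else if (c == delimiter) {
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString()); // last column, possibly empty
        return tokens;
    }
}
```

For example, with `,` as delimiter and `"` as quote, the line `a,"b,c",` yields the three columns `a`, `b,c` and an empty string.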
This package also contains the implementation of a workflow node that reads data from a location specified by a URL. The node uses the file reader in the data package (see knime.data.filereader).
This node has one output, which provides the DataTable read from the specified source during execution. At this output the node also provides a HiLiteHandler, which the node instantiates itself.
The node is instantiated at the start of the data flow whenever data should be read from a file or other location. It reads an XML file from the specified location, which in turn defines the URL of the data to read and the format of the data.