org.knime.core.data.container
Class Buffer

java.lang.Object
  extended by org.knime.core.data.container.Buffer
All Implemented Interfaces:
KNIMEStreamConstants
Direct Known Subclasses:
NoKeyBuffer

 class Buffer
extends Object
implements KNIMEStreamConstants

A buffer writes the rows from a DataContainer to a file. This class serves as connector between the DataContainer and the DataTable that is returned by the container. It "centralizes" the IO operations.

Author:
Bernd Wiswedel, University of Konstanz

Nested Class Summary
(package private) static class Buffer.FromFileIterator
          Super class of all file iterators.
 
Field Summary
(package private) static String ZIP_ENTRY_BLOBS
          Name of the zip entry containing the blob files (directory).
(package private) static String ZIP_ENTRY_DATA
          Name of the zip entry containing the data.
(package private) static String ZIP_ENTRY_META
          Name of the zip entry containing the meta information (e.g. #rows).
 
Fields inherited from interface org.knime.core.data.container.KNIMEStreamConstants
BYTE_ROW_SEPARATOR, BYTE_TYPE_MISSING, BYTE_TYPE_SERIALIZATION, BYTE_TYPE_START, DUMMY_ROW_KEY, TC_ESCAPE, TC_TERMINATE
 
Constructor Summary
Buffer(File binFile, File blobDir, DataTableSpec spec, InputStream metaIn, int bufferID, Map<Integer,ContainerTable> tblRep)
          Creates new buffer for reading.
Buffer(int maxRowsInMemory, int bufferID, Map<Integer,ContainerTable> globalRep, Map<Integer,ContainerTable> localRep)
          Creates new buffer for writing.
 
Method Summary
(package private)  void addRow(DataRow r, boolean isCopyOfExisting, boolean forceCopyOfBlobs)
          Adds a row to the buffer.
(package private)  void addToZipFile(ZipOutputStream zipOut, ExecutionMonitor exec)
          Method that is called from the ContainerTable to save the content.
(package private)  void clear()
          Clears the temp file.
(package private)  void clearIteratorInstance(Buffer.FromFileIterator it, boolean removeFromHash)
          Clears the argument iterator (frees the allocated resources).
(package private)  void close(DataTableSpec spec)
          Flushes and closes the stream.
(package private)  boolean containsBlobCells()
          True if any row containing blob cells is contained in this buffer.
(package private) static File createBlobDirNameForTemp(File tempFile)
          Guesses a "good" blob directory for a given binary temp file.
(package private)  Buffer createLocalCloneForWriting()
          Creates a clone of this buffer for writing the content to a stream that is of the current version.
protected  void finalize()
          Deletes the file underlying this buffer.
(package private)  File getBinFile()
           
(package private)  File getBlobFile(int indexBlobInCol, int column, boolean createPath, boolean isCompressed)
          Determines the file location for a blob to be read/written with some given coordinates (column and index in column).
(package private)  int getBufferID()
          Get this buffer's ID.
(package private)  Map<Integer,ContainerTable> getGlobalRepository()
          Get reference to the table repository that this buffer was initially instantiated with.
(package private)  Map<Integer,ContainerTable> getLocalRepository()
          Get reference to the local table repository that this buffer was initially instantiated with.
(package private)  int getReadVersion()
          Get underlying stream version.
 DataTableSpec getTableSpec()
          Get the table spec that was set in the constructor.
(package private)  CellClassInfo getTypeForChar(byte identifier)
          Perform lookup for the DataCell class info given the argument byte.
 String getVersion()
          Get the version string to write to the meta file.
(package private)  void incrementSize()
          Increments the row counter by one, used in addRow.
(package private)  boolean isBinFileGZipped()
           
(package private)  CloseableRowIterator iterator()
          Get a new RowIterator, traversing all rows that have been added.
(package private)  BlobDataCell readBlobDataCell(BlobDataCell.BlobAddress blobAddress, CellClassInfo cl)
          Reads the blob from the given blob address.
(package private)  void restoreIntoMemory()
          Restore content of this buffer into main memory (using a collection implementation).
(package private)  boolean shouldSkipRowKey()
          Get whether the buffer wants to persist row keys.
 int size()
          Get the row count.
(package private)  boolean usesOutFile()
          Does the buffer use a file?
 int validateVersion(String version)
          Validate the version as read from the file if it can be parsed by this implementation.
(package private)  void writeDataCell(DataCell cell, DCObjectOutputVersion2 outStream)
          Writes a data cell to the outStream.
(package private)  void writeRowKey(RowKey key, DCObjectOutputVersion2 outStream)
          Writes the row key to the out stream.
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ZIP_ENTRY_DATA

static final String ZIP_ENTRY_DATA
Name of the zip entry containing the data.

See Also:
Constant Field Values

ZIP_ENTRY_BLOBS

static final String ZIP_ENTRY_BLOBS
Name of the zip entry containing the blob files (directory).

See Also:
Constant Field Values

ZIP_ENTRY_META

static final String ZIP_ENTRY_META
Name of the zip entry containing the meta information (e.g. #rows).

See Also:
Constant Field Values
Constructor Detail

Buffer

Buffer(int maxRowsInMemory,
       int bufferID,
       Map<Integer,ContainerTable> globalRep,
       Map<Integer,ContainerTable> localRep)
Creates a new buffer for writing. It is assigned a given spec and a maximum number of rows that may reside in memory.

Parameters:
maxRowsInMemory - Maximum number of rows that are kept in memory before they are subsequently written to the temp file (0 to write immediately to a file).
globalRep - Table repository for blob (de)serialization (read only).
localRep - Local table repository for blob (de)serialization.
bufferID - The id of this buffer used for blob (de)serialization.
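
The effect of maxRowsInMemory can be illustrated with a minimal sketch. This is not the Buffer implementation: the class name SpillingBufferSketch, the use of plain strings as rows, and the line-based temp file are all assumptions made for illustration. It only shows the behavior described here, where rows stay in memory until the limit is exceeded and a limit of 0 writes every row to the file immediately.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a write buffer; rows are plain strings here.
final class SpillingBufferSketch {
    private final int maxRowsInMemory;
    private final List<String> inMemory = new ArrayList<>();
    private File outFile;        // created lazily on the first spill
    private BufferedWriter out;
    private int size;

    SpillingBufferSketch(int maxRowsInMemory) {
        this.maxRowsInMemory = maxRowsInMemory;
    }

    void addRow(String row) throws IOException {
        inMemory.add(row);
        size++;
        // With a limit of 0 this is true for the very first row.
        if (inMemory.size() > maxRowsInMemory) {
            flushToFile();
        }
    }

    private void flushToFile() throws IOException {
        if (out == null) {
            outFile = File.createTempFile("buffer_sketch_", ".txt");
            outFile.deleteOnExit();
            out = Files.newBufferedWriter(outFile.toPath(), StandardCharsets.UTF_8);
        }
        for (String row : inMemory) {
            out.write(row);
            out.newLine();
        }
        inMemory.clear();
    }

    // If no file was ever created, the rows simply stay in memory.
    void close() throws IOException {
        if (out != null) {
            flushToFile();
            out.close();
        }
    }

    boolean usesOutFile() {
        return outFile != null;
    }

    int size() {
        return size;
    }

    File getOutFile() {
        return outFile;
    }
}
```

In this sketch, a buffer created with a limit of 10 that receives two rows never touches the disk, while a buffer with a limit of 2 that receives five rows ends up with all five rows in its temp file after close().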

Buffer

Buffer(File binFile,
       File blobDir,
       DataTableSpec spec,
       InputStream metaIn,
       int bufferID,
       Map<Integer,ContainerTable> tblRep)
 throws IOException
Creates new buffer for reading. The binFile is the binary file as written by this class, which will be deleted when this buffer is cleared or finalized.

Parameters:
binFile - The binary file to read from (will be deleted on exit).
blobDir - temp directory containing blobs (may be null).
spec - The data table spec to which this buffer complies.
metaIn - An input stream from which this constructor reads the meta information (e.g. which byte encodes which DataCell).
bufferID - The id of this buffer used for blob (de)serialization.
tblRep - Table repository for blob (de)serialization.
Throws:
IOException - If the header (the spec information) can't be read.
Method Detail

getVersion

public String getVersion()
Get the version string to write to the meta file. This method is overridden in the NoKeyBuffer to distinguish streams written by the different implementations.

Returns:
The version string.

getReadVersion

final int getReadVersion()
Get underlying stream version. Important for file iterators.

Returns:
Underlying stream version.

getBinFile

final File getBinFile()
Returns:
Underlying binary file.

isBinFileGZipped

final boolean isBinFileGZipped()
Returns:
Whether stream is zipped.

validateVersion

public int validateVersion(String version)
                    throws IOException
Validate the version as read from the file if it can be parsed by this implementation.

Parameters:
version - As read from file.
Returns:
The version ID for internal use.
Throws:
IOException - If it can't be parsed.
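
A version check of this kind can be sketched as a lookup from version string to internal ID. The version strings and IDs below are invented for illustration, since the real values are not given on this page; only the contract (return an ID, or throw an IOException for an unknown version) follows the description above.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical version table; the strings and IDs are placeholders.
final class VersionCheckSketch {
    private static final Map<String, Integer> KNOWN_VERSIONS = new HashMap<>();

    static {
        KNOWN_VERSIONS.put("sketch_v1", 1);
        KNOWN_VERSIONS.put("sketch_v2", 2);
    }

    /** Returns the internal ID for a known version string, else throws. */
    static int validateVersion(String version) throws IOException {
        Integer id = KNOWN_VERSIONS.get(version);
        if (id == null) {
            throw new IOException("Unsupported stream version: \"" + version + "\"");
        }
        return id;
    }
}
```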

addRow

void addRow(DataRow r,
            boolean isCopyOfExisting,
            boolean forceCopyOfBlobs)
Adds a row to the buffer. The row's structure is not validated against the table spec that was given in the constructor. This should have been done in the caller class DataContainer.

Parameters:
r - The row to be added.
isCopyOfExisting - Whether to copy blobs (this is only true when an existing buffer gets copied (version hop)).
forceCopyOfBlobs - If true, any blob that is not owned by this buffer will be copied and this buffer will take ownership. This option is true for loop end nodes, which need to aggregate the data generated in the loop body.

incrementSize

void incrementSize()
Increments the row counter by one, used in addRow.


close

void close(DataTableSpec spec)
Flushes and closes the stream. If no file has been created and therefore everything fits in memory (according to the settings in the constructor), it will stay in memory (no file created).

Parameters:
spec - The spec the rows have to follow. No sanity check is done.

usesOutFile

boolean usesOutFile()
Does the buffer use a file?

Returns:
true if it does.

getTableSpec

public DataTableSpec getTableSpec()
Get the table spec that was set in the constructor.

Returns:
The spec the buffer uses.

size

public int size()
Get the row count.

Returns:
How often addRow() has been called.

shouldSkipRowKey

boolean shouldSkipRowKey()
Get whether the buffer wants to persist row keys. Here hard-coded to true but overridden in NoKeyBuffer.

Returns:
Whether row keys need to be written/read.

restoreIntoMemory

final void restoreIntoMemory()
Restore content of this buffer into main memory (using a collection implementation). The restoring will be performed with the next iteration.


getGlobalRepository

Map<Integer,ContainerTable> getGlobalRepository()
Get reference to the table repository that this buffer was initially instantiated with. Used for blob reading/writing.

Returns:
(Workflow-) global table repository.

getLocalRepository

Map<Integer,ContainerTable> getLocalRepository()
Get reference to the local table repository that this buffer was initially instantiated with. Used for blob reading/writing. This may be null.

Returns:
Local table repository (may be null).

writeRowKey

void writeRowKey(RowKey key,
                 DCObjectOutputVersion2 outStream)
           throws IOException
Writes the row key to the out stream. This method is overridden in NoKeyBuffer in order to skip the row key.

Parameters:
key - The key to write.
outStream - To write to.
Throws:
IOException - If that fails.
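
The override relationship described for writeRowKey can be sketched with two small classes. The names KeyedWriterSketch and NoKeyWriterSketch, the string keys, and the use of DataOutputStream are assumptions for illustration; they stand in for Buffer, NoKeyBuffer, RowKey and DCObjectOutputVersion2, whose internals are not shown on this page.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical base writer: persists the key, then the payload.
class KeyedWriterSketch {

    void writeRowKey(String key, DataOutputStream out) throws IOException {
        out.writeUTF(key);
    }

    // The row layout is fixed here; only the key step can be customized.
    final void writeRow(String key, int value, DataOutputStream out) throws IOException {
        writeRowKey(key, out);
        out.writeInt(value);
    }

    static byte[] encode(KeyedWriterSketch writer, String key, int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writer.writeRow(key, value, out);
        }
        return bytes.toByteArray();
    }
}

// Hypothetical subclass: skips the key, so rows are smaller on disk.
class NoKeyWriterSketch extends KeyedWriterSketch {
    @Override
    void writeRowKey(String key, DataOutputStream out) {
        // Intentionally empty: the key is not persisted.
    }
}
```

Because the two writers produce different byte layouts, a reader has to know which one wrote the stream, which is consistent with getVersion() returning a different version string in NoKeyBuffer.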

writeDataCell

void writeDataCell(DataCell cell,
                   DCObjectOutputVersion2 outStream)
             throws IOException
Writes a data cell to the outStream.

Parameters:
cell - The cell to write.
outStream - To write to.
Throws:
IOException - If stream corruption happens.

readBlobDataCell

BlobDataCell readBlobDataCell(BlobDataCell.BlobAddress blobAddress,
                              CellClassInfo cl)
                        throws IOException
Reads the blob from the given blob address.

Parameters:
blobAddress - The address to read from.
cl - The expected class.
Returns:
The blob cell being read.
Throws:
IOException - If that fails.

getTypeForChar

CellClassInfo getTypeForChar(byte identifier)
                       throws IOException
Perform lookup for the DataCell class info given the argument byte.

Parameters:
identifier - The byte as read from the stream.
Returns:
the associated cell class info
Throws:
IOException - If the byte is invalid.
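
The byte-to-type lookup can be pictured as a small table that assigns one byte per cell class on the writing side and resolves it again on the reading side. The following is only a sketch under stated assumptions: the class TypeTableSketch, the use of class names as strings instead of CellClassInfo, and the starting value 10 (standing in for BYTE_TYPE_START, whose actual value is not given here) are all hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical type table mapping cell class names to single bytes.
final class TypeTableSketch {
    // Placeholder for BYTE_TYPE_START; the real constant's value is unknown here.
    private static final byte FIRST_IDENTIFIER = 10;

    private final List<String> classNames = new ArrayList<>();

    /** Writing side: returns the byte for a class, registering it if new. */
    byte identifierFor(String className) {
        int index = classNames.indexOf(className);
        if (index < 0) {
            if (FIRST_IDENTIFIER + classNames.size() > Byte.MAX_VALUE) {
                throw new IllegalStateException("Too many distinct cell classes");
            }
            classNames.add(className);
            index = classNames.size() - 1;
        }
        return (byte) (FIRST_IDENTIFIER + index);
    }

    /** Reading side: resolves a byte, or fails if it was never assigned. */
    String typeForByte(byte identifier) throws IOException {
        int index = identifier - FIRST_IDENTIFIER;
        if (index < 0 || index >= classNames.size()) {
            throw new IOException("Unknown type identifier: " + identifier);
        }
        return classNames.get(index);
    }
}
```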

createBlobDirNameForTemp

static File createBlobDirNameForTemp(File tempFile)
Guesses a "good" blob directory for a given binary temp file. For instance, if the temp file is /tmp/knime_container_xxxx_xx.bin.gz, the blob dir name is suggested to be /tmp/knime_container_xxxx_xx.

Parameters:
tempFile - base name
Returns:
proposed blob directory
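
The naming rule in the example above can be sketched as follows. This is an illustration, not the actual implementation: the class name BlobDirNameSketch is hypothetical, only the ".bin.gz" suffix from the example is recognized, and the "_blobs" fallback for other file names is an assumption added so that the directory name never equals the file name.

```java
import java.io.File;

// Hypothetical helper deriving a blob directory name from a temp file name.
final class BlobDirNameSketch {

    // Suffix taken from the example; other suffixes are not handled here.
    private static final String SUFFIX = ".bin.gz";

    /** Proposes a sibling directory named after the temp file, minus its suffix. */
    static File blobDirFor(File tempFile) {
        String name = tempFile.getName();
        if (name.endsWith(SUFFIX)) {
            name = name.substring(0, name.length() - SUFFIX.length());
        } else {
            // Assumed fallback, not taken from the documentation.
            name = name + "_blobs";
        }
        return new File(tempFile.getParentFile(), name);
    }

    public static void main(String[] args) {
        File temp = new File("/tmp/knime_container_xxxx_xx.bin.gz");
        System.out.println(blobDirFor(temp));
    }
}
```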

getBlobFile

File getBlobFile(int indexBlobInCol,
                 int column,
                 boolean createPath,
                 boolean isCompressed)
           throws IOException
Determines the file location for a blob to be read/written with some given coordinates (column and index in column).

Parameters:
indexBlobInCol - The index in the column (generally the row number).
column - The column index.
createPath - Create the directory, if necessary (when writing)
isCompressed - If file is (to be) compressed
Returns:
The file location.
Throws:
IOException - If that fails (e.g. blob dir does not exist).

iterator

CloseableRowIterator iterator()
Get a new RowIterator, traversing all rows that have been added. Calling this method only makes sense once the buffer has been closed. However, no check is done (as it is available to package classes only).

Returns:
a new Iterator over all rows.

containsBlobCells

boolean containsBlobCells()
True if any row containing blob cells is contained in this buffer.

Returns:
if blob cells are present.

createLocalCloneForWriting

Buffer createLocalCloneForWriting()
Creates a clone of this buffer for writing the content to a stream that is of the current version.

Returns:
A new buffer with the same ID, which is only used locally to update the stream.

addToZipFile

void addToZipFile(ZipOutputStream zipOut,
                  ExecutionMonitor exec)
            throws IOException,
                   CanceledExecutionException
Method that is called from the ContainerTable to save the content. It adds zip entries to the zipOut argument and does not close the output stream when done, so that additional content (for instance the DataTableSpec) can be added elsewhere.

Parameters:
zipOut - To write to.
exec - For progress/cancel
Throws:
IOException - If it fails to write to a file.
CanceledExecutionException - If canceled.
See Also:
#saveToFile(File, NodeSettingsWO, ExecutionMonitor)
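
The contract of leaving the stream open can be sketched with java.util.zip alone. The entry names "data.bin", "meta.xml" and "spec.xml" and the class ZipAppendSketch are hypothetical (the actual values of the ZIP_ENTRY_* constants are not listed on this page); the point is only that the callee closes its own entries but not the stream, so the caller can append further entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of a callee that adds entries without closing the stream.
final class ZipAppendSketch {
    // Placeholder entry names, not the real ZIP_ENTRY_* values.
    static final String ENTRY_DATA = "data.bin";
    static final String ENTRY_META = "meta.xml";

    static void addToZip(ZipOutputStream zipOut, byte[] data, byte[] meta) throws IOException {
        zipOut.putNextEntry(new ZipEntry(ENTRY_DATA));
        zipOut.write(data);
        zipOut.closeEntry();

        zipOut.putNextEntry(new ZipEntry(ENTRY_META));
        zipOut.write(meta);
        zipOut.closeEntry();
        // Deliberately no zipOut.close(): the caller owns the stream.
    }

    /** Caller side: adds its own entry after the callee has finished. */
    static byte[] demo() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bytes)) {
            addToZip(zipOut,
                "rows".getBytes(StandardCharsets.UTF_8),
                "meta".getBytes(StandardCharsets.UTF_8));
            zipOut.putNextEntry(new ZipEntry("spec.xml"));
            zipOut.write("spec".getBytes(StandardCharsets.UTF_8));
            zipOut.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```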

finalize

protected void finalize()
Deletes the file underlying this buffer.

Overrides:
finalize in class Object
See Also:
Object.finalize()

getBufferID

int getBufferID()
Get this buffer's ID. It may be -1 if this buffer is not used as part of the workflow (but rather has just been read from or written to a zip file).

Returns:
the buffer ID or -1

clearIteratorInstance

void clearIteratorInstance(Buffer.FromFileIterator it,
                           boolean removeFromHash)
Clears the argument iterator (frees the allocated resources).

Parameters:
it - The iterator
removeFromHash - Whether to remove from global hash.

clear

void clear()
Clears the temp file. Any subsequent iteration will fail!



Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.