java.lang.Object
  org.knime.core.data.container.DataContainer
public class DataContainer
Buffer that collects DataRow objects and creates a DataTable on request. This data structure is useful if the number of rows is not known in advance.
Usage: Create a container with a given spec (matching the rows that will be added later), add the data using the addRowToTable(DataRow) method, and finally close it with close(). You can then access the table with getTable().
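The usage pattern above can be sketched without the KNIME classes. The following is a minimal stand-in, not the library code: MiniContainer and its string-array rows are hypothetical names introduced for illustration. It only mimics the sequence of opening, adding rows, closing and reading, together with the IllegalStateException cases documented on this page. The real class works with DataTableSpec, DataRow and DataTable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LifecycleSketch {

    /** Hypothetical stand-in for the container lifecycle; not the KNIME class. */
    static final class MiniContainer {
        private final int columnCount; // plays the role of the table spec
        private final List<String[]> rows = new ArrayList<>();
        private boolean closed;

        MiniContainer(int columnCount) {
            this.columnCount = columnCount;
        }

        /** Appends a row; fails once the container has been closed. */
        void addRowToTable(String... cells) {
            if (closed) {
                throw new IllegalStateException("Container is closed");
            }
            if (cells.length != columnCount) {
                throw new IllegalArgumentException("Row does not match the spec");
            }
            rows.add(cells.clone());
        }

        /** Closes the container so that the table becomes available. */
        void close() {
            if (closed) {
                throw new IllegalStateException("Container is not open");
            }
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }

        /** Number of rows added so far. */
        int size() {
            return rows.size();
        }

        /** Returns the collected rows; only legal after close(). */
        List<String[]> getTable() {
            if (!closed) {
                throw new IllegalStateException("Container is still open");
            }
            return Collections.unmodifiableList(rows);
        }
    }

    public static void main(String[] args) {
        MiniContainer container = new MiniContainer(2);
        container.addRowToTable("row0", "a");
        container.addRowToTable("row1", "b");
        container.close();
        System.out.println(container.getTable().size() + " rows collected");
    }
}
```

The point of the sketch is the ordering constraint: rows can only be added while the container is open, and the table can only be read after it has been closed.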
Note regarding the column domain: This implementation updates the column domain while new rows are added to the table. It keeps the lower and upper bound for all numeric columns, i.e. columns whose type is a subtype of DoubleCell.TYPE. For categorical columns, it keeps the list of possible values as long as the number of distinct values does not exceed 60. (If there are more, the values are discarded and therefore not available in the final table.) A categorical column is a column whose type is a subtype of StringCell.TYPE, i.e. StringCell.TYPE.isSuperTypeOf(yourtype), where yourtype is the given column type.
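The domain rule described above can be illustrated with a small model. This is a sketch under stated assumptions, not the KNIME implementation: BoundsTracker and PossibleValuesTracker are hypothetical names, numeric cells are modelled as double and categorical cells as String. Only the limit of 60 and the "forget everything once exceeded" behaviour come from the text.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class DomainSketch {

    /** Running lower and upper bound of a numeric column (hypothetical helper). */
    static final class BoundsTracker {
        private double lower = Double.POSITIVE_INFINITY;
        private double upper = Double.NEGATIVE_INFINITY;

        void update(double value) {
            lower = Math.min(lower, value);
            upper = Math.max(upper, value);
        }

        double lower() {
            return lower;
        }

        double upper() {
            return upper;
        }
    }

    /** Possible values of a categorical column, dropped once the limit is exceeded. */
    static final class PossibleValuesTracker {
        private final int maxPossibleValues;
        private Set<String> values = new LinkedHashSet<>();

        PossibleValuesTracker(int maxPossibleValues) {
            if (maxPossibleValues < 0) {
                throw new IllegalArgumentException("maxPossibleValues < 0");
            }
            this.maxPossibleValues = maxPossibleValues;
        }

        void update(String value) {
            if (values == null) {
                return; // limit was exceeded earlier; nothing is remembered anymore
            }
            values.add(value);
            if (values.size() > maxPossibleValues) {
                values = null; // too many distinct values: forget them all
            }
        }

        /** Returns the distinct values seen, or null if there were too many. */
        Set<String> possibleValues() {
            return values;
        }
    }

    public static void main(String[] args) {
        BoundsTracker bounds = new BoundsTracker();
        for (double v : new double[] {3.5, -1.0, 7.25}) {
            bounds.update(v);
        }
        System.out.println("bounds: [" + bounds.lower() + ", " + bounds.upper() + "]");

        PossibleValuesTracker few = new PossibleValuesTracker(60);
        for (int i = 0; i < 60; i++) {
            few.update("v" + i);
        }
        System.out.println("60 distinct values kept: " + (few.possibleValues() != null));

        PossibleValuesTracker many = new PossibleValuesTracker(60);
        for (int i = 0; i < 61; i++) {
            many.update("v" + i);
        }
        System.out.println("61 distinct values kept: " + (many.possibleValues() != null));
    }
}
```

In the real class the threshold can be changed with setMaxPossibleValues(int); the constructor argument of PossibleValuesTracker plays that role here.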
Nested Class Summary | |
---|---|
(package private) static class | DataContainer.BufferCreator: Helper class to create a Buffer instance given a binary file and the data table spec. |
Field Summary | |
---|---|
(package private) static int | ASYNC_CACHE_SIZE: Size of buffers. |
(package private) static String | CFG_TABLESPEC: Used in write/readFromZip: Config entry: The spec of the table. |
static boolean | DEF_GZIP_COMPRESSION: Whether compression is enabled by default. |
static int | DEF_MAX_CELLS_IN_MEMORY: The default number of cells to be held in memory. |
static int | MAX_CELLS_IN_MEMORY: Number of cells that are cached without being written to the temp file (see Buffer implementation). It defaults to the value defined by DEF_MAX_CELLS_IN_MEMORY but can be changed using the Java property PROPERTY_CELLS_IN_MEMORY. |
static String | PROPERTY_CELLS_IN_MEMORY: Java property name to set a different threshold for the number of cells to be held in main memory. |
(package private) static boolean | SYNCHRONOUS_IO: Whether to use synchronous IO while adding rows to a buffer or reading from a file iterator. |
(package private) static String | ZIP_ENTRY_SPEC: Used in write/readFromZip: Name of the zip entry containing the spec. |
Constructor Summary | |
---|---|
DataContainer(DataTableSpec spec) | Opens the container so that rows can be added by addRowToTable(DataRow). |
DataContainer(DataTableSpec spec, boolean initDomain) | Opens the container so that rows can be added by addRowToTable(DataRow). |
DataContainer(DataTableSpec spec, boolean initDomain, int maxCellsInMemory) | Opens the container so that rows can be added by addRowToTable(DataRow). |
Method Summary | |
---|---|
protected void | addRowKeyForDuplicateCheck(RowKey key): Method being called when addRowToTable(DataRow) is called. |
void | addRowToTable(DataRow row): Appends a row to the end of a container. |
static DataTable | cache(DataTable table, ExecutionMonitor exec): Convenience method that will buffer the entire argument table. |
static DataTable | cache(DataTable table, ExecutionMonitor exec, int maxCellsInMemory): Convenience method that will buffer the entire argument table. |
void | close(): Closes container and creates table that can be accessed by getTable(). |
protected int | createInternalBufferID(): Get an internal id for the buffer being used. |
static File | createTempFile(): Creates a temp file called "knime_container_date_xxxx.zip" and marks it for deletion upon exit. |
protected ContainerTable | getBufferedTable(): Returns the table holding the data. |
protected Map<Integer,ContainerTable> | getGlobalTableRepository(): Get the map of buffers that potentially have written blob objects. |
protected Map<Integer,ContainerTable> | getLocalTableRepository(): Get the local repository. |
DataTable | getTable(): Get reference to table. |
DataTableSpec | getTableSpec(): Get the currently set DataTableSpec. |
boolean | isClosed(): Returns true if table has been closed and getTable() will return a DataTable object. |
static boolean | isContainerTable(DataTable table): Returns true if the given argument table has been created by the DataContainer, false otherwise. |
protected boolean | isForceCopyOfBlobs(): Get the property, which has possibly been set by setForceCopyOfBlobs(boolean). |
boolean | isOpen(): Returns true if the container has been initialized with DataTableSpec and is ready to accept rows. |
static ContainerTable | readFromStream(InputStream in): Reads a table from an input stream. |
static ContainerTable | readFromZip(File zipFile): Reads a table from a zip file that has been written using the writeToZip(DataTable, File, ExecutionMonitor) method. |
(package private) static ContainerTable | readFromZip(ReferencedFile zipFileRef, DataContainer.BufferCreator creator): Factory method used to restore table from zip file. |
(package private) static ContainerTable | readFromZipDelayed(CopyOnAccessTask c, DataTableSpec spec): Used in BufferedDataContainer to read the tables from the workspace location. |
protected static ContainerTable | readFromZipDelayed(ReferencedFile zipFile, DataTableSpec spec, int bufferID, Map<Integer,ContainerTable> bufferRep): Used in BufferedDataContainer to read the tables from the workspace location. |
protected void | setBufferCreator(DataContainer.BufferCreator bufferCreator): Set a buffer creator to be used to initialize the buffer. |
protected void | setForceCopyOfBlobs(boolean forceCopyOfBlobs): If true, any blob that is not owned by this container will be copied and this container will take ownership. |
void | setMaxPossibleValues(int maxPossibleValues): Define a new threshold for number of possible values to memorize. |
int | size(): Get the number of rows that have been added so far. |
static void | writeToStream(DataTable table, OutputStream out, ExecutionMonitor exec): Writes a given DataTable permanently to an output stream. |
static void | writeToZip(DataTable table, File zipFile, ExecutionMonitor exec): Writes a given DataTable permanently to a zip file. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
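public static final boolean DEF_GZIP_COMPRESSION
Whether compression is enabled by default.
See Also:
KNIMEConstants.PROPERTY_TABLE_GZIP_COMPRESSION, Constant Field Values
public static final String PROPERTY_CELLS_IN_MEMORY
Java property name to set a different threshold for the number of cells to be held in main memory. For example, add the line -Dorg.knime.container.cellsinmemory=1000 to the knime.ini file in the installation directory.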
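How such a property could be turned into a threshold is sketched below. This is an illustration, not the KNIME code: the class name, the fallback behaviour for malformed or negative values, and the default of 5000 are assumptions made for the example. The actual value of DEF_MAX_CELLS_IN_MEMORY is not stated on this page. Only the property name is taken from the -D example above.

```java
public class CellsInMemorySketch {

    /** Property name as it appears in the -D example above. */
    static final String PROPERTY_CELLS_IN_MEMORY = "org.knime.container.cellsinmemory";

    /** Placeholder default; the real DEF_MAX_CELLS_IN_MEMORY value is not given here. */
    static final int ASSUMED_DEFAULT = 5000;

    /** Reads the property and falls back to the default if it is unset or invalid. */
    static int resolveMaxCellsInMemory() {
        String raw = System.getProperty(PROPERTY_CELLS_IN_MEMORY);
        if (raw == null) {
            return ASSUMED_DEFAULT;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // Assumption: negative thresholds are rejected and the default is used.
            return value >= 0 ? value : ASSUMED_DEFAULT;
        } catch (NumberFormatException e) {
            return ASSUMED_DEFAULT;
        }
    }

    public static void main(String[] args) {
        // Run with -Dorg.knime.container.cellsinmemory=1000 to see the override.
        System.out.println("max cells in memory: " + resolveMaxCellsInMemory());
    }
}
```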
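public static final int DEF_MAX_CELLS_IN_MEMORY
The default number of cells to be held in memory.
public static final int MAX_CELLS_IN_MEMORY
Number of cells that are cached without being written to the temp file (see Buffer implementation). It defaults to the value defined by DEF_MAX_CELLS_IN_MEMORY but can be changed using the Java property PROPERTY_CELLS_IN_MEMORY.
static final int ASYNC_CACHE_SIZE
Size of buffers.
static final boolean SYNCHRONOUS_IO
Whether to use synchronous IO while adding rows to a buffer or reading from a file iterator. This is false by default but can be enabled by setting the appropriate Java property at startup.
static final String ZIP_ENTRY_SPEC
Used in write/readFromZip: Name of the zip entry containing the spec.
static final String CFG_TABLESPEC
Used in write/readFromZip: Config entry: The spec of the table.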
Constructor Detail |
---|
public DataContainer(DataTableSpec spec)
Opens the container so that rows can be added by addRowToTable(DataRow). The table spec of the resulting table (the one returned by getTable()) will have a valid column domain. That means that, while rows are added to the container, the domain of each column is adjusted.
If you prefer to stick with the domain as passed in the argument, use the constructor DataContainer(DataTableSpec, true, DataContainer.MAX_CELLS_IN_MEMORY) instead.
Parameters:
spec - Table spec of the final table. Rows that are added to the container must comply with this spec.
Throws:
NullPointerException - If spec is null.
public DataContainer(DataTableSpec spec, boolean initDomain)
Opens the container so that rows can be added by addRowToTable(DataRow).
Parameters:
spec - Table spec of the final table. Rows that are added to the container must comply with this spec.
initDomain - If set to true, the column domains in the container are initialized with the domains from spec.
Throws:
NullPointerException - If spec is null.
public DataContainer(DataTableSpec spec, boolean initDomain, int maxCellsInMemory)
Opens the container so that rows can be added by addRowToTable(DataRow).
Parameters:
spec - Table spec of the final table. Rows that are added to the container must comply with this spec.
initDomain - If set to true, the column domains in the container are initialized with the domains from spec.
maxCellsInMemory - Maximum count of cells in memory before swapping.
Throws:
IllegalArgumentException - If maxCellsInMemory < 0.
NullPointerException - If spec is null.
Method Detail |
---|
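protected void setBufferCreator(DataContainer.BufferCreator bufferCreator)
Set a buffer creator to be used to initialize the buffer.
Parameters:
bufferCreator - To be used.
Throws:
NullPointerException - If the argument is null.
IllegalStateException - If the buffer has already been created.
protected final void setForceCopyOfBlobs(boolean forceCopyOfBlobs)
If true, any blob that is not owned by this container will be copied and this container will take ownership.
Parameters:
forceCopyOfBlobs - The property described above.
Throws:
IllegalStateException - If rows have already been added to this buffer, i.e. this method must be called right after construction.
protected final boolean isForceCopyOfBlobs()
Get the property, which has possibly been set by setForceCopyOfBlobs(boolean).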
public void setMaxPossibleValues(int maxPossibleValues)
Define a new threshold for number of possible values to memorize.
Parameters:
maxPossibleValues - The new number.
Throws:
IllegalArgumentException - If the value < 0.
public boolean isOpen()
Returns true if the container has been initialized with DataTableSpec and is ready to accept rows.
This implementation returns !isClosed().
Returns:
true if container is accepting rows.
public boolean isClosed()
Returns true if table has been closed and getTable() will return a DataTable object.
Returns:
true if table is available, false otherwise.
public void close()
Closes container and creates table that can be accessed by getTable(). Subsequent calls of addRowToTable will fail with an exception.
Throws:
IllegalStateException - If container is not open.
DuplicateKeyException - If the final check for duplicate row keys fails.
DataContainerException - If the duplicate check fails because of an unknown IO problem.
public int size()
Get the number of rows that have been added so far. (That is, how often addRowToTable has been called.)
Throws:
IllegalStateException - If container is not open.
public DataTable getTable()
Get reference to table.
Throws:
IllegalStateException - If isClosed() returns false.
protected final ContainerTable getBufferedTable()
Returns the table holding the data.
Throws:
IllegalStateException - If isClosed() returns false.
public DataTableSpec getTableSpec()
Get the currently set DataTableSpec.
public void addRowToTable(DataRow row)
Appends a row to the end of a container. The row must comply with the DataTableSpec that was set when the container or table was constructed.
Specified by:
addRowToTable in interface RowAppender
Parameters:
row - DataRow to be added.
protected int createInternalBufferID()
Get an internal id for the buffer being used.
An ID of -1 denotes that the buffer is not intended to be used for sophisticated blob serialization. All blob cells that are added to it will be newly serialized as if they were created for the first time.
This implementation returns -1.
protected void addRowKeyForDuplicateCheck(RowKey key)
Method being called when addRowToTable(DataRow) is called. This method adds the given row key to the internal row key hashing structure, which allows for duplicate checking.
This method may be overridden to disable duplicate checks. The overriding class must then ensure that no duplicates are added at all.
Parameters:
key - Key being added. This implementation extracts the string representation from it and adds it to an internal DuplicateChecker instance.
Throws:
DataContainerException - This implementation may throw a DataContainerException when DuplicateChecker.addKey(String) throws an IOException.
DuplicateKeyException - If a duplicate is encountered.
protected Map<Integer,ContainerTable> getGlobalTableRepository()
Get the map of buffers that potentially have written blob objects. If used along with the ExecutionContext, this method returns the global table repository (global = in the context of the current workflow).
This implementation does not support sophisticated blob serialization. It will return a new HashMap<Integer, ContainerTable>().
See Also:
getLocalTableRepository()
protected Map<Integer,ContainerTable> getLocalTableRepository()
Get the local repository (see BufferedDataContainer).
public static DataTable cache(DataTable table, ExecutionMonitor exec, int maxCellsInMemory) throws CanceledExecutionException
Convenience method that will buffer the entire argument table.
Parameters:
table - The table to cache.
exec - The execution monitor to report progress to and to check for the cancel status.
maxCellsInMemory - The number of cells to be kept in memory before swapping to disk.
Throws:
NullPointerException - If the argument is null.
CanceledExecutionException - If the process has been canceled.
public static DataTable cache(DataTable table, ExecutionMonitor exec) throws CanceledExecutionException
Convenience method that will buffer the entire argument table.
Parameters:
table - The table to cache.
exec - The execution monitor to report progress to and to check for the cancel status.
Throws:
NullPointerException - If the argument is null.
CanceledExecutionException - If the process has been canceled.
public static void writeToZip(DataTable table, File zipFile, ExecutionMonitor exec) throws IOException, CanceledExecutionException
Writes a given DataTable permanently to a zip file.
Parameters:
table - The table to write.
zipFile - The file to write to. Will be created or overwritten.
exec - For progress info.
Throws:
IOException - If writing fails.
CanceledExecutionException - If canceled.
See Also:
readFromZip(File)
public static void writeToStream(DataTable table, OutputStream out, ExecutionMonitor exec) throws IOException, CanceledExecutionException
Writes a given DataTable permanently to an output stream.
The content is saved by instantiating a ZipOutputStream on the argument stream, saving the necessary information in the respective zip entries, and eventually closing the entire stream. If the stream should not be closed, consider using a NonClosableOutputStream as the argument stream.
Parameters:
table - The table to write.
out - The stream to save to.
exec - For progress info.
Throws:
IOException - If writing fails.
CanceledExecutionException - If canceled.
See Also:
readFromStream(InputStream)
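The remark about non-closable streams in writeToStream can be illustrated with the JDK alone. The sketch below is not KNIME code: KeepOpenOutputStream is a hypothetical stand-in for the NonClosableOutputStream mentioned in the text, TrackingStream exists only to make the effect observable, and the entry name example.txt is arbitrary. It relies on the standard behaviour that closing a ZipOutputStream also closes the stream it wraps.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class KeepOpenDemo {

    /** Hypothetical wrapper: close() only flushes and leaves the target open. */
    static final class KeepOpenOutputStream extends FilterOutputStream {
        KeepOpenOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // avoid FilterOutputStream's byte-by-byte default
        }

        @Override
        public void close() throws IOException {
            flush(); // deliberately do not close the wrapped stream
        }
    }

    /** Records whether close() reached it. */
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Writes one zip entry and closes the zip stream, as a zip-based writer would. */
    static void writeZip(OutputStream target) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(target)) {
            zip.putNextEntry(new ZipEntry("example.txt"));
            zip.write("payload".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream direct = new TrackingStream();
        writeZip(direct);
        System.out.println("closed when passed directly: " + direct.closed);

        TrackingStream wrapped = new TrackingStream();
        writeZip(new KeepOpenOutputStream(wrapped));
        System.out.println("closed when wrapped: " + wrapped.closed);
    }
}
```

A wrapper of this kind is useful when the table is only one part of a larger stream and the caller wants to keep writing after the zip content has been finished.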
public static ContainerTable readFromZip(File zipFile) throws IOException
Reads a table from a zip file that has been written using the writeToZip(DataTable, File, ExecutionMonitor) method.
Parameters:
zipFile - To read from.
Throws:
IOException - If that fails.
See Also:
writeToZip(DataTable, File, ExecutionMonitor)
public static ContainerTable readFromStream(InputStream in) throws IOException
Reads a table from an input stream, the counterpart of writeToStream(DataTable, OutputStream, ExecutionMonitor).
The argument stream will be closed. If this is not desired, consider using a NonClosableInputStream as the argument.
Parameters:
in - To read from. The stream will be closed when reading is finished.
Throws:
IOException - If that fails.
See Also:
writeToStream(DataTable, OutputStream, ExecutionMonitor)
static ContainerTable readFromZip(ReferencedFile zipFileRef, DataContainer.BufferCreator creator) throws IOException
Factory method used to restore table from zip file.
Parameters:
zipFileRef - To read from.
creator - Factory object to create a buffer instance.
Throws:
IOException - If that fails.
See Also:
readFromZip(File)
protected static ContainerTable readFromZipDelayed(ReferencedFile zipFile, DataTableSpec spec, int bufferID, Map<Integer,ContainerTable> bufferRep)
Used in BufferedDataContainer to read the tables from the workspace location.
Parameters:
zipFile - To read from (is going to be copied to temp on access).
spec - The DTS for the table.
bufferID - The buffer's id used for blob (de)serialization.
bufferRep - Repository of buffers for blob (de)serialization.
Returns:
The table read from zipFile.
static ContainerTable readFromZipDelayed(CopyOnAccessTask c, DataTableSpec spec)
Used in BufferedDataContainer to read the tables from the workspace location.
Parameters:
c - The factory that creates the Buffer instance that the returned table reads from.
spec - The DTS for the table.
Returns:
The table read from the zip file.
public static final File createTempFile() throws IOException
Creates a temp file called "knime_container_date_xxxx.zip" and marks it for deletion upon exit.
Throws:
IOException - If that fails for any reason.
public static final boolean isContainerTable(DataTable table)
Returns true if the given argument table has been created by the DataContainer, false otherwise.
Parameters:
table - The table to check.
Throws:
NullPointerException - If the argument is null.