org.knime.base.node.preproc.sample
Class Sampler

java.lang.Object
  extended by org.knime.base.node.preproc.sample.Sampler

public final class Sampler
extends Object

Utility class for creating row filters used for sampling.

Author:
Bernd Wiswedel, University of Konstanz

Method Summary
static RowFilter createRangeFilter(DataTable table, double fraction, ExecutionMonitor exec)
          Creates a filter that passes the first 100 * fraction percent of the rows of a table.
static RowFilter createRangeFilter(int count)
          Creates a filter that passes only the first count rows.
static RowFilter createSampleFilter(DataTable table, double fraction, ExecutionMonitor exec)
          Creates a row filter that samples precisely a given fraction of rows.
static RowFilter createSampleFilter(DataTable table, double fraction, Random rand, ExecutionMonitor exec)
          Creates a row filter that samples precisely a given fraction of rows.
static RowFilter createSampleFilter(DataTable table, int count, ExecutionMonitor exec)
          Creates a row filter that randomly samples count rows from table.
static RowFilter createSampleFilter(DataTable table, int count, Random rand, ExecutionMonitor exec)
          Creates a row filter that randomly samples count rows from table.
static RowFilter createSampleFilter(double fraction)
          Creates a row filter that randomly samples about 100 * fraction percent of the rows of a table.
static DataTable createSamplingTable(DataTable table, RowFilter filter)
          Convenience method that creates a new DataTable that samples rows according to a given row filter.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

createSamplingTable

public static final DataTable createSamplingTable(DataTable table,
                                                  RowFilter filter)
Convenience method that creates a new DataTable that samples rows according to a given row filter.

Parameters:
table - the table to wrap, i.e. to sample from
filter - the filter to use
Returns:
a new RowFilterTable
See Also:
RowFilterTable.RowFilterTable(DataTable, RowFilter)
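
The wrapping idea can be illustrated without KNIME on the classpath. The following is a minimal sketch, not the RowFilterTable implementation: FilteredViewSketch, filtered and toList are hypothetical names, and a plain java.util.function.Predicate stands in for a RowFilter. The view keeps a reference to the source and applies the predicate lazily during iteration, so no rows are copied up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical stand-in for a filtering table wrapper (not KNIME code).
final class FilteredViewSketch {

    private FilteredViewSketch() { }

    /** Returns a lazy view of source that exposes only rows accepted by filter. */
    static <T> Iterable<T> filtered(final Iterable<T> source, final Predicate<T> filter) {
        return () -> new Iterator<T>() {
            private final Iterator<T> it = source.iterator();
            private T next;
            private boolean hasNext;

            { advance(); } // position on the first accepted row

            // Skips rows until one passes the filter or the source is exhausted.
            private void advance() {
                hasNext = false;
                while (it.hasNext()) {
                    T candidate = it.next();
                    if (filter.test(candidate)) {
                        next = candidate;
                        hasNext = true;
                        return;
                    }
                }
            }

            @Override
            public boolean hasNext() {
                return hasNext;
            }

            @Override
            public T next() {
                if (!hasNext) {
                    throw new NoSuchElementException();
                }
                T result = next;
                advance();
                return result;
            }
        };
    }

    /** Collects a view into a list, for inspection only. */
    static <T> List<T> toList(Iterable<T> view) {
        List<T> out = new ArrayList<>();
        for (T t : view) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r0", "r1", "r2", "r3");
        System.out.println(toList(filtered(rows, r -> !r.equals("r1"))));
    }
}
```

Because the view is lazy, each new iteration re-applies the filter to the source, which mirrors the "wrap, do not copy" wording of the description above.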

createRangeFilter

public static final RowFilter createRangeFilter(DataTable table,
                                                double fraction,
                                                ExecutionMonitor exec)
                                         throws CanceledExecutionException
Creates a filter that passes the first 100 * fraction percent of the rows of a table. The number of rows to pass is derived from the total row count of table, which is determined by counting its rows.

Parameters:
table - the table from which to get the final row count
fraction - the fraction of the row count that shall survive
exec - an execution monitor to check for cancelation
Returns:
a row filter for this purpose
Throws:
CanceledExecutionException - if exec cancels the row counting
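
The arithmetic behind a fraction-based range filter can be sketched in plain Java. This is an illustration under stated assumptions, not the KNIME implementation: RangeFractionSketch, leadingCount and firstFraction are hypothetical names, and both the rounding rule (round to nearest) and the [0, 1] check are assumptions, since the documentation above does not specify them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "keep the first fraction of the rows" (not KNIME code).
final class RangeFractionSketch {

    private RangeFractionSketch() { }

    /** Number of leading rows to keep. Rounding to nearest is an assumption. */
    static int leadingCount(int rowCount, double fraction) {
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        return (int) Math.round(fraction * rowCount);
    }

    /** Copies the leading rows that survive the range filter. */
    static <T> List<T> firstFraction(List<T> rows, double fraction) {
        int keep = leadingCount(rows.size(), fraction);
        return new ArrayList<>(rows.subList(0, keep));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");
        System.out.println(firstFraction(rows, 0.3));
    }
}
```

The row count has to be known before the cut-off can be computed, which is why the method above needs the table and an execution monitor while the count-based overload does not.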

createRangeFilter

public static final RowFilter createRangeFilter(int count)
Creates a filter that passes only the first count rows.

Parameters:
count - the number of rows that survive (starting from top)
Returns:
a filter that passes only the first count rows

createSampleFilter

public static final RowFilter createSampleFilter(double fraction)
Creates a row filter that randomly samples about 100 * fraction percent of the rows of a table.

Parameters:
fraction - the fraction being used, must be in [0, 1]
Returns:
such a filter
See Also:
RandomFractionRowFilter.RandomFractionRowFilter(double)
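
One common way to obtain "about" a given fraction without knowing the row count is to decide for every row independently, with probability fraction. The sketch below shows that technique in plain Java. Whether RandomFractionRowFilter works this way is an assumption, and BernoulliSampleSketch and sample are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of per-row (Bernoulli) sampling (not KNIME code).
final class BernoulliSampleSketch {

    private BernoulliSampleSketch() { }

    /** Keeps each row independently with probability fraction. */
    static <T> List<T> sample(Iterable<T> rows, double fraction, Random rand) {
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            // nextDouble() is uniform in [0, 1), so 0.0 keeps nothing and 1.0 keeps everything.
            if (rand.nextDouble() < fraction) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            rows.add(i);
        }
        System.out.println(sample(rows, 0.25, new Random()));
    }
}
```

The size of the result varies from run to run, which matches the "about" in the description. The table-based overloads are the ones to use when an exact number of rows is needed.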

createSampleFilter

public static final RowFilter createSampleFilter(DataTable table,
                                                 double fraction,
                                                 ExecutionMonitor exec)
                                          throws CanceledExecutionException
Creates a row filter that samples precisely a given fraction of rows. This requires a scan of the table to count the rows and determine the number of rows to sample.

Parameters:
table - to count rows on
fraction - the fraction to be sampled, must be in [0, 1]
exec - to check canceled status on and report progress
Returns:
such a filter
Throws:
CanceledExecutionException - if canceled
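
The two steps described above (count the rows, then pick exactly that share of them) can be sketched as follows. This is a plain-Java illustration, not the KNIME implementation: ExactFractionSketch and its methods are hypothetical names, rounding to nearest is an assumption, and the index selection uses a partial Fisher-Yates shuffle as one possible technique.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of exact-fraction sampling in two passes (not KNIME code).
final class ExactFractionSketch {

    private ExactFractionSketch() { }

    /** First pass: counts the rows by iterating over them. */
    static int countRows(Iterable<?> rows) {
        int n = 0;
        for (Object ignored : rows) {
            n++;
        }
        return n;
    }

    /** Marks k distinct indices out of n using a partial Fisher-Yates shuffle. */
    static BitSet chooseIndices(int n, int k, Random rand) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        BitSet chosen = new BitSet(n);
        for (int i = 0; i < k; i++) {
            int j = i + rand.nextInt(n - i); // uniform position in the unshuffled tail
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
            chosen.set(idx[i]);
        }
        return chosen;
    }

    /** Second pass: keeps exactly round(fraction * n) rows, in their original order. */
    static <T> List<T> sample(Iterable<T> rows, double fraction, Random rand) {
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        int n = countRows(rows);
        int k = (int) Math.round(fraction * n); // rounding rule is an assumption
        BitSet chosen = chooseIndices(n, k, rand);
        List<T> kept = new ArrayList<>(k);
        int i = 0;
        for (T row : rows) {
            if (chosen.get(i++)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            rows.add(i);
        }
        System.out.println(sample(rows, 0.25, new Random()));
    }
}
```

The counting pass is the part that can take long on a large table, which is why the method accepts an execution monitor and can be canceled.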

createSampleFilter

public static final RowFilter createSampleFilter(DataTable table,
                                                 double fraction,
                                                 Random rand,
                                                 ExecutionMonitor exec)
                                          throws CanceledExecutionException
Creates a row filter that samples precisely a given fraction of rows. This requires a scan of the table to count the rows and determine the number of rows to sample. Passing a Random object with a fixed seed makes the sampling reproducible ("deterministic").

Parameters:
table - to count rows on
fraction - the fraction to be sampled, must be in [0, 1]
rand - the random object for controlled sampling. (If null, uses default)
exec - to check canceled status on and report progress.
Returns:
such a filter
Throws:
CanceledExecutionException - if canceled
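
The effect of the rand parameter can be shown with java.util.Random alone: two generators created with the same seed produce the same sequence, so the same rows are selected. The sketch below is illustrative only. SeededSamplingSketch, orDefault and mark are hypothetical names, and falling back to an unseeded new Random() for null is an assumption about what "uses default" means.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of reproducible sampling with a seeded Random (not KNIME code).
final class SeededSamplingSketch {

    private SeededSamplingSketch() { }

    /** Falls back to an unseeded generator when rand is null (assumed default). */
    static Random orDefault(Random rand) {
        return rand != null ? rand : new Random();
    }

    /** Marks exactly k of n positions; redraws whenever a position is already marked. */
    static boolean[] mark(int n, int k, Random rand) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("k must be in [0, n]: " + k);
        }
        Random r = orDefault(rand);
        boolean[] marked = new boolean[n];
        int chosen = 0;
        while (chosen < k) {
            int i = r.nextInt(n);
            if (!marked[i]) {
                marked[i] = true;
                chosen++;
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        boolean[] first = mark(100, 10, new Random(42L));
        boolean[] second = mark(100, 10, new Random(42L));
        // Same seed, same draws, same selection.
        System.out.println(Arrays.equals(first, second));
    }
}
```

A fixed seed is useful when a sampled table must be identical across repeated executions, for example in tests.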

createSampleFilter

public static final RowFilter createSampleFilter(DataTable table,
                                                 int count,
                                                 ExecutionMonitor exec)
                                          throws CanceledExecutionException
Creates a row filter that randomly samples count rows from table.

Parameters:
table - the table from which to create the sample
count - the number of rows that should go "through" the filter
exec - an execution monitor to check for cancelation (this method requires an iteration over table - which might take long)
Returns:
a row filter to be used for this kind of sampling
Throws:
CanceledExecutionException - if exec was canceled

createSampleFilter

public static final RowFilter createSampleFilter(DataTable table,
                                                 int count,
                                                 Random rand,
                                                 ExecutionMonitor exec)
                                          throws CanceledExecutionException
Creates a row filter that randomly samples count rows from table. Passing a Random object with a fixed seed makes the sampling reproducible ("deterministic").

Parameters:
table - the table from which to create the sample
count - the number of rows that should go "through" the filter
rand - the random object for controlled sampling. (If null, uses default)
exec - an execution monitor to check for cancelation (this method requires an iteration over table - which might take long)
Returns:
a row filter to be used for this kind of sampling
Throws:
CanceledExecutionException - if exec was canceled
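
Once the total row count is known, exactly count rows can be selected in a single sequential pass with selection sampling: each row is kept with probability (rows still needed) / (rows still to come). The sketch below shows that technique in plain Java. It is not confirmed to be what Sampler does internally, SelectionSamplingSketch and sample are hypothetical names, and clamping count to the table size is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of selection sampling for a fixed row count (not KNIME code).
final class SelectionSamplingSketch {

    private SelectionSamplingSketch() { }

    /** Keeps exactly min(count, rows.size()) rows, preserving their original order. */
    static <T> List<T> sample(List<T> rows, int count, Random rand) {
        if (count < 0) {
            throw new IllegalArgumentException("count must not be negative: " + count);
        }
        int n = rows.size();
        int k = Math.min(count, n); // clamping is an assumption
        List<T> kept = new ArrayList<>(k);
        int seen = 0;
        for (T row : rows) {
            int remainingRows = n - seen;       // rows not yet visited, including this one
            int stillNeeded = k - kept.size();  // rows that must still be selected
            // Keep this row with probability stillNeeded / remainingRows.
            if (rand.nextInt(remainingRows) < stillNeeded) {
                kept.add(row);
            }
            seen++;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            rows.add(i);
        }
        System.out.println(sample(rows, 5, new Random()));
    }
}
```

When the rows still needed equal the rows still to come, every remaining row is kept, so the result always has the requested size.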


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.