org.knime.core.data.vector.bytevector
Class SparseByteVector

java.lang.Object
  extended by org.knime.core.data.vector.bytevector.SparseByteVector

public class SparseByteVector
extends Object

A vector of fixed length holding byte counts at specific positions. Only non-negative counts are supported: each index can store a number between 0 and 255 (inclusive). Attempts to store negative numbers or numbers larger than 255 cause an exception. This implementation stores only the counts not equal to zero, and is thus suitable for large, sparsely populated vectors.
The maximum length is Long.MAX_VALUE (i.e. 9223372036854775807). The maximum number of counts larger than zero that can be stored is Integer.MAX_VALUE (i.e. 2147483647).
The implementation is not thread-safe.
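The storage model described above (only non-zero counts are kept, each in the range 0 ... 255) can be illustrated with a small stand-alone sketch. It is not the KNIME implementation: the class name SparseByteSketch, the two parallel sorted arrays, and the doubling growth strategy are assumptions chosen for illustration. Like the documented class, it is not thread-safe.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified stand-in for a sparse byte vector (not the KNIME class).
 * Assumption: non-zero counts live in two parallel arrays sorted by index.
 */
public class SparseByteSketch {
    private final long length;
    private long[] indices = new long[64]; // sorted positions of the non-zero counts
    private byte[] counts = new byte[64];  // counts 0..255 stored as signed bytes
    private int size = 0;                  // number of slots in use (the cardinality)

    public SparseByteSketch(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must not be negative");
        }
        this.length = length;
    }

    public int get(long index) {
        checkIndex(index);
        int pos = Arrays.binarySearch(indices, 0, size, index);
        return pos >= 0 ? counts[pos] & 0xFF : 0; // positions not stored are zero
    }

    public void set(long index, int value) {
        checkIndex(index);
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("count out of range: " + value);
        }
        int pos = Arrays.binarySearch(indices, 0, size, index);
        if (pos >= 0) {
            if (value != 0) {
                counts[pos] = (byte) value;  // overwrite the existing count
            } else {                         // a zero is removed, never stored
                System.arraycopy(indices, pos + 1, indices, pos, size - pos - 1);
                System.arraycopy(counts, pos + 1, counts, pos, size - pos - 1);
                size--;
            }
        } else if (value != 0) {
            int ins = -pos - 1;              // insertion point keeps indices sorted
            if (size == indices.length) {    // grow by doubling (an assumption)
                indices = Arrays.copyOf(indices, size * 2);
                counts = Arrays.copyOf(counts, size * 2);
            }
            System.arraycopy(indices, ins, indices, ins + 1, size - ins);
            System.arraycopy(counts, ins, counts, ins + 1, size - ins);
            indices[ins] = index;
            counts[ins] = (byte) value;
            size++;
        }
    }

    public int cardinality() {
        return size;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException("index out of range: " + index);
        }
    }
}
```

With this layout, a lookup is a binary search, O(log n) for n non-zero counts; inserting or removing a count additionally shifts up to n array entries to keep the indices sorted.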

Author:
ohl, University of Konstanz

Constructor Summary
SparseByteVector(long length)
          Creates a new vector of the specified length with (initial) space for 64 counts.
SparseByteVector(long length, int initialCapacity)
          Creates a new vector of the specified length and with (initially) space for the specified number of counts.
SparseByteVector(long length, long[] countIndices, byte[] counts)
          Creates a new instance initialized from the passed arrays.
SparseByteVector(SparseByteVector byteVector)
          Creates a clone of the passed vector.
 
Method Summary
 SparseByteVector add(SparseByteVector bv, boolean remainder)
          Returns a new vector with the sum of the counts at each position.
 int cardinality()
          Returns the number of counts larger than zero stored in this vector.
 void clear(long index)
          Resets the count at the specified index (sets it to zero).
 SparseByteVector concatenate(SparseByteVector bv)
          Creates and returns a new byte vector that contains copies of both (this and the argument vector).
 boolean equals(Object obj)
          
 int get(long index)
          Returns the number stored at the specified index.
 long[] getAllCountIndices()
          Returns a copy of the sorted indices of all non-zero counts.
 byte[] getAllCounts()
          Returns a copy of the internal storage of all values.
 int hashCode()
          
 boolean isEmpty()
          Checks all counts and returns true if they are all zero.
 long length()
          Returns the length of this vector.
 SparseByteVector max(SparseByteVector bv)
          Returns a new vector with the maximum of the counts at each position.
 SparseByteVector min(SparseByteVector bv)
          Returns a new vector with the minimum of the counts at each position.
 long nextCountIndex(long startIdx)
          Finds the next count not equal to zero on or after the specified index.
 long nextZeroIndex(long startIdx)
          Finds the next index whose value is zero on or after the specified index.
 void set(long index, int value)
          Stores the number at the specified index.
 void shrink()
          Frees unused memory in the vector.
 SparseByteVector subSequence(long startIdx, long endIdx)
          Creates and returns a new byte vector that contains a subsequence of this vector, beginning with the byte at index startIdx and ending with this vector's byte at position endIdx - 1.
 long sumOfAllCounts()
          Calculates the checksum, the sum of all counts stored.
 String toString()
          Returns a string containing (comma separated) all numbers stored in this vector.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SparseByteVector

public SparseByteVector(long length)
Creates a new vector of the specified length with (initial) space for 64 counts.

Parameters:
length - the length of the vector to create

SparseByteVector

public SparseByteVector(long length,
                        int initialCapacity)
Creates a new vector of the specified length and with (initially) space for the specified number of counts.

Parameters:
length - the length of the vector to create
initialCapacity - space will be allocated to store that many numbers

SparseByteVector

public SparseByteVector(long length,
                        long[] countIndices,
                        byte[] counts)
Creates a new instance initialized from the passed arrays. The numbers in the first array (countIndices) are the indices of the positions at which counts are stored. The second array (counts) contains the corresponding counts to store. Both arrays must have the same length.
The countIndices array must be sorted, with the lowest index stored at array index zero. The arrays must be built like the ones returned by the getAllCountIndices() and getAllCounts() methods.

Parameters:
length - the length of the vector. Indices must be smaller than this number.
countIndices - the array containing the indices of the counts to store. MUST be sorted (lowest index first).
counts - the counts to store. Note: even though Java treats byte as a signed type, the passed counts are interpreted as non-negative counts in the range 0 ... 255.
Throws:
IllegalArgumentException - if length is negative, if countIndices contains negative indices or indices not smaller than length, if countIndices is not sorted, or if the arrays do not have the same length.
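The preconditions of this constructor can be checked in a single pass over the two arrays. The class CountArrayCheck below is a hypothetical illustration, not KNIME code; it assumes that duplicate indices count as "not sorted" and that an index equal to length is out of range, in line with "Indices must be smaller than this number".

```java
/** Hypothetical check mirroring the documented preconditions of the array-based constructor. */
public class CountArrayCheck {
    static void validate(long length, long[] countIndices, byte[] counts) {
        if (length < 0) {
            throw new IllegalArgumentException("length is negative");
        }
        if (countIndices.length != counts.length) {
            throw new IllegalArgumentException("arrays differ in length");
        }
        for (int i = 0; i < countIndices.length; i++) {
            long index = countIndices[i];
            if (index < 0 || index >= length) {
                throw new IllegalArgumentException("index out of range: " + index);
            }
            // assumption: each position appears at most once, so the order is strictly increasing
            if (i > 0 && countIndices[i - 1] >= index) {
                throw new IllegalArgumentException("indices not sorted at array position " + i);
            }
        }
    }
}
```

Note that a value such as (byte) 200 is a valid count here: it is negative as a Java byte, but it is interpreted as 200.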

SparseByteVector

public SparseByteVector(SparseByteVector byteVector)
Creates a clone of the passed vector.

Parameters:
byteVector - the vector to clone.

Method Detail

length

public long length()
Returns the length of this vector, i.e. the number of positions it holds (including positions whose count is zero).

Returns:
the length of the vector.

shrink

public void shrink()
Frees unused memory in the vector. If many counts in a vector are reset to zero, the storage it uses can be reduced (as only the non-zero counts and their indices are stored).


set

public void set(long index,
                int value)
Stores the number at the specified index.

Parameters:
index - the index of the position where the count will be stored.
value - the number to store at the specified index. Must be in the range of 0 ... 255.
Throws:
ArrayIndexOutOfBoundsException - if the index is negative or not smaller than the length of the vector.
IllegalArgumentException - if the specified value is negative or larger than 255.

clear

public void clear(long index)
Resets the count at the specified index (sets it to zero).

Parameters:
index - the index of the position to clear.
Throws:
ArrayIndexOutOfBoundsException - if the index is negative or not smaller than the length of the vector.

cardinality

public int cardinality()
Returns the number of counts larger than zero stored in this vector.

Returns:
the number of elements not equal to zero in this vector.

isEmpty

public boolean isEmpty()
Checks all counts and returns true if they are all zero.

Returns:
true if all counts are zero.

get

public int get(long index)
Returns the number stored at the specified index.

Parameters:
index - the index of the number to return.
Returns:
the number (in the range of 0 ... 255) stored at the specified index.
Throws:
ArrayIndexOutOfBoundsException - if the index is not smaller than the length of the vector

nextCountIndex

public long nextCountIndex(long startIdx)
Finds the next count not equal to zero on or after the specified index. Returns an index greater than or equal to the provided index, or -1 if no count larger than zero exists on or after startIdx. (This method and nextZeroIndex(long) are the only methods that accept an index larger than the length of the vector.)

Parameters:
startIdx - the first index to look for non-zero counts. (It is allowed to pass an index larger than the vector's length.)
Returns:
the index of the next count larger than zero, which is on or after the provided startIdx, or -1 if there isn't any
Throws:
ArrayIndexOutOfBoundsException - if the specified startIdx is negative
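Because only non-zero counts are stored, and their indices are kept sorted, the next non-zero count can be found with a binary search instead of a scan. A minimal sketch, assuming a sorted index array like the one returned by getAllCountIndices(); the class name NextCountSketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: nextCountIndex over the sorted indices of the non-zero counts. */
public class NextCountSketch {
    static long nextCountIndex(long[] sortedIndices, long startIdx) {
        if (startIdx < 0) {
            throw new ArrayIndexOutOfBoundsException("negative start index: " + startIdx);
        }
        int pos = Arrays.binarySearch(sortedIndices, startIdx);
        if (pos < 0) {
            pos = -pos - 1; // insertion point = first stored index greater than startIdx
        }
        // a startIdx beyond the vector's length simply finds nothing
        return pos < sortedIndices.length ? sortedIndices[pos] : -1;
    }
}
```

Calling it repeatedly with the previous result plus one visits every non-zero count in ascending order.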

nextZeroIndex

public long nextZeroIndex(long startIdx)
Finds the next index whose value is zero on or after the specified index. Returns an index greater than or equal to the provided index, or -1 if no such index exists. (This method and nextCountIndex(long) are the only methods that accept an index larger than the length of the vector.)

Parameters:
startIdx - the first index to look for zero values.
Returns:
the next index with value zero on or after the provided startIdx, or -1 if the vector contains no zeros thereafter.
Throws:
ArrayIndexOutOfBoundsException - if the specified startIdx is negative
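In a sparse vector the search for a zero is the mirror image: any position that is not stored is zero, so only a run of consecutive stored indices starting at startIdx has to be skipped. A hypothetical sketch (NextZeroSketch is not a KNIME class), assuming only non-zero counts are stored and that a startIdx at or beyond the length yields -1.

```java
import java.util.Arrays;

/** Hypothetical sketch: nextZeroIndex, assuming only non-zero counts are stored (sorted by index). */
public class NextZeroSketch {
    static long nextZeroIndex(long length, long[] sortedIndices, long startIdx) {
        if (startIdx < 0) {
            throw new ArrayIndexOutOfBoundsException("negative start index: " + startIdx);
        }
        long candidate = startIdx;
        int pos = Arrays.binarySearch(sortedIndices, candidate);
        if (pos >= 0) {
            // skip the run of consecutive stored (non-zero) positions starting at candidate
            while (pos < sortedIndices.length && sortedIndices[pos] == candidate) {
                pos++;
                candidate++;
            }
        }
        // any position not stored is zero, as long as it lies inside the vector
        return candidate < length ? candidate : -1;
    }
}
```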

sumOfAllCounts

public long sumOfAllCounts()
Calculates the checksum, the sum of all counts stored.

Returns:
the sum of all counts in this vector.
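Since counts are held in Java bytes, which are signed, a sum must read each byte as unsigned; the result is a long because up to Integer.MAX_VALUE counts of at most 255 each can overflow an int. A minimal sketch over a plain byte array (for example one returned by getAllCounts()); the class name SumSketch is hypothetical.

```java
/** Hypothetical sketch of sumOfAllCounts over the stored (signed) bytes. */
public class SumSketch {
    static long sumOfAllCounts(byte[] counts) {
        long sum = 0; // long: up to Integer.MAX_VALUE counts of 255 do not fit into an int
        for (byte b : counts) {
            sum += b & 0xFF; // read each byte as an unsigned count 0..255
        }
        return sum;
    }
}
```

Summing the raw bytes without the & 0xFF mask would treat (byte) 200 as -56 and give a wrong, possibly negative, total.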

add

public SparseByteVector add(SparseByteVector bv,
                            boolean remainder)
Returns a new vector with the sum of the counts at each position. The result's length is the maximum of this vector's and the argument's length. The value at position i in the result is the sum of the counts in this vector and the argument vector at position i.

Parameters:
bv - the vector to add to this one (position-wise).
remainder - if true and the sum at a position is larger than 255, the value in the result vector is set to the remainder of the sum divided by 255; if false, the value is capped at 255. (Setting it to true performs slightly better.)
Returns:
a new instance holding at each position the sum of the counts.
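On sorted sparse storage, the position-wise sum can be computed by a single merge of the two index arrays, touching only non-zero entries. The sketch below is hypothetical (SparseAddSketch and its nested Sparse holder are illustrative, with counts kept as ints for readability); the overflow branch follows the wording above literally (remainder of division by 255, otherwise capped at 255) and is worth checking against the actual implementation before relying on it.

```java
import java.util.Arrays;

/** Hypothetical sketch of the position-wise add on sorted sparse arrays (not the KNIME code). */
public class SparseAddSketch {
    /** Simplified sparse vector: sorted indices, counts 0..255 as ints, zeros not stored. */
    static final class Sparse {
        final long length;
        final long[] idx;
        final int[] val;

        Sparse(long length, long[] idx, int[] val) {
            this.length = length;
            this.idx = idx;
            this.val = val;
        }
    }

    static Sparse add(Sparse a, Sparse b, boolean remainder) {
        long[] idx = new long[a.idx.length + b.idx.length]; // upper bound on the result size
        int[] val = new int[idx.length];
        int i = 0, j = 0, n = 0;
        while (i < a.idx.length || j < b.idx.length) {
            // Long.MAX_VALUE marks an exhausted input; real indices are always smaller
            long ia = i < a.idx.length ? a.idx[i] : Long.MAX_VALUE;
            long ib = j < b.idx.length ? b.idx[j] : Long.MAX_VALUE;
            long pos = Math.min(ia, ib);
            int sum = 0;
            if (ia == pos) sum += a.val[i++];
            if (ib == pos) sum += b.val[j++];
            if (sum > 255) {
                // follows the documented wording: remainder of division by 255, else cap at 255
                sum = remainder ? sum % 255 : 255;
            }
            if (sum != 0) { // keep the result sparse
                idx[n] = pos;
                val[n] = sum;
                n++;
            }
        }
        return new Sparse(Math.max(a.length, b.length), Arrays.copyOf(idx, n), Arrays.copyOf(val, n));
    }
}
```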

min

public SparseByteVector min(SparseByteVector bv)
Returns a new vector with the minimum of the counts at each position. The result's length is the maximum of this vector's and the argument's length. The value at position i in the result is the minimum of the counts in this vector and the argument vector at position i.

Parameters:
bv - the vector to compute the minimum of (position-wise).
Returns:
a new instance holding at each position the minimum of the counts.
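For the minimum, a position can only be non-zero in the result if it is non-zero in both inputs, so a sparse implementation only needs to walk the intersection of the two sorted index arrays. A hypothetical sketch: SparseMinSketch and its parameter names are illustrative, the result is collected in a TreeMap for brevity, and the result length (the maximum of both lengths) is not modelled.

```java
import java.util.TreeMap;

/** Hypothetical sketch: position-wise minimum of two sparse vectors (sorted indices + counts). */
public class SparseMinSketch {
    static TreeMap<Long, Integer> min(long[] ia, int[] va, long[] ib, int[] vb) {
        TreeMap<Long, Integer> result = new TreeMap<>();
        int i = 0, j = 0;
        // a zero (missing entry) on either side forces a zero result, so only common indices matter
        while (i < ia.length && j < ib.length) {
            if (ia[i] < ib[j]) {
                i++;
            } else if (ia[i] > ib[j]) {
                j++;
            } else {
                result.put(ia[i], Math.min(va[i], vb[j]));
                i++;
                j++;
            }
        }
        return result;
    }
}
```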

max

public SparseByteVector max(SparseByteVector bv)
Returns a new vector with the maximum of the counts at each position. The result's length is the maximum of this vector's and the argument's length. The value at position i in the result is the maximum of the counts in this vector and the argument vector at position i.

Parameters:
bv - the vector to compute the maximum of (position-wise).
Returns:
a new instance holding at each position the maximum of the counts.
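For the maximum, the opposite holds: every position that is non-zero in either input is non-zero in the result, so the result covers the union of both index sets. A hypothetical sketch (SparseMaxSketch is an illustrative name; a TreeMap stands in for the sorted arrays, and the result length is not modelled).

```java
import java.util.TreeMap;

/** Hypothetical sketch: position-wise maximum of two sparse vectors (sorted indices + counts). */
public class SparseMaxSketch {
    static TreeMap<Long, Integer> max(long[] ia, int[] va, long[] ib, int[] vb) {
        TreeMap<Long, Integer> result = new TreeMap<>();
        for (int i = 0; i < ia.length; i++) {
            result.put(ia[i], va[i]); // every non-zero count of the first vector survives
        }
        for (int j = 0; j < ib.length; j++) {
            result.merge(ib[j], vb[j], Integer::max); // keep the larger count where both are set
        }
        return result;
    }
}
```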

concatenate

public SparseByteVector concatenate(SparseByteVector bv)
Creates and returns a new byte vector that contains copies of both (this and the argument vector). The argument vector is appended at the end of this vector, i.e. its value with index zero will be stored at index "length-of-this-vector" in the result vector. The length of the result is the length of this plus the length of the argument vector.

Parameters:
bv - the vector to append at the end of this vector
Returns:
a new instance containing both vectors concatenated
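In sparse storage, concatenation only has to shift the argument's indices by this vector's length; the counts arrays are simply appended. A hypothetical sketch of the index part (ConcatSketch is not a KNIME class).

```java
import java.util.Arrays;

/** Hypothetical sketch: index array of the concatenation of two sparse vectors. */
public class ConcatSketch {
    static long[] concatIndices(long lengthA, long[] idxA, long[] idxB) {
        long[] out = Arrays.copyOf(idxA, idxA.length + idxB.length);
        for (int j = 0; j < idxB.length; j++) {
            // shift by the first vector's length; the result stays sorted because
            // every shifted index is >= lengthA, which exceeds any index of the first vector
            out[idxA.length + j] = idxB[j] + lengthA;
        }
        return out;
    }
}
```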

subSequence

public SparseByteVector subSequence(long startIdx,
                                    long endIdx)
Creates and returns a new byte vector that contains a subsequence of this vector, beginning with the byte at index startIdx and ending with this vector's byte at position endIdx - 1. The length of the result vector is endIdx - startIdx. If startIdx equals endIdx, a vector of length zero is returned.

Parameters:
startIdx - the first index included in the subsequence
endIdx - the index of the first byte after startIdx that is not included in the result (i.e. the end index is exclusive).
Returns:
a new vector of length endIdx - startIdx containing the subsequence of this vector from startIdx (included) to endIdx (not included anymore).
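With sorted indices, the stored entries that fall into [startIdx, endIdx) form one contiguous block, which two binary searches can locate; the selected indices are then re-based so that startIdx becomes index 0. A hypothetical sketch (SubSequenceSketch is illustrative; the bounds check and its exception type are assumptions, since the documentation does not state how invalid ranges are handled).

```java
import java.util.Arrays;

/** Hypothetical sketch: index array of subSequence(startIdx, endIdx) over sorted indices. */
public class SubSequenceSketch {
    static long[] subIndices(long[] sortedIdx, long startIdx, long endIdx) {
        if (startIdx < 0 || endIdx < startIdx) {
            // assumption: invalid bounds are rejected; the exception type is illustrative
            throw new IllegalArgumentException("invalid range " + startIdx + ".." + endIdx);
        }
        int from = lowerBound(sortedIdx, startIdx); // first stored index >= startIdx
        int to = lowerBound(sortedIdx, endIdx);     // first stored index >= endIdx (excluded)
        long[] out = new long[to - from];
        for (int k = 0; k < out.length; k++) {
            out[k] = sortedIdx[from + k] - startIdx; // re-base so startIdx becomes index 0
        }
        return out;
    }

    private static int lowerBound(long[] sorted, long key) {
        int pos = Arrays.binarySearch(sorted, key); // indices are unique, so a hit is the lower bound
        return pos >= 0 ? pos : -pos - 1;
    }
}
```

The matching counts are the same block [from, to) of the counts array, copied unchanged.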

hashCode

public int hashCode()

Overrides:
hashCode in class Object

equals

public boolean equals(Object obj)

Overrides:
equals in class Object

toString

public String toString()
Returns a string containing (comma separated) all numbers stored in this vector. The number of values added to the string is limited to 30000. If the output is truncated, the string ends with "... }".

Overrides:
toString in class Object
Returns:
a string containing (comma separated) the values in this vector.

getAllCounts

public byte[] getAllCounts()
Returns a copy of the internal storage of all values. The array contains only the values larger than zero. The positions of these values in the vector can be retrieved from the array returned by getAllCountIndices(). The arrays returned by these two methods have the same length: the count at index i in the result array is located in the vector at the index stored at index i of the other array. Note: even though Java stores signed numbers in byte, the returned numbers are values in the range 0 ... 255.
The length of the returned array is the cardinality of the vector.

Returns:
a copy of the non-zero counts stored in this vector.

getAllCountIndices

public long[] getAllCountIndices()
Returns a copy of the internal storage of all indices. The array contains the sorted indices of all non-zero counts in the vector. The length of the returned array is the cardinality of the vector.

Returns:
a copy of the sorted indices of the non-zero counts in this vector.
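The two arrays are meant to be read together: element i of getAllCounts() is the count at the position given by element i of getAllCountIndices(). The hypothetical helper below takes the two arrays as plain parameters (standing in for what those methods would return) and pairs them up, converting each signed byte back to its 0 ... 255 count.

```java
/** Hypothetical helper: pairs up the parallel arrays from getAllCountIndices() and getAllCounts(). */
public class ParallelArraysDemo {
    static String describe(long[] countIndices, byte[] counts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < countIndices.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // counts[i] belongs to position countIndices[i]; & 0xFF undoes Java's signed byte
            sb.append(countIndices[i]).append('=').append(counts[i] & 0xFF);
        }
        return sb.toString();
    }
}
```

The same pairing, fed back into the SparseByteVector(long, long[], byte[]) constructor, rebuilds an equal vector, since that constructor expects arrays built exactly like these.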


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.