org.knime.core.data.vector.bitvector
Class SparseBitVector

java.lang.Object
  extended by org.knime.core.data.vector.bitvector.SparseBitVector

public class SparseBitVector
extends Object

Stores Zeros and Ones in a vector, i.e. with fixed positions. The vector has a fixed length.
Implementation assumes that the vector is only sparsely populated with '1's. It stores the indices of the ones. For densely populated vectors DenseBitVector is more suitable.
The length of the vector is restricted to Long.MAX_VALUE (i.e. 9223372036854775807). The number of ones that can be stored is limited to Integer.MAX_VALUE (which is 2147483647), in which case it uses about 16Gbyte of memory.
The implementation is not thread-safe.

Author:
ohl, University of Konstanz

Constructor Summary
SparseBitVector(long length)
          Creates a new vector with (initially) space for 64 ones and of the specified length.
SparseBitVector(long length, int initialCapacity)
          Creates a new vector of the specified length and with (initially) space for the specified number of ones.
SparseBitVector(long length, long[] oneIndices)
          Creates a new instance by taking over the initialization from the passed array.
SparseBitVector(SparseBitVector clone)
          Creates a new instance as copy of the passed argument.
SparseBitVector(String hexString)
          Initializes the created bit vector from the hex representation in the passed string.
 
Method Summary
 SparseBitVector and(SparseBitVector bv)
          Creates and returns a new bit vector whose bits are set at positions where both, this and the argument vector have their bits set.
 int cardinality()
          Number of bits set in this bit vector.
 void clear(long bitIdx)
          Sets the bit at the specified index to one.
 SparseBitVector concatenate(SparseBitVector bv)
          Creates and returns a new bit vector that contains copies of both (this and the argument vector).
 boolean equals(Object obj)
          
 boolean get(long bitIdx)
          Returns true if the bit at the specified index is set.
 long[] getAllOneIndices()
          Returns a copy of the internal storage of all bit indices.
 int hashCode()
          
 boolean intersects(SparseBitVector bv)
          Returns true, if this and the argument vector have at least one bit set at the same position.
 boolean isEmpty()
          Returns true if no bits are set in this bit vector.
 long length()
          Returns the number of bits stored in this vector.
 long nextClearBit(long startIdx)
          Finds the next bit not set (that is '0') on or after the specified index.
 long nextSetBit(long startIdx)
          Finds the next bit set to one on or after the specified index.
 SparseBitVector or(SparseBitVector bv)
          Creates and returns a new bit vector whose bits are set at positions where at least one of the vectors (this or the argument vector) have a bit set.
 void set(long bitIdx)
          Sets the bit at the specified index to zero.
 void set(long bitIdx, boolean value)
          Sets the bit at the specified index to the new value.
 void shrink()
          Frees unused memory in the vector.
 SparseBitVector subSequence(long startIdx, long endIdx)
          Creates and returns a new bit vector that contains a subsequence of this vector, beginning with the bit at index startIdx and with its last bit being this' bit at position endIdx - 1.
 String toBinaryString()
          Returns the binary string representation of the bits in this vector.
 String toHexString()
          Returns the hex representation of the bits in this vector.
 String toString()
          Returns a string containing (comma separated) indices of the bits set in this vector.
 SparseBitVector xor(SparseBitVector bv)
          Creates and returns a new bit vector whose bits are set at positions where (exactly) one of the vectors (this or the argument vector) have a bit set.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SparseBitVector

public SparseBitVector(long length)
Creates a new vector with (initially) space for 64 ones and of the specified length.

Parameters:
length - the length of the vector to create

SparseBitVector

public SparseBitVector(long length,
                       int initialCapacity)
Creates a new vector of the specified length and with (initially) space for the specified number of ones.

Parameters:
length - the length of the vector to create
initialCapacity - space will be allocated to store that many ones

SparseBitVector

public SparseBitVector(long length,
                       long[] oneIndices)
Creates a new instance by taking over the initialization from the passed array. The numbers in the array are considered indices of the bits set to one in the vector. The array must be sorted! The lowest bit index must be stored at array index zero. The array must be build like the one returned by the getAllOneIndices() method.

Parameters:
oneIndices - the array containing the indices of the ones. MUST be sorted (lowest index first).
length - the length of the vector. Indices must be smaller than this number.
Throws:
IllegalArgumentException - if length is negative or if the array contains negative numbers or numbers larger than length - or if the array is not sorted!

SparseBitVector

public SparseBitVector(SparseBitVector clone)
Creates a new instance as copy of the passed argument.

Parameters:
clone - the vector to copy into the new instance

SparseBitVector

public SparseBitVector(String hexString)
Initializes the created bit vector from the hex representation in the passed string. Only characters '0' - '9', 'A' - 'F' and 'a' - 'f' are allowed. The character at string position (length - 1) represents the bits with index 0 to 3 in the vector. The character at position 0 represents the bits with the highest indices. The length of the vector created is the length of the string times 4 (as each character represents four bits).

Parameters:
hexString - containing the hex value to initialize the vector with
Throws:
IllegalArgumentException - if hexString contains characters other then the hex characters (i.e. 0 - 9, A - F, and a - f)
Method Detail

length

public long length()
Returns the number of bits stored in this vector.

Returns:
the length of the vector.

shrink

public void shrink()
Frees unused memory in the vector. If a vector loses a lot of ones the used storage could be reduced (as only the indices of ones are stores).


set

public void set(long bitIdx,
                boolean value)
Sets the bit at the specified index to the new value.

Parameters:
bitIdx - the index of the bit to set or clear
value - if true, the specified bit will be set, otherwise it will be cleared.
Throws:
ArrayIndexOutOfBoundsException - if the index is negative or larger than the size of the vector

set

public void set(long bitIdx)
Sets the bit at the specified index to zero.

Parameters:
bitIdx - the index of the bit to clear.
Throws:
ArrayIndexOutOfBoundsException - if the index is negative or larger than the size of the vector

clear

public void clear(long bitIdx)
Sets the bit at the specified index to one.

Parameters:
bitIdx - the index of the bit to set.
Throws:
ArrayIndexOutOfBoundsException - if the index is negative or larger than the size of the vector

cardinality

public int cardinality()
Number of bits set in this bit vector.

Returns:
the number of ones in this vector

isEmpty

public boolean isEmpty()
Returns true if no bits are set in this bit vector.

Returns:
true if no bits are set in this bit vector.

intersects

public boolean intersects(SparseBitVector bv)
Returns true, if this and the argument vector have at least one bit set at the same position.

Parameters:
bv - the vector to test
Returns:
true, if this and the argument vector have at least one bit set at the same position.

get

public boolean get(long bitIdx)
Returns true if the bit at the specified index is set. False otherwise.

Parameters:
bitIdx - the index of the bit to test.
Returns:
true if the specified bit is set, false otherwise
Throws:
ArrayIndexOutOfBoundsException - if the index is larger than the length of the vector

nextSetBit

public long nextSetBit(long startIdx)
Finds the next bit set to one on or after the specified index. Returns an index larger than or equal the provided index, or -1 if no bit is set after the startIdx. (This is the only method (and the #nextClearBit) where it is okay to pass an index larger than the length of the vector.)

Parameters:
startIdx - the first index to look for '1's. (It is allowed to pass an index larger then the vector's length.)
Returns:
the index of the next bit set to one, which is on or after the provided startIdx.
Throws:
ArrayIndexOutOfBoundsException - if the specified startIdx is negative

nextClearBit

public long nextClearBit(long startIdx)
Finds the next bit not set (that is '0') on or after the specified index. Returns an index larger than or equal the provided index, or -1 if no bit is cleared after the startIdx. (This is the only method (and the #nextSetBit) where it is okay to pass an index larger than the length of the vector.)

Parameters:
startIdx - the first index to look for '0's.
Returns:
the index of the next cleared bit, which is on or after the provided startIdx. Or -1 if the vector contains no zero anymore.
Throws:
ArrayIndexOutOfBoundsException - if the specified startIdx negative

subSequence

public SparseBitVector subSequence(long startIdx,
                                   long endIdx)
Creates and returns a new bit vector that contains a subsequence of this vector, beginning with the bit at index startIdx and with its last bit being this' bit at position endIdx - 1. The length of the result vector is endIdx - startIdx. If startIdx equals endIdx a vector of length zero is returned.

Parameters:
startIdx - the startIdx of the subsequence
endIdx - the first bit in this vector after startIdx that is not included in the result sequence.
Returns:
a new vector of length endIdx - startIdx containing the subsequence of this vector from startIdx (included) to endIdx (not included anymore).

and

public SparseBitVector and(SparseBitVector bv)
Creates and returns a new bit vector whose bits are set at positions where both, this and the argument vector have their bits set. The length of the new vector is the maximum of the length of this and the argument.

Parameters:
bv - the vector to AND this one with
Returns:
a new instance containing the result of the AND operation

or

public SparseBitVector or(SparseBitVector bv)
Creates and returns a new bit vector whose bits are set at positions where at least one of the vectors (this or the argument vector) have a bit set. The length of the new vector is the maximum of the length of this and the argument.

Parameters:
bv - the vector to OR this one with
Returns:
a new instance containing the result of the OR operation

xor

public SparseBitVector xor(SparseBitVector bv)
Creates and returns a new bit vector whose bits are set at positions where (exactly) one of the vectors (this or the argument vector) have a bit set. The length of the new vector is the maximum of the length of this and the argument.

Parameters:
bv - the vector to XOR this one with
Returns:
a new instance containing the result of the XOR operation

concatenate

public SparseBitVector concatenate(SparseBitVector bv)
Creates and returns a new bit vector that contains copies of both (this and the argument vector). The argument vector is appended at the end of this vector, i.e. its bit with index zero will be stored at index "length-of-this-vector" in the result vector. The length of the result is the length of this plus the length of the argument vector.

Parameters:
bv - the vector to append at the end of this
Returns:
a new instance containing both vectors concatenated

hashCode

public int hashCode()

Overrides:
hashCode in class Object

equals

public boolean equals(Object obj)

Overrides:
equals in class Object

toString

public String toString()
Returns a string containing (comma separated) indices of the bits set in this vector. The number of bit indices added to the string is limited to 30000. If the output is truncated, the string ends on "... }"

Overrides:
toString in class Object
Returns:
a string containing (comma separated) indices of the bits set in this vector.

toHexString

public String toHexString()
Returns the hex representation of the bits in this vector. Each character in the result represents 4 bits (with the characters '0' - '9' and 'A' - 'F'). The character at string position (length - 1) holds the lowest bits (bit 0 to 3), the character at position 0 represents the bits with the largest index in the vector. If the length of the vector is larger than (Integer.MAX_VALUE - 1) * 4 (i.e. 8589934584), the result is truncated (and ends with ...).

Returns:
the hex representation of this bit vector.

toBinaryString

public String toBinaryString()
Returns the binary string representation of the bits in this vector. Each character in the result represents one bit - a '1' stands for a set bit, a '0' represents a cleared bit. The character at string position (length - 1) holds the bit with index 0, the character at position 0 represents the bits with the largest index in the vector. If the length of the vector is larger than (Integer.MAX_VALUE - 3) (i.e. 2147483644), the result is truncated (and ends with ...).

Returns:
the binary (0/1) representation of this bit vector.

getAllOneIndices

public long[] getAllOneIndices()
Returns a copy of the internal storage of all bit indices. The array contains the sorted indices of all '1's in the vector. The length of the returned array is the cardinality of the vector.

Returns:
a copy of the internal representation of the bits in this vector.


Copyright, 2003 - 2010. All rights reserved.
University of Konstanz, Germany.
Chair for Bioinformatics and Information Mining, Prof. Dr. Michael R. Berthold.
You may not modify, publish, transmit, transfer or sell, reproduce, create derivative works from, distribute, perform, display, or in any way exploit any of the content, in whole or in part, except as otherwise expressly permitted in writing by the copyright owner or as specified in the license file distributed with this product.