blob: cdfa5082cb28d4f09c94f8d46cacd7f88ca87728 [file] [log] [blame]
Revision: 02/03/10
Im spending most of my part time on working out the actual mechanics (with
examples) of the matrix/vector implementation.
What will help is to get some input on likely use-cases that can drive the work
though Ive got a good number of examples from scientific and econometric
modeling, I could always benefit from something concrete and immediate.
Following is more on mechanics.
Another finding on indexes, RLE compression and matrices:
input matrices come in various flavors, many of them in non-sorted not ready
for sparse compression formats
in many circumstances there is no a-priori choice of whether there are zeros or
NVP
Because of (1), in the input processing and sorting, it is necessary to create
an uncompressed index that is identical to the single RLE index Ive described
previously, sans the special values.
Ive described the input processing here in a program I wrote to process
matrices from the Matrix Mart at NIST:
/* The Matrix Market format does not specify the order in which the
* successive elements are stored. We need to compress the matrix into
* one of CRS or CCS format (row or column compression), so we need to
* sort the matrix into one of those forms.
*
* Here we'll use quicksort to sort the matrix elements into row
* contiguous storage.
*
* If we desire row contiguous storage, we'll also need the column order
* within the row preserved. We'll do this by computing the index into
* the matrix using a "columns progressive" ordering, which is computed
* as NJ * I + J. Then we'll sort based on that index. The index value
* will range from 0 to NJ*NI-1. For any index value, you can compute I
* and J using:
* I = index/NJ
* J = index - NJ * I
*
* If we sort the matrix by this index, we'll have things in row/column
* order, suitable for CRS compression.
*
* Note that if we want CCS, we can construct the index using:
* index = NI * J + I
*/
In the Yale/MKL formats, we deconstruct the single index into a set of two
values for each run and that is stored as the compression index.
Im thinking now that for the RLE index we can do something along these lines:
Treat all values symmetrically, that is: all runs of the various types were
discussing (inf,-inf,0,NVP,value) are described by the same kind of word
Each RLE word is composed of a leading set of descriptor bits, followed by a
payload describing how many of the value are found.
So if we pick the descriptor bits = 3, we can represent 8 things”, of which 5
things must be what kind of value is this?”, leaving 3 things. A good use of
the 3 things can be word length, one of 8, 16 or 32.
So if we go down this path, were left with 5, 13 and 29 bits to represent a run
and can dynamically choose which one to use based on the data content.
Thats 32, 8192 and 536,870,912 values in each run, which is perfect for
capturing various kinds of sparsity.
If we expand to one more descriptor bit, then we can represent 8 more things,
which could be used to recognize additional types of values to compress out.
That would leave us with 16, 4096 and 268,435,456 values in each run, which also
works pretty well and leaves us room for the future in case we find there are
more values than {inf,-inf,0,NVP,value}.
Some additional results to share (see table below) I did a computation of the
various storage and compute requirements for the MKL/Yale index and the RLE
index. I used 5 matrix examples from NISTs Matrix Market”, read them in and
translated them from Matrix Market format to MKL/RLE which required sorting them
into CRS. From this, I calculated the relevant stats on each in terms of
storage and number of words in each of the RLE and MKL indices.
Bottom line is that the results corroborated my earlier calculations. The RLE
index is a lot smaller than MKL always, and is more computationally efficient
for linear scans through the compressed matrix. This means we should use only
RLE for most use-cases and calculate MKL only when we need to use the
algorithms in the matrix solver libraries. We can optionally store the MKL
index if we calculate it we can provide a marker to indicate the
presence/absence of it in the type.
Ive also thought through the compatibility with all the sparse solver libs
seems like we have easy compatibility with a CRS-based approach, so I think that
should be the baseline. We can provide another marker to indicate what is being
stored CRS, CCS, CDS, etc.
I believe I have enough information to produce a draft design of the matrix type
that we can review and get some other feedback about. Here is a sketch of how I
think it should work:
Two separate types, one for vectors and one for matrices but they share the
same format and support routines. The separate types allows definition of
vector and matrix specific operators.
The format stores the following metadata about the datum:
nD, (n(i),i=1,nD): dimension and rank for a vector, this is {1,N}; for a
matrix its {2,Nrows,Ncolumns}
nZ: number of values (non-special values)
valType: type of value, one of {bit, byte, int4, int8, real4, complex4,
real8, complex8}
storType: type of storage, one of {CRS, CCS, CDS, ... Up to maybe 32 total
with some reserved}
specValList: a list of special values to be used in the compression, up to
8, e.g. {inf,-inf,NVP,0}
hasMKL: if true, the MKL index is stored
The format stores the following data:
values: a dense array of values, size is (nZ*sizeof(valType))
rleIndex: an RLE index containing location of all values (including specials)
sizeofRleIndex: the size of the RLE index for allocation purposes
if (hasMKL) mklIndex: the MKL/Yale index, size is ((nZ+NI)*sizeof(int))
01234567890123456789012345678901234567890123456789012345678901234567890123456789
In order to efficiently compress the storage of multiple data types, we are
using a bitmap that describes the positions of four special 64-bit floating
point values: zero, negative infinity, positive infinity, and "no value present"
or NVP. Of these, the first three are all standard floating point values that
can be the natural result of performing calculations. NVP is a special case
often used in statistics and other analytical processing but is not
representable as a standard floating point number.
Here we're using two patterns for sparse storage and computation:
1) recognition of compressible types and subsequent bitmap compression
2) optimized computational usage of previously compressed values
(1) recognizes that we're going to have many situations where special values
arise as a natural consequence of computation. In these cases, we will
automatically locate and compress special values into a special bitmap storage
format.
(2) leverages the pre-existing compressed special value storage for optimized
computation. For example, if we have a vector of values V and a large fraction
of the values in V are zero, then computing "W = 1/V" will produce a
correspondingly large number of positive infinity values in W. With our
compressed bitmap storage of zeros, we can loop over the compressed zero
elements and change their storage type from "zero" to "infinity", thereby
saving the computation, memory and re-compression overheads.
It's important to have a bitmap format that is both storage efficient and
optimal for computation. There are many existing formats for sparse vector and
matrix storage, most of them developed for scientific computation. The Intel
Math Kernel Library (MKL) specifies the most common sparse matrix storage
format, derived from the Yale Sparse Matrix format:
"The Intel MKL direct sparse solvers use a row-major upper triangular storage
format: the matrix is compressed row-by-row and for symmetric matrices only non-
zero elements in the upper triangular half of the matrix are stored."
The Intel MKL sparse matrix storage format for direct sparse solvers is
specified by three arrays: values, columns, and rowIndex. The following table
describes the arrays in terms of the values, row, and column positions of the
non-zero elements in a sparse matrix.
values - A real or complex array that contains the non-zero elements
of a sparse matrix. The non-zero elements are mapped into the
values array using the row-major upper triangular storage
mapping described above.
columns - Element I of the integer array columns is the number of the
column that contains the I-th element in the values array.
rowIndex - Element j of the integer array rowIndex gives the index of
the element in the values array that is first non-zero
element in a row j.
The length of the values and columns arrays is equal to the number of non-zero
elements in the matrix."
- Intel Math Kernel Library Reference Manual, March 2009, Document Number:
630813-031US
The MKL format stores the non-zero values compactly in a sequential array of
either complex or double precision entries and maintains two index arrays that
identify the location of the nonzero values in a two dimensional matrix. The
advantages of the MKL format are it's efficiency for computation and the fact
it's in widespread usage in numerical libraries.
The uncompressed size of a real matrix with NI rows and NJ columns would be:
Raw size = (8 bytes) * ( NI*NJ )
The storage required by the MKL format for a real matrix with N nonzero values
and NI number of rows is:
Comp Size = (8 bytes) * ( 2N + NI )
For a complex matrix it's:
Comp Size = (8 bytes) * ( 3N + NI )
Another way of representing the storage is to assess the "overhead" associated
with the representation of the zero entries, which is the same for real and
complex:
Overhead = (8 bytes) * ( N + NI )
Now we can calculate the "savings" associated with using the MLK compressed
format for a real matrix:
Savings = (Raw Size) - (Comp Size)
(8 bytes) ( NI*NJ - 2N - NI )
If we have a dense matrix, then N will be NI*NJ and the savings is equal to the
negative overhead, which means we'll have a worst case savings of:
Dense (worst case) savings = -(8 bytes)(NI*NJ + NI)
= -(8 bytes)*NI*(NJ+1)
For a dense matrix, this is a 100%+ storage penalty.
There is one important shortcoming of the MKL format: it can only represent one
type of special value - zero. This is a consequence of the implicit nature of
the approach, the location of the zero values is not stored, it is implied by
the lack of a non-zero value in that matrix position. There is no place to
provide an attribute of the special value so there can be only one type of
special value.
We need an alternative format that can represent different types of special
values. It will need to refer to the special values explicitly in an efficient
manner. We can take advantage of the nature of most all sparse storage
circumstances: the special values occur in relatively large runs compared to
the non-zero values. We can use a "run length encoding" or RLE approach to
compose a compressed special value array, in addition to storing a regular
packed array of the non-special values. The index location of the non-special
values can be inferred from the RLE array of special values during computation.
Let's represent the array of special values using a 16-bit word to represent
each run of special values, and an 8 bit word for each run of non-special
values. Each of the two word types uses the first bit (the sign bit) to
represent the run as consisting of special values (positive) or non-special
values (negative). Each special value word uses the next 2 bits to describe what
kind of special values are in the run, one of zero, inf, -inf and NVP.
A complete array will be represented as a collection of these two types of
words, terminated by a special "end of array" word consisting of 16 bits of
zeros (0x0000). For example:
<special values><non special values><special values><0x0000>
This can be seen in bytes form:
<2 bytes><1 byte><2 bytes><2 bytes>
And if the special values are zeros (type 0), with runs of 8192 in each of the
two runs with a run of 128 non-special values, the bits in this array would be:
<0001111111111111><11111111><0001111111111111><0000000000000000>
Note that if there are typically a large number of zeros followed by only one
non-zero value, then this representation will waste a lot of space in the
non-special value words. We could shorten the representation of non-special
values to say 5 bits, which would allow for runs of 16 to be represented by
each word and we would save three bits per run. However, the alignment of the
index computations would be less than 1 byte, which would likely reduce the
computational efficiency.
If a given run has more than the maximum representable by the word, words are
concatenated as needed to represent the run.
For matrices we can use the compressed row format, or the compressed column
format - for purpose of the efficiency discussion let's assumed compressed rows.
The storage efficiency of this approach is more complicated to compute, as the
storage will depend on the dispersion of values. We can compute best case and
worst case scenarios however.
A reasonable "best case" scenario will have one non-special value per row. In
this case the storage required by the RLE format for a real matrix with N
nonzero values, NI rows and NJ columns is:
Comp Size = (9 bytes) * N + (2 bytes) *(NJ*(NI/8192) +1)
= (9 bytes) *NI + (2 bytes) *(NJ*(NI/8192) +1)
for the common case of NI < 8192:
= (9 bytes) *NI + (2 bytes) *(NJ +1)
In this case, the storage is O(NI+NJ). For instance, if we have a matrix of
size 2,000 by 2,000, the storage is (8 bytes) * 4,000 = 32KB.
For a worst case scenario, we can assume alternating non-special and special
values, in which case the storage is:
Comp Size = (9 bytes) * N + (2 bytes) * (NJ*(NI/2) +1)
= (9 bytes) *NJ*(NI/2) + (2 bytes) * (NJ*(NI/2) +1)
= (11 bytes) *NJ*(NI/2) + (2 bytes)
What's interesting is that this worst case size is actually smaller (by almost
half) than storing the full dense matrix. This is due to the more efficient
representation of the special values in 2 byte format. The overhead is nearly
equivalent to storing the special values in addition to the non-special values.
If we have a dense matrix, there is almost zero overhead because the special
value array is blank and we only store the non-special data values.
If we compare the MKL approach savings to the RLE approach in the best case
scenario, we have an RLE savings of:
RLE
Savings = (Raw Size) - (Comp Size)
= (8 bytes) *NI*NJ - (9 bytes)*NI - (2 bytes)*(NJ +1)
for NI ~ NJ:
~ (8 bytes) *NI*NJ - (11 bytes)*NI
~ (8 bytes) *NI*(NJ-1)
MKL
Savings = (8 bytes) ( NI*NJ - 3NI )
= (8 bytes) *NI*(NJ-3)
The savings in the best case is nearly the same. In the worst case scenario of
a dense matrix, the savings of the RLE approach is nearly 0, which is the best
result possible. For MKL however, the savings in the worst case is more than
a 100% storage penalty.
Most cases are somewhere between best case and worst case. We can conclude
that the storage of the RLE scheme is going to be more efficient than MKL in
all cases.