================================================================================
Structure of the sparse type support
================================================================================
There are two parts to the sparse types:
1) An in-memory data structure and computational library for sparse data, called
SparseData.
2) A Cloudberry/Postgres datatype for each case, initially a sparse vector that
uses only double as its data type.
Eventually there will be sparse vector support for other base types such as float,
complex double, integer, char and bit. There will also be matrix support,
which will use the SparseData structure to compose higher-dimensional structures.
=========================
Design
=========================
The Sparse Vector type is a variable-length Postgres datatype, which means that
its data is stored in a "varlena". A varlena is a structure whose first item is
an integer that identifies the length of its data area.
For the Sparse Vector, the varlena has a leading integer that describes the size
or "dimension" of the vector it represents, followed by a serialized SparseData
of type FLOAT8OID.
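The layout described above can be pictured with a small standalone sketch. It
does not use the Postgres headers or the real type definition: DemoSvec,
demo_svec_make and demo_svec_payload_len are hypothetical names, and the choice
to have the length field count the whole value (header included) is an
assumption made for this illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a varlena-style Sparse Vector value. */
typedef struct {
    int32_t total_len;  /* length header: bytes in the whole value (assumed) */
    int32_t dimension;  /* size of the logical vector */
    char    data[];     /* serialized SparseData bytes follow directly */
} DemoSvec;

/* Build one contiguous value: header, dimension, then the payload bytes. */
static DemoSvec *demo_svec_make(int32_t dimension,
                                const void *payload, size_t payload_len)
{
    assert(payload != NULL);
    size_t total = sizeof(DemoSvec) + payload_len;
    DemoSvec *v = malloc(total);
    if (v == NULL)
        return NULL;
    v->total_len = (int32_t) total;
    v->dimension = dimension;
    memcpy(v->data, payload, payload_len);
    return v;
}

/* The payload size is recovered from the length header alone. */
static size_t demo_svec_payload_len(const DemoSvec *v)
{
    return (size_t) v->total_len - sizeof(DemoSvec);
}
```

Because everything lives in one allocation, the value can be stored or copied
as a single block of total_len bytes.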
All of the operators required for the datatype support are wrappers that use the
SparseData operators. In order to use a SparseData operator, the serialized
SparseData inside the Sparse Vector is used to create an in-place accessible
SparseData (no copy required), and then this is passed to the SparseData
functions.
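The "in-place accessible" idea can be sketched as a view that holds pointers
into the serialized bytes rather than copies of them. The serialized form used
here (a count followed by the values) is a simplification assumed for the
example and is not the library's actual format; DemoSerialized, DemoView and
the demo_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical serialized form: a header followed directly by the values. */
typedef struct {
    int64_t count;     /* number of stored values */
    double  values[];  /* values live in the same buffer as the header */
} DemoSerialized;

/* Hypothetical in-place view: points into the buffer and owns nothing. */
typedef struct {
    int64_t       count;
    const double *values;
} DemoView;

/* Pack the values into a single contiguous buffer. */
static DemoSerialized *demo_serialize(const double *vals, int64_t n)
{
    assert(vals != NULL && n >= 0);
    size_t bytes = (size_t) n * sizeof(double);
    DemoSerialized *buf = malloc(sizeof(DemoSerialized) + bytes);
    if (buf == NULL)
        return NULL;
    buf->count = n;
    memcpy(buf->values, vals, bytes);
    return buf;
}

/* Create the view without copying: only pointers and sizes are set. */
static DemoView demo_view_in_place(const DemoSerialized *buf)
{
    DemoView v;
    v.count  = buf->count;
    v.values = buf->values;  /* aliases the serialized buffer */
    return v;
}

/* A computational routine that works purely on the view. */
static double demo_view_sum(DemoView v)
{
    double total = 0.0;
    for (int64_t i = 0; i < v.count; i++)
        total += v.values[i];
    return total;
}
```

The view is only valid while the underlying buffer is alive, which is the
trade-off for avoiding the copy.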
If you want to implement a new operator or function for Sparse Vector, first try
to implement it inside operators.c using existing SparseData functions or by
building on other Sparse Vector functions. If what you need isn't there, you
can build a base-level function inside SparseData.h as an inline function
or inside SparseData.c as a non-inline function, then wrap it in a Sparse Vector
function in operators.c.
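The layering described above might look roughly like the following sketch: a
base-level inline function that knows only about the sparse data, and a thin
wrapper at the vector level that adds type-level checks. All names
(DemoSparseData, DemoSvec, demo_sdata_sum, demo_svec_mean) are hypothetical,
and the (value, run length) representation is an illustrative assumption, not
a description of the real SparseData structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sparse data: each value is repeated runs[i] times. */
typedef struct {
    int           nruns;
    const double *vals;  /* one value per run */
    const int    *runs;  /* run length for each value */
} DemoSparseData;

/* Hypothetical vector type: a dimension plus the sparse data. */
typedef struct {
    int            dimension;
    DemoSparseData sdata;
} DemoSvec;

/* Base-level function (the kind that would sit in the SparseData layer). */
static inline double demo_sdata_sum(const DemoSparseData *sd)
{
    double total = 0.0;
    for (int i = 0; i < sd->nruns; i++)
        total += sd->vals[i] * (double) sd->runs[i];
    return total;
}

/*
 * Wrapper (the kind that would sit in the operators layer): validates the
 * vector-level arguments, then delegates the arithmetic to the base function.
 * Returns 0 on success and -1 on invalid input.
 */
static int demo_svec_mean(const DemoSvec *sv, double *out)
{
    if (sv == NULL || out == NULL || sv->dimension <= 0)
        return -1;
    *out = demo_sdata_sum(&sv->sdata) / (double) sv->dimension;
    return 0;
}
```

Keeping the arithmetic in the base function means a later vector or matrix
type could reuse it without going through the wrapper.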
The sparse_vector.c file is for creation and management functions related to the
type.
=========================
Code structure
=========================
SparseData structure and functions:
------------------------------------------------------------------------
SparseData.c, SparseData.h
Sparse vector datatype:
------------------------------------------------------------------------
sparse_vector.c, sparse_vector.h, operators.c
Functions for text processing (Sparse Feature Vector or SFV):
------------------------------------------------------------------------
gp_sfv.c