 Tuple Representations
 =====================

 What is a tuple representation?
 -------------------------------

 Tuples are passed around in generic TupleTableSlot objects. Interested parties
 need to be able to decode the data and see the values. The current tuple
 representation provides information on how to decode the data into typed
 attributes.

 Why do we need more than one tuple representation?
 ---------------------------------------------------

 Data comes from disk in a disk-oriented representation: it is serialized in a
 buffer and, for heap tables, includes MVCC columns.  That is sub-optimal for
 fast processing.  The execution engine does not usually need system columns, so
 it can drop the MVCC info.  Some operators don't need all the attributes (they
 execute projections), but creating a new tuple means copying all the columns
 into a new buffer.  Variable-length attributes make non-sequential access to
 attributes slow.  We have different in-memory representations of tuples that
 can improve performance in these cases.


 What different representations do we have?
 ------------------------------------------

  * HeapTuple
    - in a buffer or
    - palloc'ed

  * MinimalTuple
    - same as HeapTuple, but no system columns

  * VirtualTuple
    - optimization to avoid copying data for projections, etc.

  * MemTuple
    - Cloudberry addition: similar to MinimalTuple but "CPU" optimized

 TupleTableSlot
 --------------

 A TupleTableSlot is a very central data structure used to store tuples.  Most
 important fields:

  * TupleDesc: tuple descriptor that contains the schema.

  * HeapTuple: if available, physical representation of the tuple.  This is
    usually just a pointer to a buffer and a length.

  * MemTuple: if available, memtuple representation of the tuple.

  * Datum *values: Array of the values (datums) for all the attributes.  Might
    not be available.

  * bool *is_null: Array of booleans indicating whether each attribute is
    null.  Might not be available.

  * nvalid: How many of the attributes we have extracted into the
    (values, is_null) arrays.
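
 As a rough illustration of how nvalid lets a slot extract attributes lazily,
 here is a minimal sketch.  DemoSlot, demo_store and demo_getsomeattrs are
 hypothetical names invented for this example, and the "physical tuple" is just
 an array of ints; the real definitions are in executor/tuptable.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ATTRS 8

/* Hypothetical, simplified slot (assumption: not the real TupleTableSlot). */
typedef struct DemoSlot
{
    int         natts;                    /* stands in for the TupleDesc */
    const int  *physical;                 /* stands in for the physical tuple */
    long        values[DEMO_MAX_ATTRS];   /* extracted datums */
    bool        is_null[DEMO_MAX_ATTRS];  /* extracted null flags */
    int         nvalid;                   /* attributes extracted so far */
} DemoSlot;

/* Store a new physical tuple; nothing has been extracted yet. */
void
demo_store(DemoSlot *slot, const int *physical, int natts)
{
    assert(natts >= 0 && natts <= DEMO_MAX_ATTRS);
    slot->natts = natts;
    slot->physical = physical;
    slot->nvalid = 0;
}

/*
 * Make sure the first n attributes are extracted.  Returns how many
 * attributes had to be decoded by this call.
 */
int
demo_getsomeattrs(DemoSlot *slot, int n)
{
    int decoded = 0;

    assert(n >= 0 && n <= slot->natts);
    for (int i = slot->nvalid; i < n; i++)
    {
        /* Toy encoding (assumption): a negative value marks a null. */
        slot->is_null[i] = (slot->physical[i] < 0);
        slot->values[i] = slot->is_null[i] ? 0 : slot->physical[i];
        decoded++;
    }
    if (n > slot->nvalid)
        slot->nvalid = n;
    return decoded;
}
```

 Asking again for attributes that are already covered by nvalid does no
 decoding work in this sketch; only the not-yet-extracted tail is decoded.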

 Tuple Representations Stored in a TupleTableSlot
 ------------------------------------------------

 HeapTuple

 This is identical to the on-disk representation in a heap table.  It's a
 serialized (or "materialized") tuple, stored either in a buffer that belongs to
 the buffer manager, or allocated with palloc onto the process' heap.  Where a
 heap tuple is stored is important for resource management as shared buffers
 need to be correctly pinned.

 MinimalTuple

 It's similar to a palloc'ed HeapTuple, but the system columns are
 stripped.  This means there's no useless MVCC information carried around in the
 executor.

 VirtualTuple

 A "virtual" tuple is an optimization used to minimize physical data copying in
 between plan nodes.  Any pass-by-reference Datums in the tuple point to storage
 that is not directly associated with the TupleTableSlot.  They can point to (1)
 part of a tuple stored in a lower plan node's output TupleTableSlot or (2) a
 function result constructed in a plan node's per-tuple econtext.  It is the
 responsibility of the generating plan node to be sure these resources are not
 released for as long as the virtual tuple needs to be valid.

 Tuples that need to be copied anywhere (e.g. Motion, or DML) must be
 "materialized" into physical tuples.
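
 The difference between a virtual tuple and a materialized one can be sketched
 as follows.  This is only an illustration under simplifying assumptions:
 DemoVirtualTuple, DemoPhysicalTuple and demo_materialize are hypothetical
 names, every attribute is a NUL-terminated string, and malloc stands in for
 palloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NATTS 2

/* Hypothetical virtual tuple: values point at storage owned by someone else. */
typedef struct DemoVirtualTuple
{
    const char *values[DEMO_NATTS];
} DemoVirtualTuple;

/* Hypothetical physical tuple: one allocation holding copies of all values. */
typedef struct DemoPhysicalTuple
{
    char   *data;
    size_t  offsets[DEMO_NATTS];
} DemoPhysicalTuple;

/* Copy every referenced value into storage owned by the result. */
DemoPhysicalTuple
demo_materialize(const DemoVirtualTuple *vt)
{
    DemoPhysicalTuple pt;
    size_t total = 0;
    size_t off = 0;

    for (int i = 0; i < DEMO_NATTS; i++)
        total += strlen(vt->values[i]) + 1;

    pt.data = (char *) malloc(total);
    assert(pt.data != NULL);

    for (int i = 0; i < DEMO_NATTS; i++)
    {
        size_t len = strlen(vt->values[i]) + 1;   /* include the NUL */

        memcpy(pt.data + off, vt->values[i], len);
        pt.offsets[i] = off;
        off += len;
    }
    return pt;
}
```

 After demo_materialize returns, the copy stays valid even if the storage the
 virtual tuple pointed at is overwritten or released; the caller frees pt.data.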

 MemTuple

 In Cloudberry, MemTuple replaces the MinimalTuple.  MemTuple is also a
 serialized tuple format.  The main design goal is to reduce CPU usage so that
 getting to a certain column does not require decoding of the columns preceding
 it.

 Non-sequential access to attributes (columns) of a HeapTuple or MinimalTuple
 is done using slot_getsomeattrs or slot_getallattrs to populate some or all of
 the Datum* value pointers.  Accessing the M-th attribute involves iterating
 through the first M-1 attributes, calculating each one's actual length (fixed
 or variable, null or not), and then adding it to an offset.
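
 The cost of that walk can be sketched as follows.  The encoding here is a toy
 one assumed purely for illustration (no alignment padding, nulls occupy no
 bytes, a variable-length value is a 1-byte length followed by its payload);
 DemoAttr and demo_attr_offset are hypothetical names, not the real heap tuple
 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical attribute descriptor: attlen > 0 is a fixed width in bytes,
 * attlen == -1 marks a variable-length value.
 */
typedef struct DemoAttr
{
    int attlen;
} DemoAttr;

/*
 * Return the byte offset of attribute `target` (0-based) inside `data`.
 * Every preceding attribute has to be visited, and each variable-length
 * one forces a read of its length header.
 */
size_t
demo_attr_offset(const DemoAttr *attrs, const bool *is_null,
                 const uint8_t *data, int target)
{
    size_t off = 0;

    assert(target >= 0);
    for (int i = 0; i < target; i++)
    {
        if (is_null[i])
            continue;                           /* nulls store nothing */
        if (attrs[i].attlen > 0)
            off += (size_t) attrs[i].attlen;    /* fixed width */
        else
            off += 1 + (size_t) data[off];      /* header byte + payload */
    }
    return off;
}
```

 The loop runs once per preceding attribute, so reaching a late column in a
 wide tuple costs time proportional to its position.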

 MemTuple is an optimized representation which speeds up non-sequential access
 to attributes.  At query start time, for each slot (or each place we use
 MemTuple), we
 compute a binding data structure from the schema (TupleDesc).  For each
 attribute in the tuple, the binding structure contains information that
 determines where the attribute is located in the MemTuple.  It consists of the
 following:

  1. The physical attribute number.  Physical attribute numbers may differ from
     the schema order because we reorder attributes as follows: 8-byte aligned
     attributes go first, then 4-byte aligned, then 2-byte, then 1-byte, and
     variable-length fields last.

  2. The offset where the attribute starts and the length of the attribute.

  3. Information to adjust the offset if there are null columns before (in
     physical order) the attribute.
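
 Items 1 and 2 above can be sketched as follows.  This is not the real binding
 code from access/memtup.h: DemoBinding and demo_build_bindings are
 hypothetical names, a fixed-width attribute's alignment is assumed to equal
 its width, and the null adjustment from item 3 is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_VARLEN (-1)

/* Hypothetical per-attribute binding (assumption: illustrative only). */
typedef struct DemoBinding
{
    int    physical_no;  /* position in the reordered layout */
    size_t offset;       /* start offset when no nulls precede it */
    int    len;          /* width in bytes, or DEMO_VARLEN */
} DemoBinding;

/*
 * Build one binding per attribute.  attlen[i] must be 8, 4, 2, 1 or
 * DEMO_VARLEN.  Attributes are laid out class by class in that order,
 * keeping schema order within a class.  For variable-length attributes
 * the offset only records where the variable-length area begins.
 */
void
demo_build_bindings(const int *attlen, int natts, DemoBinding *out)
{
    static const int classes[] = {8, 4, 2, 1, DEMO_VARLEN};
    int    next_physical = 0;
    size_t next_offset = 0;

    assert(natts >= 0);
    for (size_t c = 0; c < sizeof(classes) / sizeof(classes[0]); c++)
    {
        for (int i = 0; i < natts; i++)
        {
            if (attlen[i] != classes[c])
                continue;
            out[i].physical_no = next_physical++;
            out[i].offset = next_offset;
            out[i].len = attlen[i];
            if (attlen[i] != DEMO_VARLEN)
                next_offset += (size_t) attlen[i];
        }
    }
}
```

 Because widths only shrink along the layout, every fixed-width offset in this
 sketch is naturally aligned without padding, and finding a fixed-width
 attribute becomes a lookup in the binding rather than a walk over the tuple.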

 There are many functions to convert the tuple stored in a TupleTableSlot from
 one representation to another.  Refer to access/memtup.h and
 executor/tuptable.h for more details.

 Toasted attributes
 ------------------

 Values that are too large to fit into a heap tuple in-line are stored
 externally in a toast table.  A large value is broken down into smaller
 fixed-size chunks.  Each chunk is stored as a tuple in the toast table.  More
 details can be found in access/tuptoaster.c.
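
 The chunking arithmetic can be sketched as follows.  DemoChunk,
 demo_chunk_count and demo_chunk_at are hypothetical names, and the chunk size
 is a parameter chosen by the caller rather than the constant the real code
 uses; real toast tuples also carry fields not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one chunk of a large value. */
typedef struct DemoChunk
{
    int    seq;     /* chunk sequence number, starting at 0 */
    size_t offset;  /* where this chunk starts in the original value */
    size_t len;     /* bytes in this chunk */
} DemoChunk;

/* Number of chunks needed for a value of value_len bytes. */
size_t
demo_chunk_count(size_t value_len, size_t chunk_size)
{
    assert(chunk_size > 0);
    return (value_len + chunk_size - 1) / chunk_size;   /* round up */
}

/* Describe chunk number seq; the final chunk may be shorter than the rest. */
DemoChunk
demo_chunk_at(size_t value_len, size_t chunk_size, int seq)
{
    DemoChunk c;
    size_t    remaining;

    assert(seq >= 0 &&
           (size_t) seq < demo_chunk_count(value_len, chunk_size));
    c.seq = seq;
    c.offset = (size_t) seq * chunk_size;
    remaining = value_len - c.offset;
    c.len = (remaining < chunk_size) ? remaining : chunk_size;
    return c;
}
```

 For example, under these assumptions a 5000-byte value with a 2000-byte chunk
 size yields three chunks of 2000, 2000 and 1000 bytes.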