TOAST support in PAX addresses several issues with handling very large tuples in the PAX storage format.
In Cloudberry, TOAST is used to store relatively large tuples, and the size at which a tuple is toasted depends on TOAST_TUPLE_THRESHOLD.
PAX must be thread-safe. Storing TOAST data directly in the TOAST table is not thread-safe, and neither are interfaces such as toast_tuple_init and toast_tuple_try_compression.
When the TOAST table is not used, a table's toast data is not inserted into the table pg_toast.pg_toast_<reltoastrelid>.
PAX supports two kinds of toast:
compress toast: its structure is consistent with Cloudberry's implementation
external toast: a PAX customization
Unlike in Cloudberry, the external toast exists only on disk and cannot be used in memory.
The compress toast can be written to or read from the tuple directly.
For toast operation methods: PAX no longer reuses the methods in detoast.h/heaptoast.h, but writes its own set of operators.
Compress toast is consistent with Cloudberry compress toast.
This part of the data is a varlena structure of type varattrib_4b, which means that it has a 4-byte varlena header and can store less than 1 GB.
In addition, PAX may set a higher minimum length for a datum to be compressed, so TOAST_TUPLE_THRESHOLD is no longer used as the threshold. Instead, PAX adds a GUC, pax.min_size_of_compress_toast, for this purpose.
typedef union
{
    struct
    {
        uint32 va_header;
        char   va_data[FLEXIBLE_ARRAY_MEMBER];
    } va_4byte;
    struct
    {
        uint32 va_header;
        uint32 va_rawsize; /* origin size */
        char   va_data[FLEXIBLE_ARRAY_MEMBER];
    } va_compressed;
} varattrib_4b;
---------------------------------------------------
| tag | compress | length |
---------------------------------------------------
| 1 bit | 1 bit | 30 bit |
---------------------------------------------------
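
The header layout above can be illustrated with a small sketch. It assumes the diagram lists the fields from the most significant bit down (tag in bit 31, compress in bit 30, length in the low 30 bits). The names `sketch_pack_header` and `sketch_unpack_header` are hypothetical and are not PAX or Cloudberry APIs; the real varlena header macros are endian-dependent, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: bit positions assume the diagram reads MSB first. */
#define SKETCH_LEN_MASK  ((UINT32_C(1) << 30) - 1u) /* low 30 bits */
#define SKETCH_COMP_BIT  (UINT32_C(1) << 30)        /* compress flag */
#define SKETCH_TAG_BIT   (UINT32_C(1) << 31)        /* tag flag */

typedef struct {
    unsigned tag;        /* 1 bit */
    unsigned compressed; /* 1 bit */
    uint32_t length;     /* 30 bits, hence the "less than 1 GB" limit */
} SketchHeader;

/* Combine the three fields into one 32-bit header word. */
static uint32_t sketch_pack_header(unsigned tag, unsigned compressed,
                                   uint32_t length)
{
    assert(length <= SKETCH_LEN_MASK); /* must fit in 30 bits */
    return (tag ? SKETCH_TAG_BIT : 0u)
         | (compressed ? SKETCH_COMP_BIT : 0u)
         | length;
}

/* Split a 32-bit header word back into its fields. */
static SketchHeader sketch_unpack_header(uint32_t header)
{
    SketchHeader h;
    h.tag = (header >> 31) & 1u;
    h.compressed = (header >> 30) & 1u;
    h.length = header & SKETCH_LEN_MASK;
    return h;
}
```

A reader of a compress toast would check the compress bit first and only then interpret the following 4 bytes as the original size.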
External toast is no longer structurally identical to Cloudberry external toast.
The current Cloudberry external toast structure is varattrib_1b_e, whose data part is one of two types:
varatt_external
varatt_expanded
typedef struct
{
    uint8 va_header;
    uint8 va_tag; /* type */
    char  va_data[FLEXIBLE_ARRAY_MEMBER];
} varattrib_1b_e;
However, the varattrib_1b_e structure is still reused, so that PAX can identify the kind of toast through the va_tag type.
For this purpose, we defined a custom tag (VARTAG_CUSTOM) in Cloudberry to fill in va_tag.
The va_data part:
typedef struct {
    int32  va_rawsize; /* Original data size (doesn't include header) */
    uint32 va_extinfo; /* External saved size (without header) and
                        * compression method */
    uint64 va_extogsz; /* The origin size of external toast */
    uint64 va_extoffs; /* The offset of external toast */
    uint64 va_extsize; /* The size of external toast */
} pax_varatt_external;
External TOAST data is not stored as a datum within the tuple, because the executor does not know how to detoast PAX's external toast. PAX therefore has to make sure the external toast has been detoasted once the current column has been read.
We added a buffer to PaxColumn. Each column can use this buffer (called the external buffer) to access its own external toast.
The pointer (va_extoffs + va_extsize) in the external toast points to a location in the column's own external buffer (not to the location where the data is saved).
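
As a minimal sketch of that lookup, the function below resolves an external toast against a column's external buffer with an overflow-safe bounds check. `SketchExternalBuffer`, `sketch_pax_varatt_external`, and `sketch_resolve_external` are hypothetical names for illustration only; the struct mirrors the fields of pax_varatt_external, and the actual PaxColumn buffer API may look quite different.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the va_data payload of a PAX external toast. */
typedef struct {
    int32_t  va_rawsize;
    uint32_t va_extinfo;
    uint64_t va_extogsz;
    uint64_t va_extoffs; /* offset inside the external buffer */
    uint64_t va_extsize; /* number of bytes at that offset */
} sketch_pax_varatt_external;

/* Hypothetical stand-in for the external buffer owned by a column. */
typedef struct {
    const char *data;
    uint64_t    size;
} SketchExternalBuffer;

/*
 * Return a pointer to the toast bytes inside the external buffer, or NULL
 * if the (offset, size) pair does not fit. On success *len_out receives
 * va_extsize.
 */
static const char *
sketch_resolve_external(const SketchExternalBuffer *buf,
                        const sketch_pax_varatt_external *ext,
                        uint64_t *len_out)
{
    if (buf == NULL || ext == NULL || buf->data == NULL)
        return NULL;

    /* Written so that offset + size cannot overflow. */
    if (ext->va_extoffs > buf->size ||
        ext->va_extsize > buf->size - ext->va_extoffs)
        return NULL;

    if (len_out != NULL)
        *len_out = ext->va_extsize;
    return buf->data + ext->va_extoffs;
}
```

Rejecting out-of-range pairs before dereferencing keeps a corrupted toast pointer from reading past the buffer.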
External toast data is not stored with the column data; a separate .toast file stores it.
StripeInformation (the metadata part describing a group)
toastOffset: the starting offset, in the toast file, of the external toast in the current group
toastLength: the length, in the toast file, of the external toast in the current group
numberOfToast: the number of toasts in the current group (compress toast is also counted here)
repeated extToastLength: the length of the external toast of each column in the current group, used to handle column projection
PAX determines whether to generate TOAST data based on the PostgreSQL storage type.
This is consistent with the heap table.
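
The storage-type rule can be sketched as follows, using the standard PostgreSQL typstorage codes ('p' plain, 'e' external, 'x' extended, 'm' main) and their heap semantics. The names `SketchToastPolicy`, `sketch_policy_for_storage`, and `sketch_should_try_compress` are hypothetical, and comparing the datum length with `>=` against pax.min_size_of_compress_toast is an assumption rather than confirmed PAX behavior.

```c
#include <stdbool.h>
#include <stddef.h>

/* What a storage type permits; hypothetical type for illustration. */
typedef struct {
    bool may_compress;
    bool may_external;
} SketchToastPolicy;

/* Map a PostgreSQL typstorage code to what the heap rules allow. */
static SketchToastPolicy sketch_policy_for_storage(char typstorage)
{
    SketchToastPolicy p = { false, false };

    switch (typstorage) {
    case 'x': /* extended: compression and out-of-line storage */
        p.may_compress = true;
        p.may_external = true;
        break;
    case 'e': /* external: out-of-line storage, no compression */
        p.may_external = true;
        break;
    case 'm': /* main: compression; out-of-line only as a last resort */
        p.may_compress = true;
        p.may_external = true;
        break;
    case 'p': /* plain: never toasted */
    default:
        break;
    }
    return p;
}

/*
 * Assumed check: try compression only when the storage type allows it and
 * the datum is at least as long as the GUC value passed by the caller.
 */
static bool sketch_should_try_compress(char typstorage, size_t datum_len,
                                       size_t min_size_of_compress_toast)
{
    return sketch_policy_for_storage(typstorage).may_compress &&
           datum_len >= min_size_of_compress_toast;
}
```

For example, a column declared with external storage would skip compression regardless of the GUC value, while a plain column would never produce a toast at all.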