blob: cfc05fd51c2f6b15b4066352ac0f3ced5f15445d [file] [log] [blame] [view]
<!--
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
-->
# Variant Shredding
The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous values.
Query engines encode each Variant value in a self-describing format, and store it as a group containing `value` and `metadata` binary fields in Parquet.
Since data is often partially homogeneous, it can be beneficial to extract certain fields into separate Parquet columns to further improve performance.
This process is called **shredding**.
Shredding enables the use of Parquet's columnar representation for more compact data encoding, column statistics for data skipping, and partial projections.
For example, the query `SELECT variant_get(event, '$.event_ts', 'timestamp') FROM tbl` only needs to load field `event_ts`, and if that column is shredded, it can be read by columnar projection without reading or deserializing the rest of the `event` Variant.
Similarly, for the query `SELECT * FROM tbl WHERE variant_get(event, '$.event_type', 'string') = 'signup'`, the `event_type` shredded column metadata can be used for skipping and to lazily load the rest of the Variant.
## Variant Metadata
Variant metadata is stored in the top-level Variant group in a binary `metadata` column regardless of whether the Variant value is shredded.
All `value` columns within the Variant must use the same `metadata`.
All field names of a Variant, whether shredded or not, must be present in the metadata.
## Value Shredding
Variant values are stored in Parquet fields named `value`.
Each `value` field may have an associated shredded field named `typed_value` that stores the value when it matches a specific type.
When `typed_value` is present, readers **must** reconstruct shredded values according to this specification.
For example, a Variant field, `measurement` may be shredded as long values by adding `typed_value` with type `int64`:
```
required group measurement (VARIANT) {
required binary metadata;
optional binary value;
optional int64 typed_value;
}
```
The Parquet columns used to store variant metadata and values must be accessed by name, not by position.
The series of measurements `34, null, "n/a", 100` would be stored as:
| Value | `metadata` | `value` | `typed_value` |
|---------|------------------|-----------------------|---------------|
| 34 | `01 00` v1/empty | null | `34` |
| null | `01 00` v1/empty | `00` (null) | null |
| "n/a" | `01 00` v1/empty | `13 6E 2F 61` (`n/a`) | null |
| 100 | `01 00` v1/empty | null | `100` |
Both `value` and `typed_value` are optional fields used together to encode a single value.
Values in the two fields must be interpreted according to the following table:
| `value` | `typed_value` | Meaning |
|----------|---------------|-------------------------------------------------------------|
| null | null | The value is missing; only valid for shredded object fields |
| non-null | null | The value is present and may be any type, including null |
| null | non-null | The value is present and is the shredded type |
| non-null | non-null | The value is present and is a partially shredded object |
An object is _partially shredded_ when the `value` is an object and the `typed_value` is a shredded object.
Writers must not produce data where both `value` and `typed_value` are non-null, unless the Variant value is an object.
If a Variant is missing in a context where a value is required, readers must return a Variant null (`00`): basic type 0 (primitive) and physical type 0 (null).
For example, if a Variant is required (like `measurement` above) and both `value` and `typed_value` are null, the returned `value` must be `00` (Variant null).
### Shredded Value Types
Shredded values must use the following Parquet types:
| Variant Type | Parquet Physical Type | Parquet Logical Type |
|-----------------------------|-----------------------------------|--------------------------|
| boolean | BOOLEAN | |
| int8 | INT32 | INT(8, signed=true) |
| int16 | INT32 | INT(16, signed=true) |
| int32 | INT32 | |
| int64 | INT64 | |
| float | FLOAT | |
| double | DOUBLE | |
| decimal4 | INT32 | DECIMAL(P, S) |
| decimal8 | INT64 | DECIMAL(P, S) |
| decimal16 | BYTE_ARRAY / FIXED_LEN_BYTE_ARRAY | DECIMAL(P, S) |
| date | INT32 | DATE |
| time | INT64 | TIME(false, MICROS) |
| timestamptz(6) | INT64 | TIMESTAMP(true, MICROS) |
| timestamptz(9) | INT64 | TIMESTAMP(true, NANOS) |
| timestampntz(6) | INT64 | TIMESTAMP(false, MICROS) |
| timestampntz(9) | INT64 | TIMESTAMP(false, NANOS) |
| binary | BINARY | |
| string | BINARY | STRING |
| uuid | FIXED_LEN_BYTE_ARRAY[len=16] | UUID |
| array | GROUP; see Arrays below | LIST |
| object | GROUP; see Objects below | |
#### Primitive Types
Primitive values can be shredded using the equivalent Parquet primitive type from the table above for `typed_value`.
Unless the value is shredded as an object (see [Objects](#objects)), `typed_value` or `value` (but not both) must be non-null.
#### Arrays
Arrays can be shredded by using a 3-level Parquet list for `typed_value`.
If the value is not an array, `typed_value` must be null.
If the value is an array, `value` must be null.
The list `element` must be a required group.
The `element` group can contain `value` and `typed_value` fields.
The element's `value` field stores the element as Variant-encoded `binary` when the `typed_value` is not present or cannot represent it.
The `typed_value` field may be omitted when not shredding elements as a specific type.
The `value` field may be omitted when shredding elements as a specific type.
However, at least one of the two fields must be present.
For example, a `tags` Variant may be shredded as a list of strings using the following definition:
```
optional group tags (VARIANT) {
required binary metadata;
optional binary value;
optional group typed_value (LIST) { # must be optional to allow a null list
repeated group list {
required group element { # shredded element
optional binary value;
optional binary typed_value (STRING);
}
}
}
}
```
All elements of an array must be present (not missing) because the `array` Variant encoding does not allow missing elements.
That is, either `typed_value` or `value` (but not both) must be non-null.
Null elements must be encoded in `value` as Variant null: basic type 0 (primitive) and physical type 0 (null).
The series of `tags` arrays `["comedy", "drama"], ["horror", null], ["comedy", "drama", "romance"], null` would be stored as:
| Array | `value` | `typed_value `| `typed_value...value` | `typed_value...typed_value` |
|----------------------------------|-------------|---------------|-----------------------|--------------------------------|
| `["comedy", "drama"]` | null | non-null | [null, null] | [`comedy`, `drama`] |
| `["horror", null]` | null | non-null | [null, `00`] | [`horror`, null] |
| `["comedy", "drama", "romance"]` | null | non-null | [null, null, null] | [`comedy`, `drama`, `romance`] |
| null | `00` (null) | null | | |
#### Objects
Fields of an object can be shredded using a Parquet group for `typed_value` that contains shredded fields.
If the value is an object, `typed_value` must be non-null.
If the value is not an object, `typed_value` must be null.
Readers can assume that a value is not an object if `typed_value` is null and that `typed_value` field values are correct; that is, readers do not need to read the `value` column if `typed_value` fields satisfy the required fields.
Each shredded field in the `typed_value` group is represented as a required group that contains optional `value` and `typed_value` fields.
The `value` field stores the value as Variant-encoded `binary` when the `typed_value` cannot represent the field.
This layout enables readers to skip data based on the field statistics for `value` and `typed_value`.
The `typed_value` field may be omitted when not shredding fields as a specific type.
The `value` column of a partially shredded object must never contain fields represented by the Parquet columns in `typed_value` (shredded fields).
Readers may always assume that data is written correctly and that shredded fields in `typed_value` are not present in `value`.
As a result, reads when a field is defined in both `value` and a `typed_value` shredded field may be inconsistent.
For example, a Variant `event` field may shred `event_type` (`string`) and `event_ts` (`timestamp`) columns using the following definition:
```
optional group event (VARIANT) {
required binary metadata;
optional binary value; # a variant, expected to be an object
optional group typed_value { # shredded fields for the variant object
required group event_type { # shredded field for event_type
optional binary value;
optional binary typed_value (STRING);
}
required group event_ts { # shredded field for event_ts
optional binary value;
optional int64 typed_value (TIMESTAMP(true, MICROS));
}
}
}
```
The group for each named field must use repetition level `required`.
A field's `value` and `typed_value` are set to null (missing) to indicate that the field does not exist in the variant.
To encode a field that is present with a null value, the `value` must contain a Variant null: basic type 0 (primitive) and physical type 0 (null).
When both `value` and `typed_value` for a field are non-null, engines should fail.
If engines choose to read in such cases, then the `typed_value` column must be used.
Readers may always assume that data is written correctly and that only `value` or `typed_value` is defined.
As a result, reads when both `value` and `typed_value` are defined may be inconsistent with optimized reads that require only one of the columns.
The table below shows how the series of objects in the first column would be stored:
| Event object | `value` | `typed_value` | `typed_value.event_type.value` | `typed_value.event_type.typed_value` | `typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes |
|-----------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|----------------------------------------------------------------------------|
| `{"event_type": "noop", "event_ts": 1729794114937}` | null | non-null | null | `noop` | null | 1729794114937 | Fully shredded object |
| `{"event_type": "login", "event_ts": 1729794146402, "email": "user@example.com"}` | `{"email": "user@example.com"}` | non-null | null | `login` | null | 1729794146402 | Partially shredded object |
| `{"error_msg": "malformed: ..."}` | `{"error_msg", "malformed: ..."}` | non-null | null | null | null | null | Object with all shredded fields missing |
| `"malformed: not an object"` | `malformed: not an object` | null | | | | | Not an object (stored as Variant string) |
| `{"event_ts": 1729794240241, "click": "_button"}` | `{"click": "_button"}` | non-null | null | null | null | 1729794240241 | Field `event_type` is missing |
| `{"event_type": null, "event_ts": 1729794954163}` | null | non-null | `00` (field exists, is null) | null | null | 1729794954163 | Field `event_type` is present and is null |
| `{"event_type": "noop", "event_ts": "2024-10-24"}` | null | non-null | null | `noop` | `"2024-10-24"` | null | Field `event_ts` is present but not a timestamp |
| `{ }` | null | non-null | null | null | null | null | Object is present but empty |
| null | `00` (null) | null | | | | | Object/value is null |
| missing | null | null | | | | | Object/value is missing |
| INVALID: `{"event_type": "login", "event_ts": 1729795057774}` | `{"event_type": "login"}` | non-null | null | `login` | null | 1729795057774 | INVALID: Shredded field is present in `value` |
| INVALID: `{"event_type": "login"}` | `{"event_type": "login"}` | null | | | | | INVALID: Shredded field is present in `value`, while `typed_value` is null |
| INVALID: `"a"` | `"a"` | non-null | null | null | null | null | INVALID: `typed_value` is present and `value` is not an object |
| INVALID: `{}` | `02 00` (object with 0 fields) | null | | | | | INVALID: `typed_value` is null for object |
Invalid cases in the table above must not be produced by writers.
Readers must return an object when `typed_value` is non-null containing the shredded fields.
## Nesting
The `typed_value` associated with any Variant `value` field can be any shredded type, as shown in the sections above.
For example, the `event` object above may also shred sub-fields as object (`location`) or array (`tags`).
```
optional group event (VARIANT) {
required binary metadata;
optional binary value;
optional group typed_value {
required group event_type {
optional binary value;
optional binary typed_value (STRING);
}
required group event_ts {
optional binary value;
optional int64 typed_value (TIMESTAMP(true, MICROS));
}
required group location {
optional binary value;
optional group typed_value {
required group latitude {
optional binary value;
optional double typed_value;
}
required group longitude {
optional binary value;
optional double typed_value;
}
}
}
required group tags {
optional binary value;
optional group typed_value (LIST) {
repeated group list {
required group element {
optional binary value;
optional binary typed_value (STRING);
}
}
}
}
}
}
```
# Data Skipping
Statistics for `typed_value` columns can be used for file, row group, or page skipping when `value` is always null (missing).
When the corresponding `value` column is all nulls, all values must be the shredded `typed_value` field's type.
Because the type is known, comparisons with values of that type are valid.
`IS NULL`/`IS NOT NULL` and `IS NAN`/`IS NOT NAN` filter results are also valid.
Comparisons with values of other types are not necessarily valid and data should not be skipped.
Casting behavior for Variant is delegated to processing engines.
For example, the interpretation of a string as a timestamp may depend on the engine's SQL session time zone.
## Reconstructing a Shredded Variant
It is possible to recover an unshredded Variant value using a recursive algorithm, where the initial call is to `construct_variant` with the top-level Variant group fields.
```python
def construct_variant(metadata: Metadata, value: Variant, typed_value: Any) -> Variant:
"""Constructs a Variant from value and typed_value"""
if typed_value is not None:
if isinstance(typed_value, dict):
# this is a shredded object
object_fields = {
name: construct_variant(metadata, field.value, field.typed_value)
for (name, field) in typed_value
}
if value is not None:
# this is a partially shredded object
assert isinstance(value, VariantObject), "partially shredded value must be an object"
assert typed_value.keys().isdisjoint(value.keys()), "object keys must be disjoint"
# union the shredded fields and non-shredded fields
return VariantObject(metadata, object_fields).union(VariantObject(metadata, value))
else:
return VariantObject(metadata, object_fields)
elif isinstance(typed_value, list):
# this is a shredded array
assert value is None, "shredded array must not conflict with variant value"
elements = [
construct_variant(metadata, elem.value, elem.typed_value)
for elem in list(typed_value)
]
return VariantArray(metadata, elements)
else:
# this is a shredded primitive
assert value is None, "shredded primitive must not conflict with variant value"
return primitive_to_variant(typed_value)
elif value is not None:
return Variant(metadata, value)
else:
# value is missing
return None
def primitive_to_variant(typed_value: Any): Variant:
if isinstance(typed_value, int):
return VariantInteger(typed_value)
elif isinstance(typed_value, str):
return VariantString(typed_value)
...
```
## Backward and forward compatibility
Shredding is an optional feature of Variant, and readers must continue to be able to read a group containing only `value` and `metadata` fields.
Engines that do not write shredded values must be able to read shredded values according to this spec or must fail.
Different files may contain conflicting shredding schemas.
That is, files may contain different `typed_value` columns for the same Variant with incompatible types.
It may not be possible to infer or specify a single shredded schema that would allow all Parquet files for a table to be read without reconstructing the value as a Variant.