This Apache Druid extension enables Druid to ingest and parse the Apache Avro data format.
The Avro Stream Parser is deprecated.
To use the Avro extension, add druid-avro-extensions to the list of loaded extensions. See Loading extensions for more information.
Druid supports most Avro types natively. This section describes some exceptions.
Druid has two modes for supporting union types.
The default mode treats unions as a single value regardless of the type of data populating the union.
If you want to operate on individual members of a union, set extractUnionsByType on the Avro parser. This configuration expands union values into nested objects according to the following rules:
Primitive types and unnamed complex types are keyed by their type name, such as int and string.
Named complex types are keyed by their names. These include record, fixed, and enum.
This is safe because an Avro union can contain only a single member of each unnamed type, and duplicates of the same named type are not allowed. For example, only a single array is allowed, but multiple records (or other named types) are allowed as long as each has a unique name.
You can then access the members of the union with a flattenSpec, as you would for other nested types.
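The expansion of union values into nested objects can be modelled in a few lines of Python. This is an illustrative sketch, not Druid's implementation: expand_union, resolve_path, and the field name someUnion are hypothetical, and the caller is assumed to already know which branch of the union the value came from.

```python
from typing import Any


def expand_union(value: Any, branch_key: str) -> dict:
    """Wrap a union value in a nested object keyed by its branch.

    branch_key is the type name for primitive and unnamed complex types
    (for example "int" or "string") and the declared name for named types
    (record, fixed, enum).
    """
    return {branch_key: value}


def resolve_path(row: dict, path: str) -> Any:
    """Follow a dotted path such as "someUnion.int"; return None if absent."""
    current: Any = row
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# A union of ["int", "string"] that holds an int in this row (hypothetical data).
row = {"someUnion": expand_union(42, "int")}

print(resolve_path(row, "someUnion.int"))     # 42
print(resolve_path(row, "someUnion.string"))  # None
```

Because each branch gets its own key, a path that targets the int member never picks up a string value from the same union, which is what makes per-member access possible.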
The extension returns bytes and fixed Avro types as base64-encoded strings by default. To decode these types as UTF-8 strings, enable the binaryAsString option on the Avro parser.
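The difference between the two representations can be seen with Python's standard library. This is only a sketch of the two encodings described above, not of Druid's code, and the sample payload is made up.

```python
import base64

raw = b"hello"  # hypothetical contents of an Avro bytes or fixed field

as_base64 = base64.b64encode(raw).decode("ascii")  # default representation
as_text = raw.decode("utf-8")                      # with binaryAsString enabled

print(as_base64)  # aGVsbG8=
print(as_text)    # hello
```

UTF-8 decoding only makes sense when the field really holds text. Arbitrary binary data may not be valid UTF-8, so the base64 default is the safer choice for such fields.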
The extension returns enum types as the string value of the enum symbol.
You can ingest record and map types representing nested data with a flattenSpec on the parser.
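As a rough illustration, the following Python snippet builds a flattenSpec that pulls one value out of a record and one out of a map, then prints it as JSON. The field names someRecord, subInt, someMap, and key1 are hypothetical, as is the helper path_field. The avro_ocf type and the path field shape are assumed to match the flattenSpec format of your Druid version, so check them against its ingestion documentation.

```python
import json


def path_field(name: str, expr: str) -> dict:
    """Build one flattenSpec entry that extracts a nested value by JSONPath."""
    return {"type": "path", "name": name, "expr": expr}


# Assumed shape of an Avro input format with a flattenSpec (hypothetical fields).
input_format = {
    "type": "avro_ocf",
    "flattenSpec": {
        "useFieldDiscovery": True,
        "fields": [
            path_field("someRecord_subInt", "$.someRecord.subInt"),
            path_field("someMap_key1", "$.someMap.key1"),
        ],
    },
}

print(json.dumps(input_format, indent=2))
```

Each entry maps a nested value to a flat column name, so a record member and a map entry are addressed the same way.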
Druid does not currently support Avro logical types. It ignores them and handles fields according to the underlying primitive type.
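To make the consequence concrete, consider an int field annotated with the Avro date logical type, which the Avro specification defines as the number of days since 1970-01-01. The sketch below is not Druid code: the field signupDate and the helper underlying_type are hypothetical. It shows that the ingested value is the raw integer, and that interpreting it as a calendar date is a separate step.

```python
from datetime import date, timedelta

# Hypothetical Avro field: an int annotated with the "date" logical type.
field_schema = {"name": "signupDate", "type": {"type": "int", "logicalType": "date"}}


def underlying_type(schema: dict) -> str:
    """Return the primitive type, ignoring any logicalType annotation."""
    field_type = schema["type"]
    return field_type["type"] if isinstance(field_type, dict) else field_type


days = 365  # value as stored in the Avro record

print(underlying_type(field_schema))            # int
print(date(1970, 1, 1) + timedelta(days=days))  # 1971-01-01
```

With the logical type ignored, the column would contain 365 rather than a date, so any conversion has to be done outside the Avro parsing step.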