:::info
This page describes SQL-based batch ingestion using the `druid-multi-stage-query` extension, new in Druid 24.0. Refer to the ingestion methods table to determine which ingestion method is right for you.
:::
Fault tolerance is partially implemented. Workers get relaunched when they are killed unexpectedly. The controller does not get relaunched if it is killed unexpectedly.
Worker task stage outputs are stored in the working directory given by `druid.indexer.task.baseDir`. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with an `UnknownError` with a message including "No space left on device".
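If stage outputs risk filling the default location, one mitigation is to point the task working directory at a volume with more free space. A minimal sketch, assuming a Middle Manager `runtime.properties` file and an illustrative path:

```properties
# Illustrative only: store task working directories (including stage
# outputs from multi-stage queries) on a volume with enough free space.
druid.indexer.task.baseDir=/mnt/druid/task
```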
## SELECT Statement

`GROUPING SETS` are not implemented. Queries using this feature return a `QueryNotSupported` error.
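For illustration, a query of the following shape (the datasource and column names are hypothetical) is rejected by the multi-stage query engine:

```sql
-- Hypothetical query: GROUPING SETS is not supported by the
-- multi-stage query engine, so this fails with QueryNotSupported.
SELECT channel, page, COUNT(*) AS cnt
FROM wikipedia
GROUP BY GROUPING SETS ( (channel, page), (channel), () )
```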
The numeric varieties of the `EARLIEST` and `LATEST` aggregators do not work properly. Attempting to use the numeric varieties of these aggregators leads to an error like `java.lang.ClassCastException: class java.lang.Double cannot be cast to class org.apache.druid.collections.SerializablePair`. The string varieties, however, do work properly.
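As a sketch, assuming a hypothetical `wikipedia` datasource with a numeric `added` column and a string `page` column:

```sql
-- Numeric variety: fails under the multi-stage query engine with the
-- ClassCastException described above.
SELECT EARLIEST(added) FROM wikipedia;

-- String variety: works. The second argument is the maximum number of
-- bytes retained per value.
SELECT EARLIEST(page, 1024) FROM wikipedia;
```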
## INSERT and REPLACE Statements

The `INSERT` and `REPLACE` statements with column lists, like `INSERT INTO tbl (a, b, c) SELECT ...`, are not implemented.
`INSERT ... SELECT` and `REPLACE ... SELECT` insert columns from the `SELECT` statement based on column name. This differs from SQL standard behavior, where columns are inserted based on position.
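A minimal sketch of the difference, using hypothetical table and column names: since explicit column lists are not available, alias the `SELECT` expressions so their names match the target columns.

```sql
-- Not supported by the multi-stage query engine: an explicit column list.
-- INSERT INTO tbl (a, b) SELECT ...

-- Supported: matching is by name, not position, so alias the SELECT
-- expressions to the target column names.
INSERT INTO tbl
SELECT
  TIME_PARSE(src_ts) AS __time,
  src_x AS a,
  src_y AS b
FROM source_table
PARTITIONED BY DAY
```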
`INSERT` and `REPLACE` do not support all options available in ingestion specs, including the `createBitmapIndex` and `multiValueHandling` dimension properties, and the `indexSpec` `tuningConfig` property.
## EXTERN Function

The schemaless dimensions feature is not available. All columns and their types must be specified explicitly using the `signature` parameter of the `EXTERN` function.
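For example, a sketch of an `EXTERN` call with an explicit `signature`; the input source URI, input format, and column names here are illustrative:

```sql
SELECT *
FROM TABLE(
  EXTERN(
    -- inputSource (illustrative HTTP source)
    '{"type": "http", "uris": ["https://example.com/data.json.gz"]}',
    -- inputFormat
    '{"type": "json"}',
    -- signature: every column and its type, declared explicitly
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "added", "type": "long"}]'
  )
)
```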
`EXTERN` with input sources that match large numbers of files may exhaust available memory on the controller task.
`EXTERN` refers to external files. Use `FROM` to access `druid` input sources.
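As a sketch (the datasource and column names are hypothetical), reading from an existing Druid datasource uses a plain `FROM` clause rather than `EXTERN`:

```sql
-- Reads from an existing Druid datasource; no EXTERN call is needed.
SELECT channel, COUNT(*) AS cnt
FROM wikipedia
GROUP BY channel
```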