Apache Druid (incubating) supports three types of batch ingestion: Apache Hadoop-based batch ingestion, native parallel batch ingestion, and native local batch ingestion. The following table compares the features supported by each ingestion method.
| | Hadoop-based ingestion | Native parallel ingestion | Native local ingestion |
|---|---|---|---|
| Parallel indexing | Always parallel | Parallel if the firehose is splittable | Always sequential |
| Supported indexing modes | Replacing mode | Both appending and replacing modes | Both appending and replacing modes |
| External dependency | Hadoop (the task internally submits Hadoop jobs) | No dependency | No dependency |
| Supported rollup modes | Perfect rollup | Best-effort rollup | Both perfect and best-effort rollup |
| Supported partitioning methods | Both hash-based and range partitioning | N/A | Hash-based partitioning (when `forceGuaranteedRollup` = true) |
| Supported input locations | All locations accessible via HDFS client or Druid dataSource | All implemented firehoses | All implemented firehoses |
| Supported file formats | All implemented Hadoop InputFormats | Text file formats (CSV, TSV, JSON) by default; additional formats can be added through a custom extension implementing FiniteFirehoseFactory | Text file formats (CSV, TSV, JSON) by default; additional formats can be added through a custom extension implementing FiniteFirehoseFactory |
| Saving parse exceptions in ingestion report | Currently not supported | Currently not supported | Supported |
| Custom segment version | Supported, but this is NOT recommended | N/A | N/A |
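For reference, a native parallel ingestion task is submitted to the Overlord as a JSON task spec. The sketch below is a minimal example under assumed values: the dataSource name, base directory, filter pattern, and column names are placeholders, and whether subtasks actually run in parallel depends on whether the configured firehose is splittable.

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": ["page", "language"] }
        }
      },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2019-01-01/2019-01-02"]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "local",
        "baseDir": "/path/to/data",
        "filter": "*.json"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumSubTasks": 4
    }
  }
}
```

Setting `appendToExisting` to true switches the task from replacing to appending mode, and the same spec structure can generally be run as a sequential native local task by changing the `type` fields from `index_parallel` to `index`.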