---
layout: doc_page
title: "Hadoop-based Batch Ingestion VS Native Batch Ingestion"
---

# Comparison of Batch Ingestion Methods

Apache Druid (incubating) supports three types of batch ingestion: Apache Hadoop-based batch ingestion, native parallel batch ingestion, and native local batch ingestion. The table below lists the features supported by each ingestion method; an example spec follows the table.

|   | Hadoop-based ingestion | Native parallel ingestion | Native local ingestion |
|---|------------------------|---------------------------|------------------------|
| Parallel indexing | Always parallel | Parallel if firehose is splittable | Always sequential |
| Supported indexing modes | Replacing mode | Both appending and replacing modes | Both appending and replacing modes |
| External dependency | Hadoop (it internally submits Hadoop jobs) | No dependency | No dependency |
| Supported rollup modes | Perfect rollup | Best-effort rollup | Both perfect and best-effort rollup |
| Supported partitioning methods | Both hash-based and range partitioning | N/A | Hash-based partitioning (when `forceGuaranteedRollup` = true) |
| Supported input locations | All locations accessible via HDFS client or Druid dataSource | All implemented firehoses | All implemented firehoses |
| Supported file formats | All implemented Hadoop InputFormats | Currently text file formats (CSV, TSV, JSON) by default. Additional formats can be added through a custom extension implementing `FiniteFirehoseFactory`. | Currently text file formats (CSV, TSV, JSON) by default. Additional formats can be added through a custom extension implementing `FiniteFirehoseFactory`. |
| Saving parse exceptions in ingestion report | Currently not supported | Currently not supported | Supported |
| Custom segment version | Supported, but this is NOT recommended | N/A | N/A |
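
To make the differences concrete, here is a minimal sketch of a native parallel ingestion spec. The dataSource name, file paths, columns, and interval below are placeholder values, and the exact set of available fields can vary by Druid version, so treat this as an illustration rather than a complete reference. The `local` firehose used here is splittable, which is what allows the supervisor task to fan the work out across sub tasks.

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": ["page", "language", "user"]
          }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2013-08-31/2013-09-01"]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "local",
        "baseDir": "/tmp/data/json",
        "filter": "*.json"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumSubTasks": 2
    }
  }
}
```

The `appendToExisting` flag in the `ioConfig` chooses between the appending and replacing indexing modes from the table above, and the finished spec is submitted to the Overlord's task endpoint (`POST /druid/indexer/v1/task`). For the native local (`index`) task, setting `forceGuaranteedRollup` to true in the `tuningConfig` switches it from best-effort to perfect rollup with hash-based partitioning; note that this typically also requires setting `numShards` and cannot be combined with appending.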