Apache Impala Incubator

Welcome to Impala

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.

Impala is a modern, massively distributed, massively parallel C++ query engine that lets you analyze, transform, and combine data from a variety of data sources:

  • Best-of-breed performance and scalability.
  • Support for data stored in HDFS, Apache HBase and Amazon S3.
  • Wide analytic SQL support, including window functions and subqueries.
  • On-the-fly code generation with LLVM, producing CPU-efficient code tailored to each query.
  • Support for the most commonly used Hadoop file formats, including Apache Parquet.
  • Apache-licensed, 100% open source.

More about Impala

To learn more about Impala as a business user, or to try Impala live or in a VM, please visit the Impala homepage.

If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.