Downloads

The latest version of Iceberg is 0.10.0.

To use Iceberg in Spark, download the runtime JAR and add it to the jars folder of your Spark installation. Use iceberg-spark3-runtime for Spark 3 and iceberg-spark-runtime for Spark 2.4.
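
For example, once the runtime JAR is on the classpath, Iceberg tables can be read through the standard Spark data source API. The Java sketch below is illustrative only: the warehouse path is hypothetical, and everything other than the iceberg source name is ordinary Spark setup.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadIcebergTable {
  public static void main(String[] args) {
    // Assumes iceberg-spark3-runtime (or iceberg-spark-runtime for Spark 2.4)
    // is already in the jars folder of the Spark install.
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-read-example")
        .getOrCreate();

    // "/tmp/warehouse/db/events" is a hypothetical Hadoop table location.
    Dataset<Row> df = spark.read()
        .format("iceberg")
        .load("/tmp/warehouse/db/events");

    df.show();
    spark.stop();
  }
}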

To use Iceberg in Hive, download the iceberg-hive-runtime JAR and add it to Hive using the ADD JAR command.

Gradle

To add a dependency on Iceberg in Gradle, add the following to build.gradle:

dependencies {
  implementation 'org.apache.iceberg:iceberg-core:0.10.0'
}

You may also want to include iceberg-parquet for Parquet file support.

Maven

To add a dependency on Iceberg in Maven, add the following to your pom.xml:

<dependencies>
  ...
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-core</artifactId>
    <version>0.10.0</version>
  </dependency>
  ...
</dependencies>
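
With either build tool, iceberg-core provides the table and schema APIs; iceberg-parquet adds Parquet readers and writers. The Java sketch below creates a Hadoop table using iceberg-core alone; the schema, partition column, and warehouse path are hypothetical, and Hadoop client libraries are assumed to be on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.types.Types;

public class CreateTableExample {
  public static void main(String[] args) {
    // Hypothetical schema: a required id column and an optional category column.
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "category", Types.StringType.get()));

    // Partition by the category column.
    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .identity("category")
        .build();

    // HadoopTables keeps table metadata directly in the file system;
    // the location below is illustrative.
    HadoopTables tables = new HadoopTables(new Configuration());
    Table table = tables.create(schema, spec, "/tmp/warehouse/example_table");

    System.out.println("Created table at " + table.location());
  }
}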

0.10.0 release notes

High-level features:

  • Format v2 support for building row-level operations (MERGE INTO) in processing engines
    • Note: format v2 is not yet finalized and does not have a forward-compatibility guarantee
  • Flink integration for writing to and reading from Iceberg tables (reading supports batch mode only)
  • Hive integration for reading from Iceberg tables, with filter pushdown (experimental; configuration may change)

Important bug fixes:

  • #1706 fixes non-vectorized ORC reads in Spark that incorrectly skipped rows
  • #1536 fixes ORC conversion of notIn and notEqual to match null values
  • #1722 fixes Expressions.notNull returning an isNull predicate; this was an API-only issue because the method was not used by processing engines
  • #1736 fixes IllegalArgumentException in vectorized Spark reads with negative decimal values
  • #1666 fixes the file lengths returned by the ORC writer to report the compressed size rather than the uncompressed size
  • #1674 removes catalog expiration in HiveCatalogs
  • #1545 automatically refreshes tables in Spark when not caching table instances

Other notable changes:

  • The iceberg-hive module has been renamed to iceberg-hive-metastore to avoid confusion
  • The Spark 3 integration is based on Spark 3.0.1, which includes the fix for SPARK-32168
  • Hadoop tables will recover from version hint corruption
  • Tables can be configured with a required sort order
  • Data file locations can be customized with a dynamically loaded LocationProvider (see the sketch after this list)
  • ORC file imports can apply a name mapping for stats
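
The LocationProvider sketch referenced above is shown here as a Java illustration. The class name, the flat-prefix layout, and the constructor signature are assumptions made for the example; the write.location-provider.impl table property is the expected way to plug in a custom provider, but the exact contract should be checked against the documentation for your version.

import java.util.Map;

import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.LocationProvider;

// Hypothetical provider that places every data file under a single flat prefix.
public class FlatLocationProvider implements LocationProvider {
  private final String dataPrefix;

  // Assumption: a constructor taking the table location and table properties;
  // verify the constructor contract expected by the dynamic loader.
  public FlatLocationProvider(String tableLocation, Map<String, String> properties) {
    this.dataPrefix = tableLocation + "/data";
  }

  @Override
  public String newDataLocation(String filename) {
    return dataPrefix + "/" + filename;
  }

  @Override
  public String newDataLocation(PartitionSpec spec, StructLike partitionData, String filename) {
    // Ignore partition values and keep a flat layout.
    return newDataLocation(filename);
  }
}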

A more exhaustive list of changes is available under the 0.10.0 release milestone.

Past releases

0.9.1

0.9.0

0.8.0

0.7.0