[{"categories": null, "content": " Spark and Iceberg Quickstart This guide will get you up and running with an Iceberg and Spark environment, including sample code to highlight some powerful features. You can learn more about Iceberg\u2019s Spark runtime by checking out the Spark section.\nDocker-Compose Creating a table Writing Data to a Table Reading Data from a Table Adding A Catalog Next Steps Docker-Compose The fastest way to get started is to use a docker-compose file that uses the the tabulario/spark-iceberg image which contains a local Spark cluster with a configured Iceberg catalog. To use this, you\u2019ll need to install the Docker CLI as well as the Docker Compose CLI.\nOnce you have those, save the yaml below into a file named docker-compose.yml:\nversion: \"3\" services: spark-iceberg: image: tabulario/spark-iceberg depends_on: - postgres container_name: spark-iceberg environment: - SPARK_HOME=/opt/spark - PYSPARK_PYTON=/usr/bin/python3.9 - PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/spark/bin volumes: - ./warehouse:/home/iceberg/warehouse - ./notebooks:/home/iceberg/notebooks/notebooks ports: - 8888:8888 - 8080:8080 - 18080:18080 postgres: image: postgres:13.4-bullseye container_name: postgres environment: - POSTGRES_USER=admin - POSTGRES_PASSWORD=password - POSTGRES_DB=demo_catalog volumes: - ./postgres/data:/var/lib/postgresql/data Next, start up the docker containers with this command:\ndocker-compose up You can then run any of the following commands to start a Spark session.\nSparkSQL Spark-Shell PySpark docker exec -it spark-iceberg spark-sql docker exec -it spark-iceberg spark-shell docker exec -it spark-iceberg pyspark You can also launch a notebook server by running docker exec -it spark-iceberg notebook. The notebook server will be available at http://localhost:8888 Creating a table To create your first Iceberg table in Spark, run a CREATE TABLE command. Let\u2019s create a table using demo.nyc.taxis where demo is the catalog name, nyc is the database name, and taxis is the table name.\nSparkSQL Spark-Shell PySpark CREATE TABLE demo.nyc.taxis ( vendor_id bigint, trip_id bigint, trip_distance float, fare_amount double, store_and_fwd_flag string ) PARTITIONED BY (vendor_id); import org.apache.spark.sql.types._ import org.apache.spark.sql.Row val schema = StructType( Array( StructField(\"vendor_id\", LongType,true), StructField(\"trip_id\", LongType,true), StructField(\"trip_distance\", FloatType,true), StructField(\"fare_amount\", DoubleType,true), StructField(\"store_and_fwd_flag\", StringType,true) )) val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row],schema) df.writeTo(\"demo.nyc.taxis\").create() from pyspark.sql.types import DoubleType, FloatType, LongType, StructType,StructField, StringType schema = StructType([ StructField(\"vendor_id\", LongType(), True), StructField(\"trip_id\", LongType(), True), StructField(\"trip_distance\", FloatType(), True), StructField(\"fare_amount', DoubleType(), True), StructField(\"store_and_fwd_flag', StringType(), True) ]) df = spark.createDataFrame([], schema) df.writeTo(\"demo.nyc.taxis\").create() Iceberg catalogs support the full range of SQL DDL commands, including:\nCREATE TABLE ... PARTITIONED BY CREATE TABLE ... 
AS SELECT ALTER TABLE DROP TABLE Writing Data to a Table Once your table is created, you can insert records.\nSparkSQL Spark-Shell PySpark INSERT INTO demo.nyc.taxis VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y'); import org.apache.spark.sql.Row val schema = spark.table(\"demo.nyc.taxis\").schema val data = Seq( Row(1: Long, 1000371: Long, 1.8f: Float, 15.32: Double, \"N\": String), Row(2: Long, 1000372: Long, 2.5f: Float, 22.15: Double, \"N\": String), Row(2: Long, 1000373: Long, 0.9f: Float, 9.01: Double, \"N\": String), Row(1: Long, 1000374: Long, 8.4f: Float, 42.13: Double, \"Y\": String) ) val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema) df.writeTo(\"demo.nyc.taxis\").append() schema = spark.table(\"demo.nyc.taxis\").schema data = [ (1, 1000371, 1.8, 15.32, \"N\"), (2, 1000372, 2.5, 22.15, \"N\"), (2, 1000373, 0.9, 9.01, \"N\"), (1, 1000374, 8.4, 42.13, \"Y\") ] df = spark.createDataFrame(data, schema) df.writeTo(\"demo.nyc.taxis\").append() Reading Data from a Table To read a table, simply use the Iceberg table\u2019s name.\nSparkSQL Spark-Shell PySpark SELECT * FROM demo.nyc.taxis; val df = spark.table(\"demo.nyc.taxis\") df.show() df = spark.table(\"demo.nyc.taxis\") df.show() Adding A Catalog Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue. Catalogs are configured using properties under spark.sql.catalog.(catalog_name). In this guide, we use JDBC, but you can follow these instructions to configure other catalog types. To learn more, check out the Catalog page in the Spark section.\nThis configuration creates a path-based catalog named demo for tables under $PWD/warehouse and adds support for Iceberg tables to Spark\u2019s built-in catalog.\nCLI spark-defaults.conf spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0\\ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\ --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \\ --conf spark.sql.catalog.spark_catalog.type=hive \\ --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.demo.type=hadoop \\ --conf spark.sql.catalog.demo.warehouse=$PWD/warehouse \\ --conf spark.sql.defaultCatalog=demo spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0 spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog spark.sql.catalog.spark_catalog.type hive spark.sql.catalog.demo org.apache.iceberg.spark.SparkCatalog spark.sql.catalog.demo.type hadoop spark.sql.catalog.demo.warehouse $PWD/warehouse spark.sql.defaultCatalog demo If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing USE demo; Next steps Adding Iceberg to Spark If you already have a Spark environment, you can add Iceberg using the --packages option.\nSparkSQL Spark-Shell PySpark spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0 spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0 pyspark --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0 If you want to include Iceberg in your Spark installation, add the Iceberg Spark runtime to Spark\u2019s jars folder. You can download the runtime by visiting the Releases page. 
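If you are working from PySpark rather than the CLI, the same catalog properties can be set while building the SparkSession. The snippet below is a rough sketch rather than part of the official quickstart: it reuses the demo Hadoop catalog and the iceberg-spark-runtime-3.2_2.12:0.14.0 coordinates from the configuration above, while the application name and warehouse path are placeholder assumptions.

```python
import os
from pyspark.sql import SparkSession

# Illustrative sketch only: it mirrors the CLI/spark-defaults.conf settings above,
# but the application name and warehouse location are assumptions, not part of the docs.
warehouse_path = f"file://{os.getcwd()}/warehouse"  # same idea as $PWD/warehouse

spark = (
    SparkSession.builder
    .appName("iceberg-quickstart")
    # Resolves the Iceberg runtime at startup, equivalent to the --packages flag;
    # this only works if no SparkSession/SparkContext is already running.
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # "demo" is a path-based (Hadoop) catalog rooted at the warehouse directory.
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", warehouse_path)
    .config("spark.sql.defaultCatalog", "demo")
    .getOrCreate()
)

# With demo as the default catalog, unqualified names resolve against it.
spark.sql("""
    CREATE TABLE IF NOT EXISTS nyc.taxis (
        vendor_id bigint, trip_id bigint, trip_distance float,
        fare_amount double, store_and_fwd_flag string)
    USING iceberg
    PARTITIONED BY (vendor_id)
""")
spark.table("nyc.taxis").show()
```

Setting the packages and catalog properties on the builder is convenient in notebooks, where editing spark-defaults.conf or passing CLI flags is awkward; the resulting session should behave the same as one started with the commands above.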
Learn More Now that you\u2019re up and running with Iceberg and Spark, check out the Iceberg-Spark docs to learn more!\n", "description": "", "title": "Spark and Iceberg Quickstart", "uri": "/spark-quickstart/"}, {"categories": null, "content": " Downloads The latest version of Iceberg is 0.14.0.\n0.14.0 source tar.gz \u2013 signature \u2013 sha512 0.14.0 Spark 3.3_2.12 runtime Jar \u2013 3.3_2.13 0.14.0 Spark 3.2_2.12 runtime Jar \u2013 3.2_2.13 0.14.0 Spark 3.1 runtime Jar 0.14.0 Spark 3.0 runtime Jar 0.14.0 Spark 2.4 runtime Jar 0.14.0 Flink 1.15 runtime Jar 0.14.0 Flink 1.14 runtime Jar 0.14.0 Flink 1.13 runtime Jar 0.14.0 Hive runtime Jar To use Iceberg in Spark or Flink, download the runtime JAR for your engine version and add it to the jars folder of your installation.\nTo use Iceberg in Hive 2 or Hive 3, download the Hive runtime JAR and add it to Hive using ADD JAR.\nGradle To add a dependency on Iceberg in Gradle, add the following to build.gradle:\ndependencies { compile 'org.apache.iceberg:iceberg-core:0.14.0' } You may also want to include iceberg-parquet for Parquet file support.\nMaven To add a dependency on Iceberg in Maven, add the following to your pom.xml:\n<dependencies> ... <dependency> <groupId>org.apache.iceberg</groupId> <artifactId>iceberg-core</artifactId> <version>0.14.0</version> </dependency> ... </dependencies> 0.14.0 release Apache Iceberg 0.14.0 was released on 16 July 2022.\nHighlights Added several performance improvements for scan planning and Spark queries Added a common REST catalog client that uses change-based commits to resolve commit conflicts on the service side Added support for Spark 3.3, including AS OF syntax for SQL time travel queries Added support for Scala 2.13 with Spark 3.2 or later Added merge-on-read support for MERGE and UPDATE queries in Spark 3.2 or later Added support to rewrite partitions using zorder Added support for Flink 1.15 and dropped support for Flink 1.12 Added a spec and implementation for Puffin, a format for large stats and index blobs, like Theta sketches or bloom filters Added new interfaces for consuming data incrementally (both append and changelog scans) Added support for bulk operations and ranged reads to FileIO interfaces Added more metadata tables to show delete files in the metadata tree High-level features API Added IcebergBuild to expose Iceberg version and build information Added binary compatibility checking to the build (#4638, #4798) Added a new IncrementalAppendScan interface and planner implementation (#4580) Added a new IncrementalChangelogScan interface (#4870) Refactored the ScanTask hierarchy to create new task types for changelog scans (#5077) Added expression sanitizer (#4672) Added utility to check expression equivalence (#4947) Added support for serializing FileIO instances using initialization properties (#5178) Updated Snapshot methods to accept a FileIO to read metadata files, deprecated old methods (#4873) Added optional interfaces to FileIO, for batch deletes (#4052), prefix operations (#5096), and ranged reads (#4608) Core Added a common client for REST-based catalog services that uses a change-based protocol (#4320, #4319) Added Puffin, a file format for statistics and index payloads or sketches (#4944, #4537) Added snapshot references to track tags and branches (#4019) ManageSnapshots now supports multiple operations using transactions, and added branch and tag operations (#4128, #4071) ReplacePartitions and OverwriteFiles now support serializable isolation (#2925, #4052) Added new 
metadata tables: data_files (#4336), delete_files (#4243), all_delete_files, and all_files (#4694) Added deleted files to the files metadata table (#4336) and delete file counts to the manifests table (#4764) Added support for predicate pushdown for the all_data_files metadata table (#4382) and the all_manifests table (#4736) Added support for catalogs to default table properties on creation (#4011) Updated sort order construction to ensure all partition fields are added to avoid partition closed failures (#5131) Spark Spark 3.3 is now supported (#5056) Added SQL time travel using AS OF syntax in Spark 3.3 (#5156) Scala 2.13 is now supported for Spark 3.2 and 3.3 (#4009) Added support for the mergeSchema option for DataFrame writes (#4154) MERGE and UPDATE queries now support the lazy / merge-on-read strategy (#3984, #4047) Added zorder rewrite strategy to the rewrite_data_files stored procedure and action (#3983, #4902) Added a register_table stored procedure to create tables from metadata JSON files (#4810) Added a publish_changes stored procedure to publish staged commits by ID (#4715) Added CommitMetadata helper class to set snapshot summary properties from SQL (#4956) Added support to supply a file listing to remove orphan data files procedure and action (#4503) Added FileIO metrics to the Spark UI (#4030, #4050) DROP TABLE now supports the PURGE flag (#3056) Added support for custom isolation level for dynamic partition overwrites (#2925) and filter overwrites (#4293) Schema identifier fields are now shown in table properties (#4475) Abort cleanup now supports parallel execution (#4704) Flink Flink 1.15 is now supported (#4553) Flink 1.12 support was removed (#4551) Added a FLIP-27 source and builder to 1.14 and 1.15 (#5109) Added an option to set the monitor interval (#4887) and an option to limit the number of snapshots in a streaming read planning operation (#4943) Added support for write options, like write-format to Flink sink builder (#3998) Added support for task locality when reading from HDFS (#3817) Use Hadoop configuration files from hadoop-conf-dir property (#4622) Vendor integrations Added Dell ECS integration (#3376, #4221) JDBC catalog now supports namespace properties (#3275) AWS Glue catalog supports native Glue locking (#4166) AWS S3FileIO supports using S3 access points (#4334), bulk operations (#4052, #5096), ranged reads (#4608), and tagging at write time or in place of deletes (#4259, #4342) AWS GlueCatalog supports passing LakeFormation credentials (#4280) AWS DynamoDB catalog and lock supports overriding the DynamoDB endpoint (#4726) Nessie now supports namespaces and namespace properties (#4385, #4610) Nessie now passes most common catalog tests (#4392) Parquet Added support for row group skipping using Parquet bloom filters (#4938) Added table configuration options for writing Parquet bloom filters (#5035) ORC Support file rolling at a target file size (#4419) Support table compression settings, write.orc.compression-codec and write.orc.compression-strategy (#4273) Performance improvements Core Fixed manifest file handling in scan planning to open manifests in the planning threadpool (#5206) Avoided an extra S3 HEAD request by passing file length when opening manifest files (#5207) Refactored Arrow vectorized readers to avoid extra dictionary copies (#5137) Improved Arrow decimal handling to improve decimal performance (#5168, #5198) Added support for Avro files with Zstd compression (#4083) Column metrics are now disabled by default after the first 32 
columns (#3959, #5215) Updated delete filters to copy row wrappers to avoid expensive type analysis (#5249) Snapshot expiration supports parallel execution (#4148) Manifest updates can use a custom thread pool (#4146) Spark Parquet vectorized reads are enabled by default (#4196) Scan statistics now adjust row counts for split data files (#4446) Implemented SupportsReportStatistics in ScanBuilder to work around SPARK-38962 (#5136) Updated Spark tables to avoid expensive (and inaccurate) size estimation (#5225) Flink Operators will now use a worker pool per job (#4177) Fixed ClassCastException thrown when reading arrays from Parquet (#4432) Hive Added vectorized Parquet reads for Hive 3 (#3980) Improved generic reader performance using copy instead of create (#4218) Notable bug fixes This release includes all bug fixes from the 0.13.x patch releases.\nCore Fixed an exception thrown when metadata-only deletes encounter delete files that are partially matched (#4304) Fixed transaction retries for changes without validations, like schema updates, that could ignore an update (#4464) Fixed failures when reading metadata tables with evolved partition specs (#4520, #4560) Fixed delete files dropped when a manifest is rewritten following a format version upgrade (#4514) Fixed missing metadata files resulting from an OOM during commit cleanup (#4673) Updated logging to use sanitized expressions to avoid leaking values (#4672) Spark Fixed Spark to skip calling abort when CommitStateUnknownException is thrown (#4687) Fixed MERGE commands with mixed case identifiers (#4848) Flink Fixed table property update failures when tables have a primary key (#4561) Integrations JDBC catalog behavior has been updated to pass common catalog tests (#4220, #4231) Dependency changes Updated Apache Avro to 1.10.2 (previously 1.10.1) Updated Apache Parquet to 1.12.3 (previously 1.12.2) Updated Apache ORC to 1.7.5 (previously 1.7.2) Updated Apache Arrow to 7.0.0 (previously 6.0.0) Updated AWS SDK to 2.17.131 (previously 2.15.7) Updated Nessie to 0.30.0 (previously 0.18.0) Updated Caffeine to 2.9.3 (previously 2.8.4) Past releases 0.13.2 Apache Iceberg 0.13.2 was released on June 15th, 2022.\nGit tag: 0.13.2 0.13.2 source tar.gz \u2013 signature \u2013 sha512 0.13.2 Spark 3.2 runtime Jar 0.13.2 Spark 3.1 runtime Jar 0.13.2 Spark 3.0 runtime Jar 0.13.2 Spark 2.4 runtime Jar 0.13.2 Flink 1.14 runtime Jar 0.13.2 Flink 1.13 runtime Jar 0.13.2 Flink 1.12 runtime Jar 0.13.2 Hive runtime Jar Important bug fixes and changes:\nCore #4673 fixes table corruption from OOM during commit cleanup #4514 row delta delete files were dropped in sequential commits after table format updated to v2 #4464 fixes an issue were conflicting transactions have been ignored during a commit #4520 fixes an issue with wrong table predicate filtering with evolved partition specs Spark #4663 fixes NPEs in Spark value converter #4687 fixes an issue with incorrect aborts when non-runtime exceptions were thrown in Spark Flink Note that there\u2019s a correctness issue when using upsert mode in Flink 1.12. Given that Flink 1.12 is deprecated, it was decided to not fix this bug but rather log a warning (see also #4754). 
Nessie #4509 fixes a NPE that occurred when accessing refreshed tables in NessieCatalog A more exhaustive list of changes is available under the 0.13.2 release milestone.\n0.13.1 Apache Iceberg 0.13.1 was released on February 14th, 2022.\nGit tag: 0.13.1 0.13.1 source tar.gz \u2013 signature \u2013 sha512 0.13.1 Spark 3.2 runtime Jar 0.13.1 Spark 3.1 runtime Jar 0.13.1 Spark 3.0 runtime Jar 0.13.1 Spark 2.4 runtime Jar 0.13.1 Flink 1.14 runtime Jar 0.13.1 Flink 1.13 runtime Jar 0.13.1 Flink 1.12 runtime Jar 0.13.1 Hive runtime Jar Important bug fixes:\nSpark\n#4023 fixes predicate pushdown in row-level operations for merge conditions in Spark 3.2. Prior to the fix, filters would not be extracted and targeted merge conditions were not pushed down leading to degraded performance for these targeted merge operations. #4024 fixes table creation in the root namespace of a Hadoop Catalog. Flink\n#3986 fixes manifest location collisions when there are multiple committers in the same Flink job. 0.13.0 Apache Iceberg 0.13.0 was released on February 4th, 2022.\nGit tag: 0.13.0 0.13.0 source tar.gz \u2013 signature \u2013 sha512 0.13.0 Spark 3.2 runtime Jar 0.13.0 Spark 3.1 runtime Jar 0.13.0 Spark 3.0 runtime Jar 0.13.0 Spark 2.4 runtime Jar 0.13.0 Flink 1.14 runtime Jar 0.13.0 Flink 1.13 runtime Jar 0.13.0 Flink 1.12 runtime Jar 0.13.0 Hive runtime Jar High-level features:\nCore Catalog caching now supports cache expiration through catalog property cache.expiration-interval-ms [#3543] Catalog now supports registration of Iceberg table from a given metadata file location [#3851] Hadoop catalog can be used with S3 and other file systems safely by using a lock manager [#3663] Vendor Integrations Google Cloud Storage (GCS) FileIO is supported with optimized read and write using GCS streaming transfer [#3711] Aliyun Object Storage Service (OSS) FileIO is supported [#3553] Any S3-compatible storage (e.g. MinIO) can now be accessed through AWS S3FileIO with custom endpoint and credential configurations [#3656] [#3658] AWS S3FileIO now supports server-side checksum validation [#3813] AWS GlueCatalog now displays more table information including table location, description [#3467] and columns [#3888] Using multiple FileIOs based on file path scheme is supported by configuring a ResolvingFileIO [#3593] Spark Spark 3.2 is supported [#3335] with merge-on-read DELETE [#3970] RewriteDataFiles action now supports sort-based table optimization [#2829] and merge-on-read delete compaction [#3454]. 
The corresponding Spark call procedure rewrite_data_files is also supported [#3375] Time travel queries now use snapshot schema instead of the table\u2019s latest schema [#3722] Spark vectorized reads now support row-level deletes [#3557] [#3287] add_files procedure now skips duplicated files by default (can be turned off with the check_duplicate_files flag) [#2895], skips folder without file [#2895] and partitions with null values [#2895] instead of throwing exception, and supports partition pruning for faster table import [#3745] Flink Flink 1.13 and 1.14 are supported [#3116] [#3434] Flink connector support is supported [#2666] Upsert write option is supported [#2863] Hive Table listing in Hive catalog can now skip non-Iceberg tables by disabling flag list-all-tables [#3908] Hive tables imported to Iceberg can now be read by IcebergInputFormat [#3312] File Formats ORC now supports writing delete file [#3248] [#3250] [#3366] Important bug fixes:\nCore Iceberg new data file root path is configured through write.data.path going forward. write.folder-storage.path and write.object-storage.path are deprecated [#3094] Catalog commit status is UNKNOWN instead of FAILURE when new metadata location cannot be found in snapshot history [#3717] Dropping table now also deletes old metadata files instead of leaving them strained [#3622] history and snapshots metadata tables can now query tables with no current snapshot instead of returning empty [#3812] Vendor Integrations Using cloud service integrations such as AWS GlueCatalog and S3FileIO no longer fail when missing Hadoop dependencies in the execution environment [#3590] AWS clients are now auto-closed when related FileIO or Catalog is closed. There is no need to close the AWS clients separately [#2878] Spark For Spark >= 3.1, REFRESH TABLE can now be used with Spark session catalog instead of throwing exception [#3072] Insert overwrite mode now skips partition with 0 record instead of failing the write operation [#2895] Spark snapshot expiration action now supports custom FileIO instead of just HadoopFileIO [#3089] REPLACE TABLE AS SELECT can now work with tables with columns that have changed partition transform. Each old partition field of the same column is converted to a void transform with a different name [#3421] Spark SQL filters containing binary or fixed literals can now be pushed down instead of throwing exception [#3728] Flink A ValidationException will be thrown if a user configures both catalog-type and catalog-impl. Previously it chose to use catalog-type. The new behavior brings Flink consistent with Spark and Hive [#3308] Changelog tables can now be queried without RowData serialization issues [#3240] java.sql.Time data type can now be written without data overflow problem [#3740] Avro position delete files can now be read without encountering NullPointerException [#3540] Hive Hive catalog can now be initialized with a null Hadoop configuration instead of throwing exception [#3252] Table creation can now succeed instead of throwing exception when some columns do not have comments [#3531] File Formats Parquet file writing issue is fixed for string data with over 16 unparseable chars (e.g. high/low surrogates) [#3760] ORC vectorized read is now configured using read.orc.vectorization.batch-size instead of read.parquet.vectorization.batch-size [#3133] Other notable changes:\nThe community has finalized the long-term strategy of Spark, Flink and Hive support. See Multi-Engine Support page for more details. 
0.12.1 Apache Iceberg 0.12.1 was released on November 8th, 2021.\nGit tag: 0.12.1 0.12.1 source tar.gz \u2013 signature \u2013 sha512 0.12.1 Spark 3.x runtime Jar 0.12.1 Spark 2.4 runtime Jar 0.12.1 Flink runtime Jar 0.12.1 Hive runtime Jar Important bug fixes and changes:\n#3264 fixes validation failures that occurred after snapshot expiration when writing Flink CDC streams to Iceberg tables. #3264 fixes reading projected map columns from Parquet files written before Parquet 1.11.1. #3195 allows validating that commits that produce row-level deltas don\u2019t conflict with concurrently added files. Ensures users can maintain serializable isolation for update and delete operations, including merge operations. #3199 allows validating that commits that overwrite files don\u2019t conflict with concurrently added files. Ensures users can maintain serializable isolation for overwrite operations. #3135 fixes equality-deletes using DATE, TIMESTAMP, and TIME types. #3078 prevents the JDBC catalog from overwriting the jdbc.user property if any property called user exists in the environment. #3035 fixes drop namespace calls with the DyanmoDB catalog. #3273 fixes importing Avro files via add_files by correctly setting the number of records. #3332 fixes importing ORC files with float or double columns in add_files. A more exhaustive list of changes is available under the 0.12.1 release milestone.\n0.12.0 Apache Iceberg 0.12.0 was released on August 15, 2021. It consists of 395 commits authored by 74 contributors over a 139 day period.\nGit tag: 0.12.0 0.12.0 source tar.gz \u2013 signature \u2013 sha512 0.12.0 Spark 3.x runtime Jar 0.12.0 Spark 2.4 runtime Jar 0.12.0 Flink runtime Jar 0.12.0 Hive runtime Jar High-level features:\nCore Allow Iceberg schemas to specify one or more columns as row identifiers [#2465]. Note that this is a prerequisite for supporting upserts in Flink. Added JDBC [#1870] and DynamoDB [#2688] catalog implementations. Added predicate pushdown for partitions and files metadata tables [#2358, #2926]. Added a new, more flexible compaction action for Spark that can support different strategies such as bin packing and sorting. [#2501, #2609]. Added the ability to upgrade to v2 or create a v2 table using the table property format-version=2 [#2887]. Added support for nulls in StructLike collections [#2929]. Added key_metadata field to manifest lists for encryption [#2675]. Flink Added support for SQL primary keys [#2410]. Hive Added the ability to set the catalog at the table level in the Hive Metastore. This makes it possible to write queries that reference tables from multiple catalogs [#2129]. As a result of [#2129], deprecated the configuration property iceberg.mr.catalog which was previously used to configure the Iceberg catalog in MapReduce and Hive [#2565]. Added table-level JVM lock on commits[#2547]. Added support for Hive\u2019s vectorized ORC reader [#2613]. Spark Added SET and DROP IDENTIFIER FIELDS clauses to ALTER TABLE so people don\u2019t have to look up the DDL [#2560]. Added support for ALTER TABLE REPLACE PARTITION FIELD DDL [#2365]. Added support for micro-batch streaming reads for structured streaming in Spark3 [#2660]. Improved the performance of importing a Hive table by not loading all partitions from Hive and instead pushing the partition filter to the Metastore [#2777]. Added support for UPDATE statements in Spark [#2193, #2206]. Added support for Spark 3.1 [#2512]. Added RemoveReachableFiles action [#2415]. Added add_files stored procedure [#2210]. 
Refactored Actions API and added a new entry point. Added support for Hadoop configuration overrides [#2922]. Added support for the TIMESTAMP WITHOUT TIMEZONE type in Spark [#2757]. Added validation that files referenced by row-level deletes are not concurrently rewritten [#2308]. Important bug fixes:\nCore Fixed string bucketing with non-BMP characters [#2849]. Fixed Parquet dictionary filtering with fixed-length byte arrays and decimals [#2551]. Fixed a problem with the configuration of HiveCatalog [#2550]. Fixed partition field IDs in table replacement [#2906]. Hive Enabled dropping HMS tables even if the metadata on disk gets corrupted [#2583]. Parquet Fixed Parquet row group filters when types are promoted from int to long or from float to double [#2232] Spark Fixed MERGE INTO in Spark when used with SinglePartition partitioning [#2584]. Fixed nested struct pruning in Spark [#2877]. Fixed NaN handling for float and double metrics [#2464]. Fixed Kryo serialization for data and delete files [#2343]. Other notable changes:\nThe Iceberg Community voted to approve version 2 of the Apache Iceberg Format Specification. The differences between version 1 and 2 of the specification are documented here. Bugfixes and stability improvements for NessieCatalog. Improvements and fixes for Iceberg\u2019s Python library. Added a vectorized reader for Apache Arrow [#2286]. The following Iceberg dependencies were upgraded: Hive 2.3.8 [#2110]. Avro 1.10.1 [#1648]. Parquet 1.12.0 [#2441]. 0.11.1 Git tag: 0.11.1 0.11.1 source tar.gz \u2013 signature \u2013 sha512 0.11.1 Spark 3.0 runtime Jar 0.11.1 Spark 2.4 runtime Jar 0.11.1 Flink runtime Jar 0.11.1 Hive runtime Jar Important bug fixes:\n#2367 prohibits deleting data files when tables are dropped if GC is disabled. #2196 fixes data loss after compaction when large files are split into multiple parts and only some parts are combined with other files. #2232 fixes row group filters with promoted types in Parquet. #2267 avoids listing non-Iceberg tables in Glue. #2254 fixes predicate pushdown for Date in Hive. #2126 fixes writing of Date, Decimal, Time, UUID types in Hive. #2241 fixes vectorized ORC reads with metadata columns in Spark. #2154 refreshes the relation cache in DELETE and MERGE operations in Spark. 0.11.0 Git tag: 0.11.0 0.11.0 source tar.gz \u2013 signature \u2013 sha512 0.11.0 Spark 3.0 runtime Jar 0.11.0 Spark 2.4 runtime Jar 0.11.0 Flink runtime Jar 0.11.0 Hive runtime Jar High-level features:\nCore API now supports partition spec and sort order evolution Spark 3 now supports the following SQL extensions: MERGE INTO (experimental) DELETE FROM (experimental) ALTER TABLE \u2026 ADD/DROP PARTITION ALTER TABLE \u2026 WRITE ORDERED BY Invoke stored procedures using CALL Flink now supports streaming reads, CDC writes (experimental), and filter pushdown AWS module is added to support better integration with AWS, with AWS Glue catalog support and dedicated S3 FileIO implementation Nessie module is added to support integration with project Nessie Important bug fixes:\n#1981 fixes bug that date and timestamp transforms were producing incorrect values for dates and times before 1970. Before the fix, negative values were incorrectly transformed by date and timestamp transforms to 1 larger than the correct value. For example, day(1969-12-31 10:00:00) produced 0 instead of -1. The fix is backwards compatible, which means predicate projection can still work with the incorrectly transformed partitions written using older versions. 
#2091 fixes ClassCastException for type promotion int to long and float to double during Parquet vectorized read. Now Arrow vector is created by looking at Parquet file schema instead of Iceberg schema for int and float fields. #1998 fixes bug in HiveTableOperation that unlock is not called if new metadata cannot be deleted. Now it is guaranteed that unlock is always called for Hive catalog users. #1979 fixes table listing failure in Hadoop catalog when user does not have permission to some tables. Now the tables with no permission are ignored in listing. #1798 fixes scan task failure when encountering duplicate entries of data files. Spark and Flink readers can now ignore duplicated entries in data files for each scan task. #1785 fixes invalidation of metadata tables in CachingCatalog. When a table is dropped, all the metadata tables associated with it are also invalidated in the cache. #1960 fixes bug that ORC writer does not read metrics config and always use the default. Now customized metrics config is respected. Other notable changes:\nNaN counts are now supported in metadata Shared catalog properties are added in core library to standardize catalog level configurations Spark and Flink now support dynamically loading customized Catalog and FileIO implementations Spark 2 now supports loading tables from other catalogs, like Spark 3 Spark 3 now supports catalog names in DataFrameReader when using Iceberg as a format Flink now uses the number of Iceberg read splits as its job parallelism to improve performance and save resource. Hive (experimental) now supports INSERT INTO, case insensitive query, projection pushdown, create DDL with schema and auto type conversion ORC now supports reading tinyint, smallint, char, varchar types Avro to Iceberg schema conversion now preserves field docs 0.10.0 Git tag: 0.10.0 0.10.0 source tar.gz \u2013 signature \u2013 sha512 0.10.0 Spark 3.0 runtime Jar 0.10.0 Spark 2.4 runtime Jar 0.10.0 Flink runtime Jar 0.10.0 Hive runtime Jar High-level features:\nFormat v2 support for building row-level operations (MERGE INTO) in processing engines Note: format v2 is not yet finalized and does not have a forward-compatibility guarantee Flink integration for writing to Iceberg tables and reading from Iceberg tables (reading supports batch mode only) Hive integration for reading from Iceberg tables, with filter pushdown (experimental; configuration may change) Important bug fixes:\n#1706 fixes non-vectorized ORC reads in Spark that incorrectly skipped rows #1536 fixes ORC conversion of notIn and notEqual to match null values #1722 fixes Expressions.notNull returning an isNull predicate; API only, method was not used by processing engines #1736 fixes IllegalArgumentException in vectorized Spark reads with negative decimal values #1666 fixes file lengths returned by the ORC writer, using compressed size rather than uncompressed size #1674 removes catalog expiration in HiveCatalogs #1545 automatically refreshes tables in Spark when not caching table instances Other notable changes:\nThe iceberg-hive module has been renamed to iceberg-hive-metastore to avoid confusion Spark 3 is based on 3.0.1 that includes the fix for SPARK-32168 Hadoop tables will recover from version hint corruption Tables can be configured with a required sort order Data file locations can be customized with a dynamically loaded LocationProvider ORC file imports can apply a name mapping for stats A more exhaustive list of changes is available under the 0.10.0 release milestone.\n0.9.1 Git tag: 0.9.1 
0.9.1 source tar.gz \u2013 signature \u2013 sha512 0.9.1 Spark 3.0 runtime Jar 0.9.1 Spark 2.4 runtime Jar 0.9.0 Git tag: 0.9.0 0.9.0 source tar.gz \u2013 signature \u2013 sha512 0.9.0 Spark 3.0 runtime Jar 0.9.0 Spark 2.4 runtime Jar 0.8.0 Git tag: apache-iceberg-0.8.0-incubating 0.8.0-incubating source tar.gz \u2013 signature \u2013 sha512 0.8.0-incubating Spark 2.4 runtime Jar 0.7.0 Git tag: apache-iceberg-0.7.0-incubating 0.7.0-incubating source tar.gz \u2013 signature \u2013 sha512 0.7.0-incubating Spark 2.4 runtime Jar ", "description": "", "title": "Releases", "uri": "/releases/"}, {"categories": null, "content": " Available Benchmarks and how to run them Benchmarks are located under <project-name>/jmh. It is generally favorable to only run the tests of interest rather than running all available benchmarks. Also note that JMH benchmarks run within the same JVM as the system-under-test, so results might vary between runs.\nRunning Benchmarks on GitHub It is possible to run one or more Benchmarks via the JMH Benchmarks GH action on your own fork of the Iceberg repo. This GH action takes the following inputs:\nThe repository name where those benchmarks should be run against, such as apache/iceberg or <user>/iceberg The branch name to run benchmarks against, such as master or my-cool-feature-branch A list of comma-separated double-quoted Benchmark names, such as \"IcebergSourceFlatParquetDataReadBenchmark\", \"IcebergSourceFlatParquetDataFilterBenchmark\", \"IcebergSourceNestedListParquetDataWriteBenchmark\" Benchmark results will be uploaded once all benchmarks are done.\nIt is worth noting that the GH runners have limited resources so the benchmark results should rather be seen as an indicator to guide developers in understanding code changes. It is likely that there is variability in results across different runs, therefore the benchmark results shouldn\u2019t be used to form assumptions around production choices.\nRunning Benchmarks locally Below are the existing benchmarks shown with the actual commands on how to run them locally.\nIcebergSourceNestedListParquetDataWriteBenchmark A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt\nSparkParquetReadersNestedDataBenchmark A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and Spark Parquet readers. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt\nSparkParquetWritersFlatDataBenchmark A benchmark that evaluates the performance of writing Parquet data with a flat schema using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt\nIcebergSourceFlatORCDataReadBenchmark A benchmark that evaluates the performance of reading ORC data with a flat schema using Iceberg and the built-in file source in Spark. 
To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt\nSparkParquetReadersFlatDataBenchmark A benchmark that evaluates the performance of reading Parquet data with a flat schema using Iceberg and Spark Parquet readers. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt\nVectorizedReadDictionaryEncodedFlatParquetDataBenchmark A benchmark to compare performance of reading Parquet dictionary encoded data with a flat schema using vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt\nIcebergSourceNestedListORCDataWriteBenchmark A benchmark that evaluates the performance of writing nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt\nVectorizedReadFlatParquetDataBenchmark A benchmark to compare performance of reading Parquet data with a flat schema using vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-flat-parquet-data-result.txt\nIcebergSourceFlatParquetDataWriteBenchmark A benchmark that evaluates the performance of writing Parquet data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt\nIcebergSourceNestedAvroDataReadBenchmark A benchmark that evaluates the performance of reading nested Avro data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt\nIcebergSourceFlatAvroDataReadBenchmark A benchmark that evaluates the performance of reading Avro data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt\nIcebergSourceNestedParquetDataWriteBenchmark A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. 
To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt\nIcebergSourceNestedParquetDataReadBenchmark A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3: ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt\nIcebergSourceNestedORCDataReadBenchmark A benchmark that evaluates the performance of reading nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt\nIcebergSourceFlatParquetDataReadBenchmark A benchmark that evaluates the performance of reading Parquet data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt\nIcebergSourceFlatParquetDataFilterBenchmark A benchmark that evaluates the file skipping capabilities in the Spark data source for Iceberg. This class uses a dataset with a flat schema, where the records are clustered according to the column used in the filter predicate. The performance is compared to the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt\nIcebergSourceNestedParquetDataFilterBenchmark A benchmark that evaluates the file skipping capabilities in the Spark data source for Iceberg. This class uses a dataset with nested data, where the records are clustered according to the column used in the filter predicate. The performance is compared to the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3: ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt\nSparkParquetWritersNestedDataBenchmark A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3: ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt ", "description": "", "title": "Benchmarks", "uri": "/benchmarks/"}, {"categories": null, "content": " Iceberg Blogs Here is a list of company blogs that talk about Iceberg. 
The blogs are ordered from most recent to oldest.\nNear Real-Time Ingestion For Trino Date: August 4th, 2022, Company: Starburst\nAuthors: Eric Hwang, Monica Miller, Brian Zhan\nMigrating a Hive Table to an Iceberg Table Hands-on Tutorial Date: June 6th, 2022, Company: Dremio\nAuthor: Alex Merced\nFewer Accidental Full Table Scans Brought to You by Apache Iceberg\u2019s Hidden Partitioning Date: May 21st, 2022, Company: Dremio\nAuthor: Alex Merced\nAn Introduction To The Iceberg Java API Part 2 - Table Scans Date: May 11th, 2022, Company: Tabular\nAuthor: Sam Redai\nIceberg\u2019s Guiding Light: The Iceberg Open Table Format Specification Date: April 26th, 2022, Company: Tabular\nAuthor: Sam Redai\nHow to Migrate a Hive Table to an Iceberg Table Date: April 15th, 2022, Company: Dremio\nAuthor: Alex Merced\nUsing Iceberg\u2019s S3FileIO Implementation To Store Your Data In MinIO Date: April 14th, 2022, Company: Tabular\nAuthor: Sam Redai\nMaintaining Iceberg Tables \u2013 Compaction, Expiring Snapshots, and More Date: April 7th, 2022, Company: Dremio\nAuthor: Alex Merced\nAn Introduction To The Iceberg Java API - Part 1 Date: April 1st, 2022, Company: Tabular\nAuthor: Sam Redai\nIntegrated Audits: Streamlined Data Observability With Apache Iceberg Date: March 2nd, 2022, Company: Tabular\nAuthor: Sam Redai\nIntroducing Apache Iceberg in Cloudera Data Platform Date: February 23rd, 2022, Company: Cloudera\nAuthors: Bill Zhang, Peter Vary, Marton Bod, Wing Yew Poon\nWhat\u2019s new in Iceberg 0.13 Date: February 22nd, 2022, Company: Tabular\nAuthor: Ryan Blue\nApache Iceberg Becomes Industry Open Standard with Ecosystem Adoption Date: February 3rd, 2022, Company: Dremio\nAuthor: Mark Lyons\nDocker, Spark, and Iceberg: The Fastest Way to Try Iceberg! Date: February 2nd, 2022, Company: Tabular\nAuthor: Sam Redai, Kyle Bendickson\nExpanding the Data Cloud with Apache Iceberg Date: January 21st, 2022, Company: Snowflake\nAuthor: James Malone\nIceberg FileIO: Cloud Native Tables Date: December 16th, 2021, Company: Tabular\nAuthor: Daniel Weeks\nUsing Spark in EMR with Apache Iceberg Date: December 10th, 2021, Company: Tabular\nAuthor: Sam Redai\nUsing Flink CDC to synchronize data from MySQL sharding tables and build real-time data lake Date: November 11th, 2021, Company: Ververica, Alibaba Cloud\nAuthor: Yuxia Luo, Jark Wu, Zheng Hu\nMetadata Indexing in Iceberg Date: October 10th, 2021, Company: Tabular\nAuthor: Ryan Blue\nUsing Debezium to Create a Data Lake with Apache Iceberg Date: October 20th, 2021, Company: Memiiso Community\nAuthor: Ismail Simsek\nHow to Analyze CDC Data in Iceberg Data Lake Using Flink Date: June 15th, 2021, Company: Alibaba Cloud Community\nAuthor: Li Jinsong, Hu Zheng, Yang Weihai, Peidan Li\nApache Iceberg: An Architectural Look Under the Covers Date: July 6th, 2021, Company: Dremio\nAuthor: Jason Hughes\nMigrating to Apache Iceberg at Adobe Experience Platform Date: Jun 17th, 2021, Company: Adobe\nAuthor: Romin Parekh, Miao Wang, Shone Sadler\nFlink + Iceberg: How to Construct a Whole-scenario Real-time Data Warehouse Date: Jun 8th, 2021, Company: Tencent\nAuthor Shu (Simon Su) Su\nTrino on Ice III: Iceberg Concurrency Model, Snapshots, and the Iceberg Spec Date: May 25th, 2021, Company: Starburst\nAuthor: Brian Olsen\nTrino on Ice II: In-Place Table Evolution and Cloud Compatibility with Iceberg Date: May 11th, 2021, Company: Starburst\nAuthor: Brian Olsen\nTrino On Ice I: A Gentle Introduction To Iceberg Date: Apr 27th, 2021, Company: Starburst\nAuthor: Brian 
Olsen\nApache Iceberg: A Different Table Design for Big Data Date: Feb 1st, 2021, Company: thenewstack.io\nAuthor: Susan Hall\nA Short Introduction to Apache Iceberg Date: Jan 26th, 2021, Company: Expedia\nAuthor: Christine Mathiesen\nTaking Query Optimizations to the Next Level with Iceberg Date: Jan 14th, 2021, Company: Adobe\nAuthor: Gautam Kowshik, Xabriel J. Collazo Mojica\nFastIngest: Low-latency Gobblin with Apache Iceberg and ORC format Date: Jan 6th, 2021, Company: Linkedin\nAuthor: Zihan Li, Sudarshan Vasudevan, Lei Sun, Shirshanka Das\nHigh Throughput Ingestion with Iceberg Date: Dec 22nd, 2020, Company: Adobe\nAuthor: Andrei Ionescu, Shone Sadler, Anil Malkani\nOptimizing data warehouse storage Date: Dec 21st, 2020, Company: Netflix\nAuthor: Anupom Syam\nIceberg at Adobe Date: Dec 3rd, 2020, Company: Adobe\nAuthor: Shone Sadler, Romin Parekh, Anil Malkani\nBulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores Date: Oct 27th, 2020, Company: Netflix\nAuthor: Tianlong Chen, Ioannis Papapanagiotou\n", "description": "", "title": "Blogs", "uri": "/blogs/"}, {"categories": null, "content": " Welcome! Apache Iceberg tracks issues in GitHub and prefers to receive contributions as pull requests.\nCommunity discussions happen primarily on the dev mailing list, on apache-iceberg Slack workspace, and on specific GitHub issues.\nContribute See Contributing for more details on how to contribute to Iceberg.\nIssues Issues are tracked in GitHub:\nView open issues Open a new issue Slack We use the Apache Iceberg workspace on Slack. To be invited, follow this invite link.\nPlease note that this link may occasionally break when Slack does an upgrade. If you encounter problems using it, please let us know by sending an email to dev@iceberg.apache.org.\nMailing Lists Iceberg has four mailing lists:\nDevelopers: dev@iceberg.apache.org \u2013 used for community discussions Subscribe Unsubscribe Archive Commits: commits@iceberg.apache.org \u2013 distributes commit notifications Subscribe Unsubscribe Archive Issues: issues@iceberg.apache.org \u2013 Github issue tracking Subscribe Unsubscribe Archive Private: private@iceberg.apache.org \u2013 private list for the PMC to discuss sensitive issues related to the health of the project Archive ", "description": "", "title": "Community", "uri": "/community/"}, {"categories": null, "content": " Contributing In this page, you will find some guidelines on contributing to Apache Iceberg. Please keep in mind that none of these are hard rules and they\u2019re meant as a collection of helpful suggestions to make contributing as seamless of an experience as possible.\nIf you are thinking of contributing but first would like to discuss the change you wish to make, we welcome you to head over to the Community page on the official Iceberg documentation site to find a number of ways to connect with the community, including slack and our mailing lists. 
Of course, always feel free to just open a new issue in the GitHub repo.\nThe Iceberg Project is hosted on GitHub at https://github.com/apache/iceberg.\nPull Request Process The Iceberg community prefers to receive contributions as Github pull requests.\nView open pull requests\nPRs are automatically labeled based on the content by our github-actions labeling action It\u2019s helpful to include a prefix in the summary that provides context to PR reviewers, such as Build:, Docs:, Spark:, Flink:, Core:, API: If a PR is related to an issue, adding Closes #1234 in the PR description will automatically close the issue and helps keep the project clean If a PR is posted for visibility and isn\u2019t necessarily ready for review or merging, be sure to convert the PR to a draft Building the Project Locally Iceberg is built using Gradle with Java 8 or Java 11.\nTo invoke a build and run tests: ./gradlew build To skip tests: ./gradlew build -x test -x integrationTest To fix code style: ./gradlew spotlessApply Iceberg table support is organized in library modules:\niceberg-common contains utility classes used in other modules iceberg-api contains the public Iceberg API iceberg-core contains implementations of the Iceberg API and support for Avro data files, this is what processing engines should depend on iceberg-parquet is an optional module for working with tables backed by Parquet files iceberg-arrow is an optional module for reading Parquet into Arrow memory iceberg-orc is an optional module for working with tables backed by ORC files iceberg-hive-metastore is an implementation of Iceberg tables backed by the Hive metastore Thrift client iceberg-data is an optional module for working with tables directly from JVM applications This project Iceberg also has modules for adding Iceberg support to processing engines:\niceberg-spark2 is an implementation of Spark\u2019s Datasource V2 API in 2.4 for Iceberg (use iceberg-spark-runtime for a shaded version) iceberg-spark3 is an implementation of Spark\u2019s Datasource V2 API in 3.0 for Iceberg (use iceberg-spark3-runtime for a shaded version) iceberg-flink contains classes for integrating with Apache Flink (use iceberg-flink-runtime for a shaded version) iceberg-mr contains an InputFormat and other classes for integrating with Apache Hive iceberg-pig is an implementation of Pig\u2019s LoadFunc API for Iceberg Setting up IDE and Code Style Configuring Code Formatter for Eclipse/IntelliJ Follow the instructions for Eclipse or IntelliJ to install the google-java-format plugin (note the required manual actions for IntelliJ).\nIceberg Code Contribution Guidelines Style For Python, please use the tox command tox -e format to apply autoformatting to the project.\nJava code adheres to the Google style, which will be verified via ./gradlew spotlessCheck during builds. In order to automatically fix Java code style issues, please use ./gradlew spotlessApply.\nNOTE: The google-java-format plugin will always use the latest version of the google-java-format. However, spotless itself is configured to use google-java-format 1.7 since that version is compatible with JDK 8. When formatting the code in the IDE, there is a slight chance that it will produce slightly different results. In such a case please run ./gradlew spotlessApply as CI will check the style against google-java-format 1.7.\nCopyright Each file must include the Apache license information as a header.\nLicensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Configuring Copyright for IntelliJ IDEA Every file needs to include the Apache license as a header. This can be automated in IntelliJ by adding a Copyright profile:\nIn the Settings/Preferences dialog go to Editor \u2192 Copyright \u2192 Copyright Profiles.\nAdd a new profile and name it Apache.\nAdd the following text as the license text:\nLicensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Go to Editor \u2192 Copyright and choose the Apache profile as the default profile for this project.\nClick Apply.\nJava style guidelines Method naming Make method names as short as possible, while being clear. Omit needless words. Avoid get in method names, unless an object must be a Java bean. In most cases, replace get with a more specific verb that describes what is happening in the method, like find or fetch. If there isn\u2019t a more specific verb or the method is a getter, omit get because it isn\u2019t helpful to readers and makes method names longer. Where possible, use words and conjugations that form correct sentences in English when read For example, Transform.preservesOrder() reads correctly in an if statement: if (transform.preservesOrder()) { ... } Boolean arguments Avoid boolean arguments to methods that are not private to avoid confusing invocations like sendMessage(false). It is better to create two methods with names and behavior, even if both are implemented by one internal method.\n// prefer exposing suppressFailure in method names public void sendMessageIgnoreFailure() { sendMessageInternal(true); } public void sendMessage() { sendMessageInternal(false); } private void sendMessageInternal(boolean suppressFailure) { ... } When passing boolean arguments to existing or external methods, use inline comments to help the reader understand actions without an IDE.\n// BAD: it is not clear what false controls dropTable(identifier, false); // GOOD: these uses of dropTable are clear to the reader dropTable(identifier, true /* purge data */); dropTable(identifier, purge); Config naming Use - to link words in one concept For example, preferred convection access-key-id rather than access.key.id Use . 
to create a hierarchy of config groups For example, s3 in s3.access-key-id, s3.secret-access-key Running Benchmarks Some PRs/changesets might require running benchmarks to determine whether they are affecting the baseline performance. Currently there is no \u201cpush a single button to get a performance comparison\u201d solution available, therefore one has to run JMH performance tests on their local machine and post the results on the PR.\nSee Benchmarks for a summary of available benchmarks and how to run them.\nWebsite and Documentation Updates Currently, there is an iceberg-docs repository which contains the HTML/CSS and other files needed for the Iceberg website. The docs folder in the Iceberg repository contains the markdown content for the documentation site. All markdown changes should still be made to this repository.\nSubmitting Pull Requests Changes to the markdown contents should be submitted directly to this repository.\nChanges to the website appearance (e.g. HTML, CSS changes) should be submitted to the iceberg-docs repository against the main branch.\nChanges to the documentation of old Iceberg versions should be submitted to the iceberg-docs repository against the specific version branch.\nReporting Issues All issues related to the doc website should still be submitted to the Iceberg repository. The GitHub Issues feature of the iceberg-docs repository is disabled.\nRunning Locally Clone the iceberg-docs repository to run the website locally:\ngit clone git@github.com:apache/iceberg-docs.git cd iceberg-docs To start the landing page site locally, run:\ncd landing-page && hugo serve To start the documentation site locally, run:\ncd docs && hugo serve If you would like to see how the latest website looks based on the documentation in the Iceberg repository, you can copy docs to the iceberg-docs repository by:\nrm -rf docs/content/docs rm -rf landing-page/content/common cp -r <path to iceberg repo>/docs/versioned docs/content/docs cp -r <path to iceberg repo>/docs/common landing-page/content/common ", "description": "", "title": "Contribute", "uri": "/contribute/"}, {"categories": null, "content": " Setup To create a release candidate, you will need:\nApache LDAP credentals for Nexus and SVN A GPG key for signing, published in KEYS If you have not published your GPG key yet, you must publish it before sending the vote email by doing:\nsvn co https://dist.apache.org/repos/dist/dev/iceberg icebergsvn cd icebergsvn echo \"\" >> KEYS # append a newline gpg --list-sigs <YOUR KEY ID HERE> >> KEYS # append signatures gpg --armor --export <YOUR KEY ID HERE> >> KEYS # append public key block svn commit -m \"add key for <YOUR NAME HERE>\" Nexus access Nexus credentials are configured in your personal ~/.gradle/gradle.properties file using mavenUser and mavenPassword:\nmavenUser=yourApacheID mavenPassword=SomePassword PGP signing The release scripts use the command-line gpg utility so that signing can use the gpg-agent and does not require writing your private key\u2019s passphrase to a configuration file.\nTo configure gradle to sign convenience binary artifacts, add the following settings to ~/.gradle/gradle.properties:\nsigning.gnupg.keyName=Your Name (CODE SIGNING KEY) To use gpg instead of gpg2, also set signing.gnupg.executable=gpg\nFor more information, see the Gradle signing documentation.\nApache repository The release should be executed against https://github.com/apache/iceberg.git instead of any fork. 
Set it as remote with name apache for release if it is not already set up.\nCreating a release candidate Build the source release To create the source release artifacts, run the source-release.sh script with the release version and release candidate number:\ndev/source-release.sh -v 0.13.0 -r 0 -k <YOUR KEY ID HERE> Example console output:\nPreparing source for apache-iceberg-0.13.0-rc1 Adding version.txt and tagging release... [master ca8bb7d0] Add version.txt for release 0.13.0 1 file changed, 1 insertion(+) create mode 100644 version.txt Pushing apache-iceberg-0.13.0-rc1 to origin... Enumerating objects: 5, done. Counting objects: 100% (5/5), done. Delta compression using up to 12 threads Compressing objects: 100% (3/3), done. Writing objects: 100% (4/4), 433 bytes | 433.00 KiB/s, done. Total 4 (delta 1), reused 0 (delta 0) remote: Resolving deltas: 100% (1/1), completed with 1 local object. To https://github.com/apache/iceberg.git * [new tag] apache-iceberg-0.13.0-rc1 -> apache-iceberg-0.13.0-rc1 Creating tarball using commit ca8bb7d0821f35bbcfa79a39841be8fb630ac3e5 Signing the tarball... Checking out Iceberg RC subversion repo... Checked out revision 52260. Adding tarball to the Iceberg distribution Subversion repo... A tmp/apache-iceberg-0.13.0-rc1 A tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz.asc A (bin) tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz A tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz.sha512 Adding tmp/apache-iceberg-0.13.0-rc1 Adding (bin) tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz Adding tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz.asc Adding tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz.sha512 Transmitting file data ...done Committing transaction... Committed revision 52261. Creating release-announcement-email.txt... Success! The release candidate is available here: https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-0.13.0-rc1 Commit SHA1: ca8bb7d0821f35bbcfa79a39841be8fb630ac3e5 We have generated a release announcement email for you here: /Users/jackye/iceberg/release-announcement-email.txt Please note that you must update the Nexus repository URL contained in the mail before sending it out. The source release script will create a candidate tag based on the HEAD revision in git and will prepare the release tarball, signature, and checksum files. It will also upload the source artifacts to SVN.\nNote the commit SHA1 and candidate location because those will be added to the vote thread.\nOnce the source release is ready, use it to stage convenience binary artifacts in Nexus.\nBuild and stage convenience binaries Convenience binaries are created using the source release tarball from in the last step.\nUntar the source release and go into the release directory:\ntar xzf apache-iceberg-0.13.0.tar.gz cd apache-iceberg-0.13.0 To build and publish the convenience binaries, run the dev/stage-binaries.sh script. 
This will push to a release staging repository.\ndev/stage-binaries.sh Next, you need to close the staging repository:\nGo to Nexus and log in In the menu on the left, choose \u201cStaging Repositories\u201d Select the Iceberg repository If multiple staging repositories are created after running the script, set org.gradle.parallel=false in gradle.properties At the top, select \u201cClose\u201d and follow the instructions In the comment field use \u201cApache Iceberg <version> RC<num>\u201d Start a VOTE thread The last step for a candidate is to create a VOTE thread on the dev mailing list. The email template is already generated in release-announcement-email.txt with some details filled.\nExample title subject:\n[VOTE] Release Apache Iceberg <VERSION> RC<NUM> Example content:\nHi everyone, I propose the following RC to be released as official Apache Iceberg <VERSION> release. The commit id is <SHA1> * This corresponds to the tag: apache-iceberg-<VERSION>-rc<NUM> * https://github.com/apache/iceberg/commits/apache-iceberg-<VERSION>-rc<NUM> * https://github.com/apache/iceberg/tree/<SHA1> The release tarball, signature, and checksums are here: * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-<VERSION>-rc<NUM>/ You can find the KEYS file here: * https://dist.apache.org/repos/dist/dev/iceberg/KEYS Convenience binary artifacts are staged in Nexus. The Maven repository URL is: * https://repository.apache.org/content/repositories/orgapacheiceberg-<ID>/ This release includes important changes that I should have summarized here, but I'm lazy. Please download, verify, and test. Please vote in the next 72 hours. [ ] +1 Release this as Apache Iceberg <VERSION> [ ] +0 [ ] -1 Do not release this because... When a candidate is passed or rejected, reply with the voting result:\nSubject: [RESULT][VOTE] Release Apache Iceberg <VERSION> RC<NUM> Thanks everyone who participated in the vote for Release Apache Iceberg <VERSION> RC<NUM>. The vote result is: +1: 3 (binding), 5 (non-binding) +0: 0 (binding), 0 (non-binding) -1: 0 (binding), 0 (non-binding) Therefore, the release candidate is passed/rejected. Finishing the release After the release vote has passed, you need to release the last candidate\u2019s artifacts.\nFirst, copy the source release directory to releases:\nmkdir iceberg cd iceberg svn co https://dist.apache.org/repos/dist/dev/iceberg candidates svn co https://dist.apache.org/repos/dist/release/iceberg releases cp -r candidates/apache-iceberg-<VERSION>-rcN/ releases/apache-iceberg-<VERSION> cd releases svn add apache-iceberg-<VERSION> svn ci -m 'Iceberg: Add release <VERSION>' Next, add a release tag to the git repository based on the passing candidate tag:\ngit tag -am 'Release Apache Iceberg <VERSION>' apache-iceberg-<VERSION> apache-iceberg-<VERSION>-rcN Then release the candidate repository in Nexus.\nTo announce the release, wait until Maven central has mirrored the Apache binaries, then update the Iceberg site and send an announcement email:\n[ANNOUNCE] Apache Iceberg release <VERSION> I'm pleased to announce the release of Apache Iceberg <VERSION>! Apache Iceberg is an open table format for huge analytic datasets. Iceberg delivers high query performance for tables with tens of petabytes of data, along with atomic commits, concurrent writes, and SQL-compatible table evolution. This release can be downloaded from: https://www.apache.org/dyn/closer.cgi/iceberg/<TARBALL NAME WITHOUT .tar.gz>/<TARBALL NAME> Java artifacts are available from Maven Central. 
Thanks to everyone for contributing! Documentation Release Documentation needs to be updated as a part of an Iceberg release after a release candidate is passed. The commands described below assume you are in a directory containing a local clone of the iceberg-docs repository and the iceberg repository. Adjust the commands accordingly if that is not the case. Note that all changes in iceberg need to happen against the master branch and changes in iceberg-docs need to happen against the main branch.\niceberg repository preparations A PR needs to be published in the iceberg repository with the following changes:\nCreate a new folder called docs/releases/<VERSION NUMBER> with an _index.md file. See the existing folders under docs/releases for more details. Common documentation update To start the release process, run the following steps in the iceberg-docs repository to copy docs over: cp -r ../iceberg/format/* ../iceberg-docs/landing-page/content/common/ Change into the iceberg-docs repository and create a branch. cd ../iceberg-docs git checkout -b <BRANCH NAME> Commit, push, and open a PR against the iceberg-docs repo (<BRANCH NAME> -> main) Versioned documentation update Once the common docs changes have been merged into main, the next step is to update the versioned docs.\nIn the iceberg-docs repository, cut a new branch using the version number as the branch name cd ../iceberg-docs git checkout -b <VERSION> git push --set-upstream apache <VERSION> Copy the versioned docs from the iceberg repo into the iceberg-docs repo rm -rf ../iceberg-docs/docs/content cp -r ../iceberg/docs ../iceberg-docs/docs/content Commit the changes and open a PR against the <VERSION> branch in the iceberg-docs repo Javadoc update In the iceberg repository, generate the javadoc for your release and copy it to the javadoc folder in the iceberg-docs repo:\ncd ../iceberg ./gradlew refreshJavadoc rm -rf ../iceberg-docs/javadoc cp -r site/docs/javadoc/<VERSION NUMBER> ../iceberg-docs/javadoc The resulting changes in iceberg-docs should be approved in a separate PR.\nUpdate the latest branch Since main is currently the same as the version branch, one needs to rebase the latest branch against main:\ngit checkout latest git rebase main git push apache latest Set latest version in iceberg-docs repo The last step is to update the main branch in iceberg-docs to set the latest version. A PR needs to be published in the iceberg-docs repository with the following changes:\nUpdate variable latestVersions.iceberg to the new release version in landing-page/config.toml Update variable latestVersions.iceberg to the new release version in docs/config.toml Mark the current latest release notes as past releases under landing-page/content/common/release-notes.md Add release notes for the new release version in landing-page/content/common/release-notes.md How to Verify a Release Each Apache Iceberg release is validated by the community by holding a vote. A community release manager will prepare a release candidate and call a vote on the Iceberg dev list. To validate the release candidate, community members will test it out in their downstream projects and environments. 
It\u2019s recommended to report the Java, Scala, Spark, Flink and Hive versions you have tested against when you vote.\nIn addition to testing in downstream projects, community members also check the release\u2019s signatures, checksums, and license documentation.\nValidating a source release candidate Release announcements include links to the following:\nA source tarball A signature (.asc) A checksum (.sha512) KEYS file GitHub change comparison After downloading the source tarball, signature, checksum, and KEYS file, here are instructions on how to verify signatures, checksums, and documentation.\nVerifying Signatures First, import the keys.\ncurl https://dist.apache.org/repos/dist/dev/iceberg/KEYS -o KEYS gpg --import KEYS Next, verify the .asc file.\ngpg --verify apache-iceberg-0.14.0.tar.gz.asc Verifying Checksums shasum -a 512 --check apache-iceberg-0.14.0.tar.gz.sha512 Verifying License Documentation Untar the archive and change into the source directory.\ntar xzf apache-iceberg-0.14.0.tar.gz cd apache-iceberg-0.14.0 Run RAT checks to validate license headers.\ndev/check-license Verifying Build and Test To verify that the release candidate builds properly, run the following command.\n./gradlew build Testing release binaries Release announcements will also include a maven repository location. You can use this location to test downstream dependencies by adding it to your maven or gradle build.\nTo use the release in your maven build, add the following to your POM or settings.xml:\n... <repositories> <repository> <id>iceberg-release-candidate</id> <name>Iceberg Release Candidate</name> <url>${MAVEN_URL}</url> </repository> </repositories> ... To use the release in your gradle build, add the following to your build.gradle:\nrepositories { mavenCentral() maven { url \"${MAVEN_URL}\" } } !!! Note Replace ${MAVEN_URL} with the URL provided in the release announcement\nVerifying with Spark To verify using spark, start a spark-shell with a command like the following command (use the appropriate spark-runtime jar for the Spark installation):\nspark-shell \\ --conf spark.jars.repositories=${MAVEN_URL} \\ --packages org.apache.iceberg:iceberg-spark3-runtime:0.14.0 \\ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\ --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.local.type=hadoop \\ --conf spark.sql.catalog.local.warehouse=${LOCAL_WAREHOUSE_PATH} \\ --conf spark.sql.catalog.local.default-namespace=default \\ --conf spark.sql.defaultCatalog=local Verifying with Flink To verify using Flink, start a Flink SQL Client with the following command:\nwget ${MAVEN_URL}/iceberg-flink-runtime/0.14.0/iceberg-flink-runtime-0.14.0.jar sql-client.sh embedded \\ -j iceberg-flink-runtime-0.14.0.jar \\ -j ${FLINK_CONNECTOR_PACKAGE}-${HIVE_VERSION}_${SCALA_VERSION}-${FLINK_VERSION}.jar \\ shell Voting Votes are cast by replying to the release candidate announcement email on the dev mailing list with either +1, 0, or -1.\n[ ] +1 Release this as Apache Iceberg 0.14.0 [ ] +0 [ ] -1 Do not release this because\u2026\nIn addition to your vote, it\u2019s customary to specify if your vote is binding or non-binding. Only members of the Project Management Committee have formally binding votes. If you\u2019re unsure, you can specify that your vote is non-binding. 
To read more about voting in the Apache framework, checkout the Voting information page on the Apache foundation\u2019s website.\n", "description": "", "title": "How To Release", "uri": "/how-to-release/"}, {"categories": null, "content": " Multi-Engine Support Apache Iceberg is an open standard for huge analytic tables that can be used by any processing engine. The community continuously improves Iceberg core library components to enable integrations with different compute engines that power analytics, business intelligence, machine learning, etc. Connectors for Spark, Flink and Hive are maintained in the main Iceberg repository.\nMulti-Version Support Processing engine connectors maintained in the iceberg repository are built for multiple versions.\nFor Spark and Flink, each new version that introduces backwards incompatible upgrade has its dedicated integration codebase and release artifacts. For example, the code for Iceberg Spark 3.1 integration is under /spark/v3.1 and the code for Iceberg Spark 3.2 integration is under /spark/v3.2. Different artifacts (iceberg-spark-3.1_2.12 and iceberg-spark-3.2_2.12) are released for users to consume. By doing this, changes across versions are isolated. New features in Iceberg could be developed against the latest features of an engine without breaking support of old APIs in past engine versions.\nFor Hive, Hive 2 uses the iceberg-mr package for Iceberg integration, and Hive 3 requires an additional dependency of the iceberg-hive3 package.\nRuntime Jar Iceberg provides a runtime connector jar for each supported version of Spark, Flink and Hive. When using Iceberg with these engines, the runtime jar is the only addition to the classpath needed in addition to vendor dependencies. For example, to use Iceberg with Spark 3.2 and AWS integrations, iceberg-spark-runtime-3.2_2.12 and AWS SDK dependencies are needed for the Spark installation.\nSpark and Flink provide different runtime jars for each supported engine version. Hive 2 and Hive 3 currently share the same runtime jar. The runtime jar names and latest version download links are listed in the tables below.\nEngine Version Lifecycle Each engine version undergoes the following lifecycle stages:\nBeta: a new engine version is supported, but still in the experimental stage. Maybe the engine version itself is still in preview (e.g. Spark 3.0.0-preview), or the engine does not yet have full feature compatibility compared to old versions yet. This stage allows Iceberg to release an engine version support without the need to wait for feature parity, shortening the release time. Maintained: an engine version is actively maintained by the community. Users can expect parity for most features across all the maintained versions. If a feature has to leverage some new engine functionalities that older versions don\u2019t have, then feature parity across maintained versions is not guaranteed. Deprecated: an engine version is no longer actively maintained. People who are still interested in the version can backport any necessary feature or bug fix from newer versions, but the community will not spend effort in achieving feature parity. Iceberg recommends users to move towards a newer version. Contributions to a deprecated version is expected to diminish over time, so that eventually no change is added to a deprecated version. End-of-life: a vote can be initiated in the community to fully remove a deprecated version out of the Iceberg repository to mark as its end of life. 
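As a rough sketch of the runtime jar naming convention described above, the hypothetical helper below assembles artifact names from an engine name and version; the authoritative names and supported versions are the ones listed in the tables that follow.

# Hypothetical helper illustrating the runtime jar naming convention; the
# lifecycle tables below are the authoritative source for names and versions.
def runtime_artifact(engine, engine_version, scala_version="2.12"):
    if engine == "spark":
        if engine_version == "2.4":
            return "iceberg-spark-runtime"       # legacy name, predates the convention
        if engine_version == "3.0":
            return "iceberg-spark3-runtime"      # legacy name, predates the convention
        return f"iceberg-spark-runtime-{engine_version}_{scala_version}"
    if engine == "flink":
        if engine_version == "1.11":
            return "iceberg-flink-runtime"       # legacy name, predates the convention
        return f"iceberg-flink-runtime-{engine_version}"
    if engine == "hive":
        return "iceberg-hive-runtime"            # Hive 2 and Hive 3 share one runtime jar
    raise ValueError(f"unknown engine: {engine}")

print(runtime_artifact("spark", "3.2"))    # iceberg-spark-runtime-3.2_2.12
print(runtime_artifact("flink", "1.14"))   # iceberg-flink-runtime-1.14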
Current Engine Version Lifecycle Status Apache Spark Version Lifecycle Stage Initial Iceberg Support Latest Iceberg Support Latest Runtime Jar 2.4 Deprecated 0.7.0-incubating 0.14.0 iceberg-spark-runtime 3.0 Maintained 0.9.0 0.14.0 iceberg-spark3-runtime [1] 3.1 Maintained 0.12.0 0.14.0 iceberg-spark-runtime-3.1_2.12 [2] 3.2 Maintained 0.13.0 0.14.0 iceberg-spark-runtime-3.2_2.12 [1] Spark 2.4 and 3.0 jar names do not follow the naming convention of newer versions for backwards compatibility [2] Spark 3.1 shares the same runtime jar iceberg-spark3-runtime with Spark 3.0 before Iceberg 0.13.0 Apache Flink Based on the guideline of the Flink community, only the latest 2 minor versions are actively maintained. Users should continuously upgrade their Flink version to stay up-to-date.\nVersion Lifecycle Stage Initial Iceberg Support Latest Iceberg Support Latest Runtime Jar 1.11 End of Life 0.9.0 0.12.1 iceberg-flink-runtime 1.12 End of Life 0.12.0 0.13.1 iceberg-flink-runtime-1.12 [3] 1.13 Deprecated 0.13.0 0.14.0 iceberg-flink-runtime-1.13 1.14 Maintained 0.13.0 0.14.0 iceberg-flink-runtime-1.14 [3] Flink 1.12 shares the same runtime jar iceberg-flink-runtime with Flink 1.11 before Iceberg 0.13.0 Apache Hive Version Recommended minor version Lifecycle Stage Initial Iceberg Support Latest Iceberg Support Latest Runtime Jar 2 2.3.8 Maintained 0.8.0-incubating 0.14.0 iceberg-hive-runtime 3 3.1.2 Maintained 0.10.0 0.14.0 iceberg-hive-runtime Developer Guide Maintaining existing engine versions Iceberg recommends the following for developers who are maintaining existing engine versions:\nNew features should always be prioritized first in the latest version, which is either a maintained or beta version. For features that could be backported, contributors are encouraged to either perform backports to all maintained versions, or at least create some issues to track the backport. If the change is small enough, updating all versions in a single PR is acceptable. Otherwise, using separated PRs for each version is recommended. Supporting new engines Iceberg recommends new engines to build support by importing the Iceberg libraries to the engine\u2019s project. This allows the Iceberg support to evolve with the engine. Projects such as Trino and Presto are good examples of such support strategy.\nIn this approach, an Iceberg version upgrade is needed for an engine to consume new Iceberg features. To facilitate engine development against unreleased Iceberg features, a daily snapshot is published in the Apache snapshot repository.\nIf bringing an engine directly to the Iceberg main repository is needed, please raise a discussion thread in the Iceberg community.\n", "description": "", "title": "Multi-Engine Support", "uri": "/multi-engine-support/"}, {"categories": null, "content": " Puffin file format This is a specification for Puffin, a file format designed to store information such as indexes and statistics about data managed in an Iceberg table that cannot be stored directly within the Iceberg manifest. A Puffin file contains arbitrary pieces of information (here called \u201cblobs\u201d), along with metadata necessary to interpret them. The blobs supported by Iceberg are documented at Blob types.\nFormat specification A file conforming to the Puffin file format specification should have the structure as described below.\nVersions Currently, there is a single version of the Puffin file format, described below.\nFile structure The Puffin file has the following structure\nMagic Blob\u2081 Blob\u2082 ... 
Blob\u2099 Footer where\nMagic is four bytes 0x50, 0x46, 0x41, 0x31 (short for: Puffin Fratercula arctica, version 1), Blob\u1d62 is i-th blob contained in the file, to be interpreted by application according to the footer, Footer is defined below. Footer structure Footer has the following structure\nMagic FooterPayload FooterPayloadSize Flags Magic where\nMagic: four bytes, same as at the beginning of the file FooterPayload: optionally compressed, UTF-8 encoded JSON payload describing the blobs in the file, with the structure described below FooterPayloadSize: a length in bytes of the FooterPayload (after compression, if compressed), stored as 4 byte integer Flags: 4 bytes for boolean flags byte 0 (first) bit 0 (lowest bit): whether FooterPayload is compressed all other bits are reserved for future use and should be set to 0 on write all other bytes are reserved for future use and should be set to 0 on write A 4 byte integer is always signed, in a two\u2019s complement representation, stored little-endian.\nFooter Payload Footer payload bytes is either uncompressed or LZ4-compressed (as a single LZ4 compression frame with content size present), UTF-8 encoded JSON payload representing a single FileMetadata object.\nFileMetadata FileMetadata has the following fields\nField Name Field Type Required Description blobs list of BlobMetadata objects yes properties JSON object with string property values no storage for arbitrary meta-information, like writer identification/version. See Common properties for properties that are recommended to be set by a writer. BlobMetadata BlobMetadata has the following fields\nField Name Field Type Required Description type JSON string yes See Blob types fields JSON list of ints yes List of field IDs the blob was computed for; the order of items is used to compute sketches stored in the blob. snapshot-id JSON long yes ID of the Iceberg table\u2019s snapshot the blob was computed from. sequence-number JSON long yes Sequence number of the Iceberg table\u2019s snapshot the blob was computed from. offset JSON long yes The offset in the file where the blob contents start length JSON long yes The length of the blob stored in the file (after compression, if compressed) compression-codec JSON string no See Compression codecs. If omitted, the data is assumed to be uncompressed. properties JSON object with string property values no storage for arbitrary meta-information about the blob Blob types The blobs can be of a type listed below\napache-datasketches-theta-v1 blob type A serialized form of a \u201ccompact\u201d Theta sketch produced by the Apache DataSketches library. The sketch is obtained by constructing Alpha family sketch with default seed, and feeding it with individual distinct values converted to bytes using Iceberg\u2019s single-value serialization.\nThe blob metadata for this blob may include following properties:\nndv: estimate of number of distinct values, derived from the sketch. Compression codecs The data can also be uncompressed. If it is compressed the codec should be one of codecs listed below. For maximal interoperability, other codecs are not supported.\nCodec name Description lz4 Single LZ4 compression frame, with content size present zstd Single Zstandard compression frame, with content size present __ Common properties When writing a Puffin file it is recommended to set the following fields in the FileMetadata\u2019s properties field.\ncreated-by - human-readable identification of the application writing the file, along with its version. 
Example \u201cTrino version 381\u201d. ", "description": "", "title": "Puffin Spec", "uri": "/puffin-spec/"}, {"categories": null, "content": " Roadmap Overview This roadmap outlines projects that the Iceberg community is working on, their priority, and a rough size estimate. This is based on the latest community priority discussion. Each high-level item links to a Github project board that tracks the current status. Related design docs will be linked on the planning boards.\nPriority 1 API: Iceberg 1.0.0 [medium] Python: Pythonic refactor [medium] Spec: Z-ordering / Space-filling curves [medium] Spec: Snapshot tagging and branching [small] Views: Spec [medium] Puffin: Implement statistics information in table snapshot [medium] Flink: FLIP-27 based Iceberg source [large] Priority 2 ORC: Support delete files stored as ORC [small] Spark: DSv2 streaming improvements [small] Flink: Inline file compaction [small] Flink: Support UPSERT [small] Spec: Secondary indexes [large] Spec v3: Encryption [large] Spec v3: Relative paths [large] Spec v3: Default field values [medium] ", "description": "", "title": "Roadmap", "uri": "/roadmap/"}, {"categories": null, "content": " Reporting Security Issues The Apache Iceberg Project uses the standard process outlined by the Apache Security Team for reporting vulnerabilities. Note that vulnerabilities should not be publicly disclosed until the project has responded.\nTo report a possible security vulnerability, please email security@iceberg.apache.org.\nVerifying Signed Releases Please refer to the instructions on the Release Verification page.\n", "description": "", "title": "Security", "uri": "/security/"}, {"categories": null, "content": " Iceberg Table Spec This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.\nFormat Versioning Versions 1 and 2 of the Iceberg spec are complete and adopted by the community.\nThe format version number is incremented when new features are added that will break forward-compatibility\u2014that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.\nVersion 1: Analytic Data Tables Version 1 of the Iceberg spec defines how to manage large analytic tables using immutable file formats: Parquet, Avro, and ORC.\nAll version 1 data and metadata files are valid after upgrading a table to version 2. Appendix E documents how to default version 2 fields when reading version 1 metadata.\nVersion 2: Row-level Deletes Version 2 of the Iceberg spec adds row-level updates and deletes for analytic tables with immutable files.\nThe primary change in version 2 adds delete files to encode that rows that are deleted in existing data files. This version can be used to delete or replace individual rows in immutable data files without rewriting the files.\nIn addition to row-level deletes, version 2 makes some requirements stricter for writers. The full set of changes are listed in Appendix E.\nGoals Serializable isolation \u2013 Reads will be isolated from concurrent writes and always use a committed snapshot of a table\u2019s data. Writes will support removing and adding files in a single operation and are never partially visible. Readers will not acquire locks. 
Speed \u2013 Operations will use O(1) remote calls to plan the files for a scan and not O(n) where n grows with the size of the table, like the number of partitions or files. Scale \u2013 Job planning will be handled primarily by clients and not bottleneck on a central metadata store. Metadata will include information needed for cost-based optimization. Evolution \u2013 Tables will support full schema and partition spec evolution. Schema evolution supports safe column add, drop, reorder and rename, including in nested structures. Dependable types \u2013 Tables will provide well-defined and dependable support for a core set of types. Storage separation \u2013 Partitioning will be table configuration. Reads will be planned using predicates on data values, not partition values. Tables will support evolving partition schemes. Formats \u2013 Underlying data file formats will support identical schema evolution rules and types. Both read-optimized and write-optimized formats will be available. Overview This table format tracks individual data files in a table instead of directories. This allows writers to create data files in-place and only adds files to the table in an explicit commit.\nTable state is maintained in metadata files. All changes to table state create a new metadata file and replace the old metadata with an atomic swap. The table metadata file tracks the table schema, partitioning config, custom properties, and snapshots of the table contents. A snapshot represents the state of a table at some time and is used to access the complete set of data files in the table.\nData files in snapshots are tracked by one or more manifest files that contain a row for each data file in the table, the file\u2019s partition data, and its metrics. The data in a snapshot is the union of all files in its manifests. Manifest files are reused across snapshots to avoid rewriting metadata that is slow-changing. Manifests can track data files with any subset of a table and are not associated with partitions.\nThe manifests that make up a snapshot are stored in a manifest list file. Each manifest list stores metadata about manifests, including partition stats and data file counts. These stats are used to avoid reading manifests that are not required for an operation.\nOptimistic Concurrency An atomic swap of one table metadata file for another provides the basis for serializable isolation. Readers use the snapshot that was current when they load the table metadata and are not affected by changes until they refresh and pick up a new metadata location.\nWriters create table metadata files optimistically, assuming that the current version will not be changed before the writer\u2019s commit. Once a writer has created an update, it commits by swapping the table\u2019s metadata file pointer from the base version to the new version.\nIf the snapshot on which an update is based is no longer current, the writer must retry the update based on the new current version. Some operations support retry by re-applying metadata changes and committing, under well-defined conditions. For example, a change that rewrites files can be applied to a new table snapshot if all of the rewritten files are still in the table.\nThe conditions required by a write to successfully commit determines the isolation level. Writers can select what to validate and can make different isolation guarantees.\nSequence Numbers The relative age of data and delete files relies on a sequence number that is assigned to every successful commit. 
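The optimistic concurrency protocol above, combined with the sequence-number rules detailed in the rest of this section, can be sketched as a commit loop. All names here are hypothetical stand-ins; real implementations commit through a catalog's atomic swap of the metadata location.

# Illustrative commit loop only: re-read the current metadata, optimistically
# assign the next sequence number, and retry with a reassigned number if another
# writer's swap landed first. `catalog`, `load_metadata`, and `swap_metadata`
# are hypothetical.
def commit(catalog, table, build_snapshot, max_retries=5):
    for _ in range(max_retries):
        base = catalog.load_metadata(table)             # current table metadata file
        seq = base["last-sequence-number"] + 1          # optimistic assignment
        snapshot = build_snapshot(base, sequence_number=seq)
        updated = dict(base)
        updated["last-sequence-number"] = seq
        updated["snapshots"] = base["snapshots"] + [snapshot]
        # The atomic swap succeeds only if `base` is still the current version.
        if catalog.swap_metadata(table, expected=base, new=updated):
            return snapshot
        # Otherwise the base is stale: retry against the new current version.
    raise RuntimeError("commit failed after retries")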
When a snapshot is created for a commit, it is optimistically assigned the next sequence number, and it is written into the snapshot\u2019s metadata. If the commit fails and must be retried, the sequence number is reassigned and written into new snapshot metadata.\nAll manifests, data files, and delete files created for a snapshot inherit the snapshot\u2019s sequence number. Manifest file metadata in the manifest list stores a manifest\u2019s sequence number. New data and metadata file entries are written with null in place of a sequence number, which is replaced with the manifest\u2019s sequence number at read time. When a data or delete file is written to a new manifest (as \u201cexisting\u201d), the inherited sequence number is written to ensure it does not change after it is first inherited.\nInheriting the sequence number from manifest metadata allows writing a new manifest once and reusing it in commit retries. To change a sequence number for a retry, only the manifest list must be rewritten \u2013 which would be rewritten anyway with the latest set of manifests.\nRow-level Deletes Row-level deletes are stored in delete files.\nThere are two ways to encode a row-level delete:\nPosition deletes mark a row deleted by data file path and the row position in the data file Equality deletes mark a row deleted by one or more column values, like id = 5 Like data files, delete files are tracked by partition. In general, a delete file must be applied to older data files with the same partition; see Scan Planning for details. Column metrics can be used to determine whether a delete file\u2019s rows overlap the contents of a data file or a scan range.\nFile System Operations Iceberg only requires that file systems support the following operations:\nIn-place write \u2013 Files are not moved or altered once they are written. Seekable reads \u2013 Data file formats require seek support. Deletes \u2013 Tables delete files that are no longer used. These requirements are compatible with object stores, like S3.\nTables do not require random-access writes. Once written, data and metadata files are immutable until they are deleted.\nTables do not require rename, except for tables that use atomic rename to implement the commit operation for new metadata files.\nSpecification Terms Schema \u2013 Names and types of fields in a table. Partition spec \u2013 A definition of how partition values are derived from data fields. Snapshot \u2013 The state of a table at some point in time, including the set of all data files. Manifest list \u2013 A file that lists manifest files; one per snapshot. Manifest \u2013 A file that lists data or delete files; a subset of a snapshot. Data file \u2013 A file that contains rows of a table. Delete file \u2013 A file that encodes rows of a table that are deleted by position or data values. Writer requirements Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata files to a table with the given version.\nRequirement Write behavior (blank) The field should be omitted optional The field can be written required The field must be written Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree. 
For manifest list and manifest files, this table shows the expected v2 read behavior:\nv1 v2 v2 read behavior optional Read the field as optional required Read the field as optional; it may be missing in v1 files optional Ignore the field optional optional Read the field as optional optional required Read the field as optional; it may be missing in v1 files required Ignore the field required optional Read the field as optional required required Fill in a default or throw an exception if the field is missing Readers may be more strict for metadata JSON files because the JSON files are not reused and will always match the table version. Required v2 fields that were not present in v1 or optional in v1 may be handled as required fields. For example, a v2 table that is missing last-sequence-number can throw an exception.\nSchemas and Data Types A table\u2019s schema is a list of named columns. All data types are either primitives or nested types, which are maps, lists, or structs. A table schema is also a struct type.\nFor the representations of these types in Avro, ORC, and Parquet file formats, see Appendix A.\nNested Types A struct is a tuple of typed values. Each field in the tuple is named and has an integer id that is unique in the table schema. Each field can be either optional or required, meaning that values can (or cannot) be null. Fields may be any type. Fields may have an optional comment or doc string. Fields can have default values.\nA list is a collection of values with some element type. The element field has an integer id that is unique in the table schema. Elements can be either optional or required. Element types may be any type.\nA map is a collection of key-value pairs with a key type and a value type. Both the key field and value field each have an integer id that is unique in the table schema. Map keys are required and map values can be either optional or required. Both map keys and map values may be any type, including nested types.\nPrimitive Types Primitive type Description Requirements boolean True or false int 32-bit signed integers Can promote to long long 64-bit signed integers float 32-bit IEEE 754 floating point Can promote to double double 64-bit IEEE 754 floating point decimal(P,S) Fixed-point decimal; precision P, scale S Scale is fixed [1], precision must be 38 or less date Calendar date without timezone or time time Time of day without date, timezone Microsecond precision [2] timestamp Timestamp without timezone Microsecond precision [2] timestamptz Timestamp with timezone Stored as UTC [2] string Arbitrary-length character sequences Encoded with UTF-8 [3] uuid Universally unique identifiers Should use 16-byte fixed fixed(L) Fixed-length byte array of length L binary Arbitrary-length byte array Notes:\nDecimal scale is fixed and cannot be changed by schema evolution. Precision can only be widened. All time and timestamp values are stored with microsecond precision. Timestamps with time zone represent a point in time: values are stored as UTC and do not retain a source time zone (2017-11-16 17:10:34 PST is stored/retrieved as 2017-11-17 01:10:34 UTC and these values are considered identical). Timestamps without time zone represent a date and time of day regardless of zone: the time value is independent of zone adjustments (2017-11-16 17:10:34 is always retrieved as 2017-11-16 17:10:34). Timestamp values are stored as a long that encodes microseconds from the unix epoch. Character strings must be stored as UTF-8 encoded byte arrays. 
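The microsecond encoding and UTC normalization described in the notes above can be reproduced in a few lines of Python (a sketch, not the reference implementation):

from datetime import datetime, timezone

# Timestamp values are stored as microseconds from the unix epoch; timestamptz
# values are normalized to UTC first, so the PST and UTC literals below encode
# to the same long.
def to_micros(ts):
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    if ts.tzinfo is None:             # timestamp without zone: compare naively
        epoch = epoch.replace(tzinfo=None)
    delta = ts - epoch
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds

pst = datetime.fromisoformat("2017-11-16T17:10:34-08:00")
utc = datetime.fromisoformat("2017-11-17T01:10:34+00:00")
assert to_micros(pst) == to_micros(utc)    # stored/retrieved as the same value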
For details on how to serialize a schema to JSON, see Appendix C.\nDefault values Default values can be tracked for struct fields (both nested structs and the top-level schema\u2019s struct). There can be two defaults with a field:\ninitial-default is used to populate the field\u2019s value for all records that were written before the field was added to the schema write-default is used to populate the field\u2019s value for any records written after the field was added to the schema, if the writer does not supply the field\u2019s value The initial-default is set only when a field is added to an existing schema. The write-default is initially set to the same value as initial-default and can be changed through schema evolution. If either default is not set for an optional field, then the default value is null for compatibility with older spec versions.\nThe initial-default and write-default produce SQL default value behavior, without rewriting data files. SQL default value behavior when a field is added handles all existing rows as though the rows were written with the new field\u2019s default value. Default value changes may only affect future records and all known fields are written into data files. Omitting a known field when writing a data file is never allowed. The write default for a field must be written if a field is not supplied to a write. If the write default for a required field is not set, the writer must fail.\nDefault values are attributes of fields in schemas and serialized with fields in the JSON format. See Appendix C.\nSchema Evolution Schemas may be evolved by type promotion or adding, deleting, renaming, or reordering fields in structs (both nested structs and the top-level schema\u2019s struct).\nEvolution applies changes to the table\u2019s current schema to produce a new schema that is identified by a unique schema ID, is added to the table\u2019s list of schemas, and is set as the table\u2019s current schema.\nValid type promotions are:\nint to long float to double decimal(P, S) to decimal(P', S) if P' > P \u2013 widen the precision of decimal types. Any struct, including a top-level schema, can evolve through deleting fields, adding new fields, renaming existing fields, reordering existing fields, or promoting a primitive using the valid type promotions. Adding a new field assigns a new ID for that field and for any nested fields. Renaming an existing field must change the name, but not the field ID. Deleting a field removes it from the current schema. Field deletion cannot be rolled back unless the field was nullable or if the current snapshot has not changed.\nGrouping a subset of a struct\u2019s fields into a nested struct is not allowed, nor is moving fields from a nested struct into its immediate parent struct (struct<a, b, c> \u2194 struct<a, struct<b, c>>). 
Evolving primitive types to structs is not allowed, nor is evolving a single-field struct to a primitive (map<string, int> \u2194 map<string, struct<int>>).\nStruct evolution requires the following rules for default values:\nThe initial-default must be set when a field is added and cannot change The write-default must be set when a field is added and may change When a required field is added, both defaults must be set to a non-null value When an optional field is added, the defaults may be null and should be explicitly set When a new field is added to a struct with a default value, updating the struct\u2019s default is optional If a field value is missing from a struct\u2019s initial-default, the field\u2019s initial-default must be used for the field If a field value is missing from a struct\u2019s write-default, the field\u2019s write-default must be used for the field Column Projection Columns in Iceberg data files are selected by field id. The table schema\u2019s column names and order may change after a data file is written, and projection must be done using field ids. If a field id is missing from a data file, its value for each row should be null.\nFor example, a file may be written with schema 1: a int, 2: b string, 3: c double and read using projection schema 3: measurement, 2: name, 4: a. This must select file columns c (renamed to measurement), b (now called name), and a column of null values called a; in that order.\nTables may also define a property schema.name-mapping.default with a JSON name mapping containing a list of field mapping objects. These mappings provide fallback field ids to be used when a data file does not contain field id information. Each object should contain\nnames: A required list of 0 or more names for a field. field-id: An optional Iceberg field ID used when a field\u2019s name is present in names fields: An optional list of field mappings for child field of structs, maps, and lists. Field mapping fields are constrained by the following rules:\nA name may contain . but this refers to a literal name, not a nested field. For example, a.b refers to a field named a.b, not child field b of field a. Each child field should be defined with their own field mapping under fields. Multiple values for names may be mapped to a single field ID to support cases where a field may have different names in different data files. For example, all Avro field aliases should be listed in names. Fields which exist only in the Iceberg schema and not in imported data files may use an empty names list. Fields that exist in imported files but not in the Iceberg schema may omit field-id. List types should contain a mapping in fields for element. Map types should contain mappings in fields for key and value. Struct types should contain mappings in fields for their child fields. For details on serialization, see Appendix C.\nIdentifier Field IDs A schema can optionally track the set of primitive fields that identify rows in a table, using the property identifier-field-ids (see JSON encoding in Appendix C).\nTwo rows are the \u201csame\u201d\u2014that is, the rows represent the same entity\u2014if the identifier fields are equal. However, uniqueness of rows by this identifier is not guaranteed or required by Iceberg and it is the responsibility of processing engines or data providers to enforce.\nIdentifier fields may be nested in structs but cannot be nested within maps or lists. 
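The field-id based projection described earlier in this section can be sketched with plain dictionaries (illustrative only; real readers resolve ids against the file format's column metadata):

# A file was written with schema 1: a int, 2: b string, 3: c double; the read
# schema projects field ids 3, 2, 4 under their current names. Field id 4 is
# not present in the file, so its column is filled with nulls.
file_rows = [{1: 34, 2: "sensor-7", 3: 0.31}]               # row values keyed by field id
read_schema = [(3, "measurement"), (2, "name"), (4, "a")]   # (field id, current column name)

projected = [
    {name: row.get(field_id) for field_id, name in read_schema}
    for row in file_rows
]
print(projected)   # [{'measurement': 0.31, 'name': 'sensor-7', 'a': None}]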
Float, double, and optional fields cannot be used as identifier fields and a nested field cannot be used as an identifier field if it is nested in an optional struct, to avoid null values in identifiers.\nReserved Field IDs Iceberg tables must not use field ids greater than 2147483447 (Integer.MAX_VALUE - 200). This id range is reserved for metadata columns that can be used in user data schemas, like the _file column that holds the file path in which a row was stored.\nThe set of metadata columns is:\nField id, name Type Description 2147483646 _file string Path of the file in which a row is stored 2147483645 _pos long Ordinal position of a row in the source data file 2147483644 _deleted boolean Whether the row has been deleted 2147483643 _spec_id int Spec ID used to track the file containing a row 2147483642 _partition struct Partition to which a row belongs 2147483546 file_path string Path of a file, used in position-based delete files 2147483545 pos long Ordinal position of a row, used in position-based delete files 2147483544 row struct<...> Deleted row values, used in position-based delete files Partitioning Data files are stored in manifests with a tuple of partition values that are used in scans to filter out files that cannot contain records that match the scan\u2019s filter predicate. Partition values for a data file must be the same for all records stored in the data file. (Manifests store data files from any partition, as long as the partition spec is the same for the data files.)\nTables are configured with a partition spec that defines how to produce a tuple of partition values from a record. A partition spec has a list of fields that consist of:\nA source column id from the table\u2019s schema A partition field id that is used to identify a partition field and is unique within a partition spec. In v2 table metadata, it is unique across all partition specs. A transform that is applied to the source column to produce a partition value A partition name The source column, selected by id, must be a primitive type and cannot be contained in a map or list, but may be nested in a struct. For details on how to serialize a partition spec to JSON, see Appendix C.\nPartition specs capture the transform from table data to partition values. This is used to transform predicates to partition predicates, in addition to transforming data values. Deriving partition predicates from column predicates on the table data is used to separate the logical queries from physical storage: the partitioning can change and the correct partition filters are always derived from column predicates. This simplifies queries because users don\u2019t have to supply both logical predicates and partition predicates. 
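As an informal illustration of the partition spec structure described above (the authoritative JSON serialization is defined in Appendix C), a spec that partitions by an identity field and by day of a timestamp column might look roughly like this:

# Rough illustration only; see Appendix C for the exact JSON serialization.
# Each field references a source column id, carries its own partition field id,
# a transform, and a partition name.
partition_spec = {
    "spec-id": 0,
    "fields": [
        {"source-id": 1, "field-id": 1000, "transform": "identity", "name": "vendor_id"},
        {"source-id": 4, "field-id": 1001, "transform": "day", "name": "ts_day"},
    ],
}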
For more information, see Scan Planning below.\nPartition Transforms Transform name Description Source types Result type identity Source value, unmodified Any Source type bucket[N] Hash of value, mod N (see below) int, long, decimal, date, time, timestamp, timestamptz, string, uuid, fixed, binary int truncate[W] Value truncated to width W (see below) int, long, decimal, string Source type year Extract a date or timestamp year, as years from 1970 date, timestamp, timestamptz int month Extract a date or timestamp month, as months from 1970-01-01 date, timestamp, timestamptz int day Extract a date or timestamp day, as days from 1970-01-01 date, timestamp, timestamptz date hour Extract a timestamp hour, as hours from 1970-01-01 00:00:00 timestamp, timestamptz int void Always produces null Any Source type or int All transforms must return null for a null input value.\nThe void transform may be used to replace the transform in an existing partition field so that the field is effectively dropped in v1 tables. See partition evolution below.\nBucket Transform Details Bucket partition transforms use a 32-bit hash of the source value. The 32-bit hash implementation is the 32-bit Murmur3 hash, x86 variant, seeded with 0.\nTransforms are parameterized by a number of buckets [1], N. The hash mod N must produce a positive value by first discarding the sign bit of the hash value. In pseudo-code, the function is:\ndef bucket_N(x) = (murmur3_x86_32_hash(x) & Integer.MAX_VALUE) % N Notes:\nChanging the number of buckets as a table grows is possible by evolving the partition spec. For hash function details by type, see Appendix B.\nTruncate Transform Details Type Config Truncate specification Examples int W, width v - (v % W)\tremainders must be positive\t[1] W=10: 1 \uffeb 0, -1 \uffeb -10 long W, width v - (v % W)\tremainders must be positive\t[1] W=10: 1 \uffeb 0, -1 \uffeb -10 decimal W, width (no scale) scaled_W = decimal(W, scale(v)) v - (v % scaled_W)\t[1, 2] W=50, s=2: 10.65 \uffeb 10.50 string L, length Substring of length L: v.substring(0, L) [3] L=3: iceberg \uffeb ice Notes:\nThe remainder, v % W, must be positive. For languages where % can produce negative values, the correct truncate function is: v - (((v % W) + W) % W) The width, W, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters. Strings are truncated to a valid UTF-8 string with no more than L code points. Partition Evolution Table partitioning can be evolved by adding, removing, renaming, or reordering partition spec fields.\nChanging a partition spec produces a new spec identified by a unique spec ID that is added to the table\u2019s list of partition specs and may be set as the table\u2019s default spec.\nWhen evolving a spec, changes should not cause partition field IDs to change because the partition field IDs are used as the partition tuple field IDs in manifest files.\nIn v2, partition field IDs must be explicitly tracked for each partition field. New IDs are assigned based on the last assigned partition ID in table metadata.\nIn v1, partition field IDs were not tracked, but were assigned sequentially starting at 1000 in the reference implementation. This assignment caused problems when reading metadata tables based on manifest files from multiple specs because partition fields with the same ID may contain different data types. 
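The bucket and truncate definitions above translate almost directly into code. The sketch below assumes the third-party mmh3 package for the 32-bit x86 Murmur3 hash and hashes a string's UTF-8 bytes; other types must first be serialized as described in Appendix B.

import mmh3    # assumed third-party package exposing the 32-bit x86 Murmur3 hash

INT_MAX = 2**31 - 1    # Integer.MAX_VALUE, used to discard the sign bit

def bucket(value, n):
    # Strings hash their UTF-8 bytes with seed 0; masking with INT_MAX keeps the
    # value non-negative before taking the modulo.
    return (mmh3.hash(value.encode("utf-8"), 0) & INT_MAX) % n

def truncate_int(v, w):
    # v - (((v % W) + W) % W): the remainder stays positive even in languages
    # where % can return negative values.
    return v - (((v % w) + w) % w)

def truncate_str(v, length):
    # Truncate to at most L code points; the result is still valid UTF-8.
    return v[:length]

print(truncate_int(-1, 10))          # -10
print(truncate_str("iceberg", 3))    # ice
print(bucket("iceberg", 16))         # a bucket number in [0, 16)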
For compatibility with old versions, the following rules are recommended for partition evolution in v1 tables:\nDo not reorder partition fields Do not drop partition fields; instead replace the field\u2019s transform with the void transform Only add partition fields at the end of the previous partition spec Sorting Users can sort their data within partitions by columns to gain performance. The information on how the data is sorted can be declared per data or delete file, by a sort order.\nA sort order is defined by a sort order id and a list of sort fields. The order of the sort fields within the list defines the order in which the sort is applied to the data. Each sort field consists of:\nA source column id from the table\u2019s schema A transform that is used to produce values to be sorted on from the source column. This is the same transform as described in partition transforms. A sort direction, that can only be either asc or desc A null order that describes the order of null values when sorted. Can only be either nulls-first or nulls-last Order id 0 is reserved for the unsorted order.\nSorting floating-point numbers should produce the following behavior: -NaN < -Infinity < -value < -0 < 0 < value < Infinity < NaN. This aligns with the implementation of Java floating-point types comparisons.\nA data or delete file is associated with a sort order by the sort order\u2019s id within a manifest. Therefore, the table must declare all the sort orders for lookup. A table could also be configured with a default sort order id, indicating how the new data should be sorted by default. Writers should use this default sort order to sort the data on write, but are not required to if the default order is prohibitively expensive, as it would be for streaming writes.\nManifests A manifest is an immutable Avro file that lists data files or delete files, along with each file\u2019s partition data tuple, metrics, and tracking information. One or more manifest files are used to store a snapshot, which tracks all of the files in a table at some point in time. Manifests are tracked by a manifest list for each table snapshot.\nA manifest is a valid Iceberg data file: files must use valid Iceberg formats, schemas, and column projection.\nA manifest may store either data files or delete files, but not both because manifests that contain delete files are scanned first during job planning. Whether a manifest is a data manifest or a delete manifest is stored in manifest metadata.\nA manifest stores files for a single partition spec. When a table\u2019s partition spec changes, old files remain in the older manifest and newer files are written to a new manifest. This is required because a manifest file\u2019s schema is based on its partition spec (see below). 
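The floating-point ordering required by the sort order section above can be sketched as a sort key (illustrative only; engines rely on their own comparators):

import struct

def float_sort_key(x):
    # Produces the total order -NaN < -Infinity < -value < -0 < 0 < value < Infinity < NaN
    # by reading the IEEE 754 bits as a signed integer and flipping the magnitude
    # bits of negative values (the same idea as a totalOrder comparison).
    bits = struct.unpack(">q", struct.pack(">d", x))[0]
    return bits if bits >= 0 else bits ^ 0x7FFFFFFFFFFFFFFF

values = [float("nan"), -0.0, 1.5, float("-inf"), 0.0, -2.0, float("inf")]
print(sorted(values, key=float_sort_key))    # [-inf, -2.0, -0.0, 0.0, 1.5, inf, nan]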
The partition spec of each manifest is also used to transform predicates on the table\u2019s data rows into predicates on partition values that are used during job planning to select files from a manifest.\nA manifest file must store the partition spec and other metadata as properties in the Avro file\u2019s key-value metadata:\nv1 v2 Key Value required required schema JSON representation of the table schema at the time the manifest was written optional required schema-id ID of the schema used to write the manifest as a string required required partition-spec JSON fields representation of the partition spec used to write the manifest optional required partition-spec-id ID of the partition spec used to write the manifest as a string optional required format-version Table format version number of the manifest as a string required content Type of content files tracked by the manifest: \u201cdata\u201d or \u201cdeletes\u201d The schema of a manifest file is a struct called manifest_entry with the following fields:\nv1 v2 Field id, name Type Description required required 0 status int with meaning: 0: EXISTING 1: ADDED 2: DELETED Used to track additions and deletions. Deletes are informational only and not used in scans. required optional 1 snapshot_id long Snapshot id where the file was added, or deleted if status is 2. Inherited when null. optional 3 sequence_number long Sequence number when the file was added. Inherited when null. required required 2 data_file data_file struct (see below) File path, partition tuple, metrics, \u2026 data_file is a struct with the following fields:\nv1 v2 Field id, name Type Description required 134 content int with meaning: 0: DATA, 1: POSITION DELETES, 2: EQUALITY DELETES Type of content stored by the data file: data, equality deletes, or position deletes (all v1 files are data files) required required 100 file_path string Full URI for the file with FS scheme required required 101 file_format string String file format name, avro, orc or parquet required required 102 partition struct<...> Partition data tuple, schema based on the partition spec output using partition field ids for the struct field ids required required 103 record_count long Number of records in this file required required 104 file_size_in_bytes long Total file size in bytes required 105 block_size_in_bytes long Deprecated. Always write a default in v1. Do not write in v2. optional 106 file_ordinal int Deprecated. Do not write. optional 107 sort_columns list<112: int> Deprecated. Do not write. optional optional 108 column_sizes map<117: int, 118: long> Map from column id to the total size on disk of all regions that store the column. Does not include bytes necessary to read other columns, like footers. 
Leave null for row-oriented formats (Avro) optional optional 109 value_counts map<119: int, 120: long> Map from column id to number of values in the column (including null and NaN values) optional optional 110 null_value_counts map<121: int, 122: long> Map from column id to number of null values in the column optional optional 137 nan_value_counts map<138: int, 139: long> Map from column id to number of NaN values in the column optional optional 111 distinct_counts map<123: int, 124: long> Map from column id to number of distinct values in the column; distinct counts must be derived using values in the file by counting or using sketches, but not using methods like merging existing distinct counts optional optional 125 lower_bounds map<126: int, 127: binary> Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2] optional optional 128 upper_bounds map<129: int, 130: binary> Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2] optional optional 131 key_metadata binary Implementation-specific key metadata for encryption optional optional 132 split_offsets list<133: long> Split offsets for the data file. For example, all row group offsets in a Parquet file. Must be sorted ascending optional 135 equality_ids list<136: int> Field ids used to determine row equality in equality delete files. Required when content=2 and should be null otherwise. Fields with ids listed in this column must be present in the delete file optional optional 140 sort_order_id int ID representing sort order for this file [3]. Notes:\nSingle-value serialization for lower and upper bounds is detailed in Appendix D. For float and double, the value -0.0 must precede +0.0, as in the IEEE 754 totalOrder predicate. NaNs are not permitted as lower or upper bounds. If sort order ID is missing or unknown, then the order is assumed to be unsorted. Only data files and equality delete files should be written with a non-null order id. Position deletes are required to be sorted by file and position, not a table order, and should set sort order id to null. Readers must ignore sort order id for position delete files. The following field ids are reserved on data_file: 141. The partition struct stores the tuple of partition values for each file. Its type is derived from the partition fields of the partition spec used to write the manifest file. In v2, the partition struct\u2019s field ids must match the ids from the partition spec.\nThe column metrics maps are used when filtering to select both data and delete files. For delete files, the metrics must store bounds and counts for all deleted rows, or must be omitted. Storing metrics for deleted rows ensures that the values can be used during job planning to find delete files that must be merged during a scan.\nManifest Entry Fields The manifest entry fields are used to keep track of the snapshot in which files were added or logically deleted. 
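To make the entry layout concrete before the next paragraph, here is a minimal sketch that models a manifest entry and its nested data_file as plain Python dataclasses. The class names and the selection of fields are illustrative only; this is not the Avro schema or any particular implementation's API.

```python
# Illustrative only: a simplified in-memory model of manifest_entry / data_file.
from dataclasses import dataclass
from typing import Dict, Optional

EXISTING, ADDED, DELETED = 0, 1, 2  # manifest_entry status values

@dataclass
class DataFile:
    content: int                     # 0: data, 1: position deletes, 2: equality deletes
    file_path: str                   # full URI with FS scheme
    file_format: str                 # "avro", "orc", or "parquet"
    partition: Dict[int, object]     # partition data tuple keyed by partition field id
    record_count: int
    file_size_in_bytes: int
    lower_bounds: Optional[Dict[int, bytes]] = None  # column id -> serialized lower bound
    upper_bounds: Optional[Dict[int, bytes]] = None  # column id -> serialized upper bound
    sort_order_id: Optional[int] = None

@dataclass
class ManifestEntry:
    status: int                            # EXISTING, ADDED, or DELETED
    data_file: DataFile                    # nested struct described above
    snapshot_id: Optional[int] = None      # v2: inherited from manifest metadata when None
    sequence_number: Optional[int] = None  # v2: inherited when None
```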
The data_file struct is nested inside the manifest entry so that it can be easily passed to job planning without the manifest entry fields.\nWhen a file is added to the dataset, its manifest entry should store the snapshot ID in which the file was added and set status to 1 (added).\nWhen a file is replaced or deleted from the dataset, its manifest entry fields store the snapshot ID in which the file was deleted and status 2 (deleted). The file may be deleted from the file system when the snapshot in which it was deleted is garbage collected, assuming that older snapshots have also been garbage collected [1].\nIceberg v2 adds a sequence number to the entry and makes the snapshot id optional. Both fields, sequence_number and snapshot_id, are inherited from manifest metadata when null. That is, if the field is null for an entry, then the entry must inherit its value from the manifest file\u2019s metadata, stored in the manifest list [2].\nNotes:\nTechnically, data files can be deleted when the last snapshot that contains the file as \u201clive\u201d data is garbage collected. But this is harder to detect and requires finding the diff of multiple snapshots. It is easier to track what files are deleted in a snapshot and delete them when that snapshot expires. It is not recommended to add a deleted file back to a table. Adding a deleted file can lead to edge cases where incremental deletes can break table snapshots. Manifest list files are required in v2, so that the sequence_number and snapshot_id to inherit are always available. Sequence Number Inheritance Manifests track the sequence number when a data or delete file was added to the table.\nWhen adding a new file, its sequence number is set to null because the snapshot\u2019s sequence number is not assigned until the snapshot is successfully committed. When reading, sequence numbers are inherited by replacing null with the manifest\u2019s sequence number from the manifest list.\nWhen writing an existing file to a new manifest, the sequence number must be non-null and set to the sequence number that was inherited.\nInheriting sequence numbers through the metadata tree allows writing a new manifest without a known sequence number, so that a manifest can be written once and reused in commit retries. To change a sequence number for a retry, only the manifest list must be rewritten.\nWhen reading v1 manifests with no sequence number column, sequence numbers for all files must default to 0.\nSnapshots A snapshot consists of the following fields:\nv1 v2 Field Description required required snapshot-id A unique long ID optional optional parent-snapshot-id The snapshot ID of the snapshot\u2019s parent. Omitted for any snapshot with no parent required sequence-number A monotonically increasing long that tracks the order of changes to a table required required timestamp-ms A timestamp when the snapshot was created, used for garbage collection and table inspection optional required manifest-list The location of a manifest list for this snapshot that tracks manifest files with additional metadata optional manifests A list of manifest file locations. Must be omitted if manifest-list is present optional required summary A string map that summarizes the snapshot changes, including operation (see below) optional optional schema-id ID of the table\u2019s current schema when the snapshot was created The snapshot summary\u2019s operation field is used by some operations, like snapshot expiration, to skip processing certain snapshots.
Possible operation values are:\nappend \u2013 Only data files were added and no files were removed. replace \u2013 Data and delete files were added and removed without changing table data; i.e., compaction, changing the data file format, or relocating data files. overwrite \u2013 Data and delete files were added and removed in a logical overwrite operation. delete \u2013 Data files were removed and their contents logically deleted and/or delete files were added to delete rows. Data and delete files for a snapshot can be stored in more than one manifest. This enables:\nAppends can add a new manifest to minimize the amount of data written, instead of adding new records by rewriting and appending to an existing manifest. (This is called a \u201cfast append\u201d.) Tables can use multiple partition specs. A table\u2019s partition configuration can evolve if, for example, its data volume changes. Each manifest uses a single partition spec, and queries do not need to change because partition filters are derived from data predicates. Large tables can be split across multiple manifests so that implementations can parallelize job planning or reduce the cost of rewriting a manifest. Manifests for a snapshot are tracked by a manifest list.\nValid snapshots are stored as a list in table metadata. For serialization, see Appendix C.\nManifest Lists Snapshots are embedded in table metadata, but the list of manifests for a snapshot are stored in a separate manifest list file.\nA new manifest list is written for each attempt to commit a snapshot because the list of manifests always changes to produce a new snapshot. When a manifest list is written, the (optimistic) sequence number of the snapshot is written for all new manifest files tracked by the list.\nA manifest list includes summary metadata that can be used to avoid scanning all of the manifests in a snapshot when planning a table scan. 
This includes the number of added, existing, and deleted files, and a summary of values for each field of the partition spec used to write the manifest.\nA manifest list is a valid Iceberg data file: files must use valid Iceberg formats, schemas, and column projection.\nManifest list files store manifest_file, a struct with the following fields:\nv1 v2 Field id, name Type Description required required 500 manifest_path string Location of the manifest file required required 501 manifest_length long Length of the manifest file in bytes required required 502 partition_spec_id int ID of a partition spec used to write the manifest; must be listed in table metadata partition-specs required 517 content int with meaning: 0: data, 1: deletes The type of files tracked by the manifest, either data or delete files; 0 for all v1 manifests required 515 sequence_number long The sequence number when the manifest was added to the table; use 0 when reading v1 manifest lists required 516 min_sequence_number long The minimum sequence number of all data or delete files in the manifest; use 0 when reading v1 manifest lists required required 503 added_snapshot_id long ID of the snapshot where the manifest file was added optional required 504 added_files_count int Number of entries in the manifest that have status ADDED (1), when null this is assumed to be non-zero optional required 505 existing_files_count int Number of entries in the manifest that have status EXISTING (0), when null this is assumed to be non-zero optional required 506 deleted_files_count int Number of entries in the manifest that have status DELETED (2), when null this is assumed to be non-zero optional required 512 added_rows_count long Number of rows in all files in the manifest that have status ADDED, when null this is assumed to be non-zero optional required 513 existing_rows_count long Number of rows in all files in the manifest that have status EXISTING, when null this is assumed to be non-zero optional required 514 deleted_rows_count long Number of rows in all files in the manifest that have status DELETED, when null this is assumed to be non-zero optional optional 507 partitions list<508: field_summary> (see below) A list of field summaries for each partition field in the spec. Each field in the list corresponds to a field in the manifest file\u2019s partition spec. optional optional 519 key_metadata binary Implementation-specific key metadata for encryption field_summary is a struct with the following fields:\nv1 v2 Field id, name Type Description required required 509 contains_null boolean Whether the manifest contains at least one partition with a null value for the field optional optional 518 contains_nan boolean Whether the manifest contains at least one partition with a NaN value for the field optional optional 510 lower_bound bytes [1] Lower bound for the non-null, non-NaN values in the partition field, or null if all values are null or NaN [2] optional optional 511 upper_bound bytes [1] Upper bound for the non-null, non-NaN values in the partition field, or null if all values are null or NaN [2] Notes:\nLower and upper bounds are serialized to bytes using the single-value serialization in Appendix D. The type used to encode the value is the type of the partition field data. If -0.0 is a value of the partition field, the lower_bound must not be +0.0, and if +0.0 is a value of the partition field, the upper_bound must not be -0.0. Scan Planning Scans are planned by reading the manifest files for the current snapshot.
Deleted entries in data and delete manifests (those marked with status \u201cDELETED\u201d) are not used in a scan.\nManifests that contain no matching files, determined using either file counts or partition summaries, may be skipped.\nFor each manifest, scan predicates, which filter data rows, are converted to partition predicates, which filter data and delete files. These partition predicates are used to select the data and delete files in the manifest. This conversion uses the partition spec used to write the manifest file.\nScan predicates are converted to partition predicates using an inclusive projection: if a scan predicate matches a row, then the partition predicate must match that row\u2019s partition. This is called inclusive [1] because rows that do not match the scan predicate may be included in the scan by the partition predicate.\nFor example, an events table with a timestamp column named ts that is partitioned by ts_day=day(ts) is queried by users with ranges over the timestamp column: ts > X. The inclusive projection is ts_day >= day(X), which is used to select files that may have matching rows. Note that, in most cases, timestamps just before X will be included in the scan because the file contains rows that match the predicate and rows that do not match the predicate.\nScan predicates are also used to filter data and delete files using column bounds and counts that are stored by field id in manifests. The same filter logic can be used for both data and delete files because both store metrics of the rows either inserted or deleted. If metrics show that a delete file has no rows that match a scan predicate, it may be ignored just as a data file would be ignored [2].\nData files that match the query filter must be read by the scan.\nNote that for any snapshot, all file paths marked with \u201cADDED\u201d or \u201cEXISTING\u201d may appear at most once across all manifest files in the snapshot. If a file path appears more than once, the results of the scan are undefined. Reader implementations may raise an error in this case, but are not required to do so.\nDelete files that match the query filter must be applied to data files at read time, limited by the scope of the delete file using the following rules.\nA position delete file must be applied to a data file when all of the following are true: The data file\u2019s sequence number is less than or equal to the delete file\u2019s sequence number The data file\u2019s partition (both spec and partition values) is equal to the delete file\u2019s partition An equality delete file must be applied to a data file when all of the following are true: The data file\u2019s sequence number is strictly less than the delete\u2019s sequence number The data file\u2019s partition (both spec and partition values) is equal to the delete file\u2019s partition or the delete file\u2019s partition spec is unpartitioned In general, deletes are applied only to data files that are older and in the same partition, except for two special cases:\nEquality delete files stored with an unpartitioned spec are applied as global deletes. Otherwise, delete files do not apply to files in other partitions. Position delete files must be applied to data files from the same commit, when the data and delete file sequence numbers are equal. This allows deleting rows that were added in the same commit. Notes:\nAn alternative, strict projection, creates a partition predicate that will match a file if all of the rows in the file must match the scan predicate.
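The projection described here can be illustrated with a small sketch of the day(ts) example: given a scan predicate ts > X, the inclusive projection keeps any partition that may contain matching rows, while a strict projection keeps only partitions whose rows all match. The helper names are hypothetical; this is not Iceberg's expression API.

```python
# Sketch of inclusive vs. strict projection for ts_day = day(ts) and the
# predicate ts > X (hypothetical helpers; real manifests store partition
# values as days from the unix epoch).
from datetime import date, datetime

EPOCH = date(1970, 1, 1)

def day(ts: datetime) -> int:
    """The day() transform: days from 1970-01-01."""
    return (ts.date() - EPOCH).days

def inclusive_gt(x: datetime):
    """Partition predicate matching every partition that MAY hold rows with ts > x."""
    d = day(x)
    return lambda ts_day: ts_day >= d

def strict_gt(x: datetime):
    """Partition predicate matching only partitions whose rows MUST all have ts > x."""
    d = day(x)
    return lambda ts_day: ts_day > d

x = datetime(2021, 3, 15, 18, 30)
boundary_day = day(datetime(2021, 3, 15))
print(inclusive_gt(x)(boundary_day))  # True: the boundary day may contain matches
print(strict_gt(x)(boundary_day))     # False: it also contains rows with ts <= x
```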
These projections are used to calculate the residual predicates for each file in a scan. For example, if file_a has rows with id between 1 and 10 and a delete file contains rows with id between 1 and 4, a scan for id = 9 may ignore the delete file because none of the deletes can match a row that will be selected. Snapshot Reference Iceberg tables keep track of branches and tags using snapshot references. Tags are labels for individual snapshots. Branches are mutable named references that can be updated by committing a new snapshot as the branch\u2019s referenced snapshot using the Commit Conflict Resolution and Retry procedures.\nThe snapshot reference object records all the information of a reference including snapshot ID, reference type and Snapshot Retention Policy.\nv1 v2 Field name Type Description required required snapshot-id long A reference\u2019s snapshot ID. The tagged snapshot or latest snapshot of a branch. required required type string Type of the reference, tag or branch optional optional min-snapshots-to-keep int For branch type only, a positive number for the minimum number of snapshots to keep in a branch while expiring snapshots. Defaults to table property history.expire.min-snapshots-to-keep. optional optional max-snapshot-age-ms long For branch type only, a positive number for the max age of snapshots to keep when expiring, including the latest snapshot. Defaults to table property history.expire.max-snapshot-age-ms. optional optional max-ref-age-ms long For snapshot references except the main branch, a positive number for the max age of the snapshot reference to keep while expiring snapshots. Defaults to table property history.expire.max-ref-age-ms. The main branch never expires. Valid snapshot references are stored as the values of the refs map in table metadata. For serialization, see Appendix C.\nSnapshot Retention Policy Table snapshots expire and are removed from metadata to allow removed or replaced data files to be physically deleted. The snapshot expiration procedure removes snapshots from table metadata and applies the table\u2019s retention policy. Retention policy can be configured both globally and on snapshot reference through properties min-snapshots-to-keep, max-snapshot-age-ms and max-ref-age-ms.\nWhen expiring snapshots, retention policies in table and snapshot references are evaluated in the following way:\nStart with an empty set of snapshots to retain Remove any refs (other than main) where the referenced snapshot is older than max-ref-age-ms For each branch and tag, add the referenced snapshot to the retained set For each branch, add its ancestors to the retained set until: The snapshot is older than max-snapshot-age-ms, AND The snapshot is not one of the first min-snapshots-to-keep in the branch (including the branch\u2019s referenced snapshot) Expire any snapshot not in the set of snapshots to retain. Table Metadata Table metadata is stored as JSON. Each table metadata change creates a new table metadata file that is committed by an atomic operation. This operation is used to ensure that a new version of table metadata replaces the version on which it was based. This produces a linear history of table versions and ensures that concurrent writes are not lost.\nThe atomic operation used to commit metadata depends on how tables are tracked and is not standardized by this spec. 
See the sections below for examples.\nTable Metadata Fields Table metadata consists of the following fields:\nv1 v2 Field Description required required format-version An integer version number for the format. Currently, this can be 1 or 2 based on the spec. Implementations must throw an exception if a table\u2019s version is higher than the supported version. optional required table-uuid A UUID that identifies the table, generated when the table is created. Implementations must throw an exception if a table\u2019s UUID does not match the expected UUID after refreshing metadata. required required location The table\u2019s base location. This is used by writers to determine where to store data files, manifest files, and table metadata files. required last-sequence-number The table\u2019s highest assigned sequence number, a monotonically increasing long that tracks the order of snapshots in a table. required required last-updated-ms Timestamp in milliseconds from the unix epoch when the table was last updated. Each table metadata file should update this field just before writing. required required last-column-id An integer; the highest assigned column ID for the table. This is used to ensure columns are always assigned an unused ID when evolving schemas. required schema The table\u2019s current schema. (Deprecated: use schemas and current-schema-id instead) optional required schemas A list of schemas, stored as objects with schema-id. optional required current-schema-id ID of the table\u2019s current schema. required partition-spec The table\u2019s current partition spec, stored as only fields. Note that this is used by writers to partition data, but is not used when reading because reads use the specs stored in manifest files. (Deprecated: use partition-specs and default-spec-id instead) optional required partition-specs A list of partition specs, stored as full partition spec objects. optional required default-spec-id ID of the \u201ccurrent\u201d spec that writers should use by default. optional required last-partition-id An integer; the highest assigned partition field ID across all partition specs for the table. This is used to ensure partition fields are always assigned an unused ID when evolving specs. optional optional properties A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, commit.retry.num-retries is used to control the number of commit retries. optional optional current-snapshot-id long ID of the current table snapshot; must be the same as the current ID of the main branch in refs. optional optional snapshots A list of valid snapshots. Valid snapshots are snapshots for which all data files exist in the file system. A data file must not be deleted from the file system until the last snapshot in which it was listed is garbage collected. optional optional snapshot-log A list (optional) of timestamp and snapshot ID pairs that encodes changes to the current snapshot for the table. Each time the current-snapshot-id is changed, a new entry should be added with the last-updated-ms and the new current-snapshot-id. When snapshots are expired from the list of valid snapshots, all entries before a snapshot that has expired should be removed. optional optional metadata-log A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. 
Each time a new metadata file is created, a new entry with the previous metadata file location should be added to the list. Tables can be configured to remove the oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit. optional required sort-orders A list of sort orders, stored as full sort order objects. optional required default-sort-order-id Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. optional refs A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a main branch reference pointing to the current-snapshot-id even if the refs map is null. For serialization details, see Appendix C.\nCommit Conflict Resolution and Retry When two commits happen at the same time and are based on the same version, only one commit will succeed. In most cases, the failed commit can be applied to the new current version of table metadata and retried. Updates verify the conditions under which they can be applied to a new version and retry if those conditions are met.\nAppend operations have no requirements and can always be applied. Replace operations must verify that the files that will be deleted are still in the table. Examples of replace operations include format changes (replace an Avro file with a Parquet file) and compactions (several files are replaced with a single file that contains the same rows). Delete operations must verify that specific files to delete are still in the table. Delete operations based on expressions can always be applied (e.g., where timestamp < X). Table schema updates and partition spec changes must validate that the schema has not changed between the base version and the current version. File System Tables An atomic swap can be implemented using atomic rename in file systems that support it, like HDFS or most local file systems [1].\nEach version of table metadata is stored in a metadata folder under the table\u2019s base location using a file naming scheme that includes a version number, V: v<V>.metadata.json. To commit a new metadata version, V+1, the writer performs the following steps:\nRead the current table metadata version V. Create new table metadata based on version V. Write the new table metadata to a unique file: <random-uuid>.metadata.json. Rename the unique file to the well-known file for version V+1: v<V+1>.metadata.json. If the rename succeeds, the commit succeeded and V+1 is the table\u2019s current version. If the rename fails, go back to step 1. Notes:\nThe file system table scheme is implemented in HadoopTableOperations. Metastore Tables The atomic swap needed to commit new versions of table metadata can be implemented by storing a pointer in a metastore or database that is updated with a check-and-put operation [1]. The check-and-put validates that the version of the table that a write is based on is still current and then makes the new metadata from the write the current version.\nEach version of table metadata is stored in a metadata folder under the table\u2019s base location using a naming scheme that includes a version and UUID: <V>-<random-uuid>.metadata.json. To commit a new metadata version, V+1, the writer performs the following steps:\nCreate a new table metadata file based on the current metadata. Write the new table metadata to a unique file: <V+1>-<random-uuid>.metadata.json.
Request that the metastore swap the table\u2019s metadata pointer from the location of V to the location of V+1. If the swap succeeds, the commit succeeded. V was still the latest metadata version and the metadata file for V+1 is now the current metadata. If the swap fails, another writer has already created V+1. The current writer goes back to step 1. Notes:\nThe metastore table scheme is partly implemented in BaseMetastoreTableOperations. Delete Formats This section details how to encode row-level deletes in Iceberg delete files. Row-level deletes are not supported in v1.\nRow-level delete files are valid Iceberg data files: files must use valid Iceberg formats, schemas, and column projection. It is recommended that delete files are written using the table\u2019s default file format.\nRow-level delete files are tracked by manifests, like data files. A separate set of manifests is used for delete files, but the manifest schemas are identical.\nBoth position and equality deletes allow encoding deleted row values with a delete. This can be used to reconstruct a stream of changes to a table.\nPosition Delete Files Position-based delete files identify deleted rows by file and position in one or more data files, and may optionally contain the deleted row.\nA data row is deleted if there is an entry in a position delete file for the row\u2019s file and position in the data file, starting at 0.\nPosition-based delete files store file_position_delete, a struct with the following fields:\nField id, name Type Description 2147483546 file_path string Full URI of a data file with FS scheme. This must match the file_path of the target data file in a manifest entry 2147483545 pos long Ordinal position of a deleted row in the target data file identified by file_path, starting at 0 2147483544 row required struct<...> [1] Deleted row values. Omit the column when not storing deleted rows. When present in the delete file, row is required because all delete entries must include the row values. When the deleted row column is present, its schema may be any subset of the table schema and must use field ids matching the table.\nTo ensure the accuracy of statistics, all delete entries must include row values, or the column must be omitted (this is why the column type is required).\nThe rows in the delete file must be sorted by file_path then position to optimize filtering rows while scanning.\nSorting by file_path allows filter pushdown by file in columnar storage formats. Sorting by position allows filtering rows while scanning, to avoid keeping deletes in memory. Equality Delete Files Equality delete files identify deleted rows in a collection of data files by one or more column values, and may optionally contain additional columns of the deleted row.\nEquality delete files store any subset of a table\u2019s columns and use the table\u2019s field ids. The delete columns are the columns of the delete file used to match data rows. Delete columns are identified by id in the delete file metadata column equality_ids. Float and double columns cannot be used as delete columns in equality delete files.\nA data row is deleted if its values are equal to all delete columns for any row in an equality delete file that applies to the row\u2019s data file (see Scan Planning).\nEach row of the delete file produces one equality predicate that matches any row where the delete columns are equal. Multiple columns can be thought of as an AND of equality predicates. 
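As a rough illustration of that matching rule (one AND of equality predicates per delete row), the sketch below checks rows against an equality delete file; Python None stands in for SQL NULL, consistent with the null-matching behavior described next, and the field ids and row layouts are hypothetical.

```python
# Illustrative only: apply equality delete rows to data rows keyed by field id.
from typing import Dict, List, Optional

Row = Dict[int, Optional[object]]

def is_deleted(data_row: Row, delete_rows: List[Row], equality_ids: List[int]) -> bool:
    """True if any delete row matches data_row on every delete column.

    None == None here, mirroring the rule that a null delete value matches
    rows where the column IS NULL.
    """
    return any(
        all(data_row.get(col) == delete_row.get(col) for col in equality_ids)
        for delete_row in delete_rows
    )

# Rows from the example table further below: 1=id, 2=category, 3=name
rows = [{1: 3, 2: None, 3: "Grizzly"}, {1: 4, 2: None, 3: "Polar"}]
# Equality delete with equality_ids=[1, 2]: id = 4 AND category IS NULL
deletes = [{1: 4, 2: None}]
print([is_deleted(r, deletes, [1, 2]) for r in rows])  # [False, True]
```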
A null value in a delete column matches a row if the row\u2019s value is null, equivalent to col IS NULL.\nFor example, a table with the following data:\n1: id | 2: category | 3: name -------|-------------|--------- 1 | marsupial | Koala 2 | toy | Teddy 3 | NULL | Grizzly 4 | NULL | Polar The delete id = 3 could be written as either of the following equality delete files:\nequality_ids=[1] 1: id ------- 3 equality_ids=[1] 1: id | 2: category | 3: name -------|-------------|--------- 3 | NULL | Grizzly The delete id = 4 AND category IS NULL could be written as the following equality delete file:\nequality_ids=[1, 2] 1: id | 2: category | 3: name -------|-------------|--------- 4 | NULL | Polar If a delete column in an equality delete file is later dropped from the table, it must still be used when applying the equality deletes. If a column was added to a table and later used as a delete column in an equality delete file, the column value is read for older data files using normal projection rules (defaults to null).\nDelete File Stats Manifests hold the same statistics for delete files and data files. For delete files, the metrics describe the values that were deleted.\nAppendix A: Format-specific Requirements Avro Data Type Mappings\nValues should be stored in Avro using the Avro types and logical type annotations in the table below.\nOptional fields, array elements, and map values must be wrapped in an Avro union with null. This is the only union type allowed in Iceberg data files.\nOptional fields must always set the Avro field default value to null.\nMaps with non-string keys must use an array representation with the map logical type. The array representation or Avro\u2019s map type may be used for maps with string keys.\nType Avro type Notes boolean boolean int int long long float float double double decimal(P,S) { \"type\": \"fixed\",\n\"size\": minBytesRequired(P),\n\"logicalType\": \"decimal\",\n\"precision\": P,\n\"scale\": S } Stored as fixed using the minimum number of bytes for the given precision. date { \"type\": \"int\",\n\"logicalType\": \"date\" } Stores days from the 1970-01-01. time { \"type\": \"long\",\n\"logicalType\": \"time-micros\" } Stores microseconds from midnight. timestamp { \"type\": \"long\",\n\"logicalType\": \"timestamp-micros\",\n\"adjust-to-utc\": false } Stores microseconds from 1970-01-01 00:00:00.000000. timestamptz { \"type\": \"long\",\n\"logicalType\": \"timestamp-micros\",\n\"adjust-to-utc\": true } Stores microseconds from 1970-01-01 00:00:00.000000 UTC. string string uuid { \"type\": \"fixed\",\n\"size\": 16,\n\"logicalType\": \"uuid\" } fixed(L) { \"type\": \"fixed\",\n\"size\": L } binary bytes struct record list array map array of key-value records, or map when keys are strings (optional). Array storage must use logical type name map and must store elements that are 2-field records. The first field is a non-null key and the second field is the value. Field IDs\nIceberg struct, list, and map types identify nested types by ID. 
When writing data to Avro files, these IDs must be stored in the Avro schema to support ID-based column pruning.\nIDs are stored as JSON integers in the following locations:\nID Avro schema location Property Example Struct field Record field object field-id { \"type\": \"record\", ...\n\"fields\": [\n{ \"name\": \"l\",\n\"type\": [\"null\", \"long\"],\n\"default\": null,\n\"field-id\": 8 }\n] } List element Array schema object element-id { \"type\": \"array\",\n\"items\": \"int\",\n\"element-id\": 9 } String map key Map schema object key-id { \"type\": \"map\",\n\"values\": \"int\",\n\"key-id\": 10,\n\"value-id\": 11 } String map value Map schema object value-id Map key, value Key, value fields in the element record. field-id { \"type\": \"array\",\n\"logicalType\": \"map\",\n\"items\": {\n\"type\": \"record\",\n\"name\": \"k12_v13\",\n\"fields\": [\n{ \"name\": \"key\",\n\"type\": \"int\",\n\"field-id\": 12 },\n{ \"name\": \"value\",\n\"type\": \"string\",\n\"field-id\": 13 }\n] } } Note that the string map case is for maps where the key type is a string. Using Avro\u2019s map type in this case is optional. Maps with string keys may be stored as arrays.\nParquet Data Type Mappings\nValues should be stored in Parquet using the types and logical type annotations in the table below. Column IDs are required.\nLists must use the 3-level representation.\nType Parquet physical type Logical type Notes boolean boolean int int long long float float double double decimal(P,S) P <= 9: int32,\nP <= 18: int64,\nfixed otherwise DECIMAL(P,S) Fixed must use the minimum number of bytes that can store P. date int32 DATE Stores days from the 1970-01-01. time int64 TIME_MICROS with adjustToUtc=false Stores microseconds from midnight. timestamp int64 TIMESTAMP_MICROS with adjustToUtc=false Stores microseconds from 1970-01-01 00:00:00.000000. timestamptz int64 TIMESTAMP_MICROS with adjustToUtc=true Stores microseconds from 1970-01-01 00:00:00.000000 UTC. string binary UTF8 Encoding must be UTF-8. uuid fixed_len_byte_array[16] UUID fixed(L) fixed_len_byte_array[L] binary binary struct group list 3-level list LIST See Parquet docs for 3-level representation. map 3-level map MAP See Parquet docs for 3-level representation. ORC Data Type Mappings\nType ORC type ORC type attributes Notes boolean boolean int int ORC tinyint and smallint would also map to int. long long float float double double decimal(P,S) decimal date date time long iceberg.long-type=TIME Stores microseconds from midnight. timestamp timestamp [1] timestamptz timestamp_instant [1] string string ORC varchar and char would also map to string. uuid binary iceberg.binary-type=UUID fixed(L) binary iceberg.binary-type=FIXED & iceberg.length=L The length would not be checked by the ORC reader and should be checked by the adapter. binary binary struct struct list array map map Notes:\nORC\u2019s TimestampColumnVector consists of a time field (milliseconds since epoch) and a nanos field (nanoseconds within the second). Hence the milliseconds within the second are reported twice; once in the time field and again in the nanos field. The read adapter should only use milliseconds within the second from one of these fields. The write adapter should also report milliseconds within the second twice; once in the time field and again in the nanos field. ORC writer is expected to correctly consider millis information from one of the fields. 
More details at https://issues.apache.org/jira/browse/ORC-546 One of the interesting challenges with this is how to map Iceberg\u2019s schema evolution (id based) on to ORC\u2019s (name based). In theory, we could use Iceberg\u2019s column ids as the column and field names, but that would be inconvenient.\nThe column IDs must be stored in ORC type attributes using the key iceberg.id, and iceberg.required to store \"true\" if the Iceberg column is required, otherwise it will be optional.\nIceberg would build the desired reader schema with their schema evolution rules and pass that down to the ORC reader, which would then use its schema evolution to map that to the writer\u2019s schema. Basically, Iceberg would need to change the names of columns and fields to get the desired mapping.\nIceberg writer ORC writer Iceberg reader ORC reader struct<a (1): int, b (2): string> struct<a: int, b: string> struct<a (2): string, c (3): date> struct<b: string, c: date> struct<a (1): struct<b (2): string, c (3): date>> struct<a: struct<b:string, c:date>> struct<aa (1): struct<cc (3): date, bb (2): string>> struct<a: struct<c:date, b:string>> Appendix B: 32-bit Hash Requirements The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with 0.\nPrimitive type Hash specification Test value int hashLong(long(v))\t[1] 34 \uffeb 2017239379 long hashBytes(littleEndianBytes(v)) 34L \uffeb 2017239379 decimal(P,S) hashBytes(minBigEndian(unscaled(v)))[2] 14.20 \uffeb -500754589 date hashInt(daysFromUnixEpoch(v)) 2017-11-16 \uffeb -653330422 time hashLong(microsecsFromMidnight(v)) 22:31:08 \uffeb -662762989 timestamp hashLong(microsecsFromUnixEpoch(v)) 2017-11-16T22:31:08 \uffeb -2047944441 timestamptz hashLong(microsecsFromUnixEpoch(v)) 2017-11-16T14:31:08-08:00\uffeb -2047944441 string hashBytes(utf8Bytes(v)) iceberg \uffeb 1210000089 uuid hashBytes(uuidBytes(v))\t[3] f79c3e09-677c-4bbd-a479-3f349cb785e7 \uffeb 1488055340 fixed(L) hashBytes(v) 00 01 02 03 \uffeb -188683207 binary hashBytes(v) 00 01 02 03 \uffeb -188683207 The types below are not currently valid for bucketing, and so are not hashed. However, if that changes and a hash value is needed, the following table shall apply:\nPrimitive type Hash specification Test value boolean false: hashInt(0), true: hashInt(1) true \uffeb 1392991556 float hashDouble(double(v)) [4] 1.0F \uffeb -142385009 double hashLong(doubleToLongBits(v)) 1.0D \uffeb -142385009 Notes:\nInteger and long hash results must be identical for all integer values. This ensures that schema evolution does not change bucket partition values if integer types are promoted. Decimal values are hashed using the minimum number of bytes required to hold the unscaled value as a two\u2019s complement big-endian; this representation does not include padding bytes required for storage in a fixed-length array. Hash results are not dependent on decimal scale, which is part of the type, not the data value. UUIDs are encoded using big endian. The test UUID for the example above is: f79c3e09-677c-4bbd-a479-3f349cb785e7. This UUID encoded as a byte array is: F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7 Float hash values are the result of hashing the float cast to double to ensure that schema evolution does not change hash values if float types are promoted. 
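The hash test values above can be checked with a short sketch. It assumes the third-party mmh3 package (which implements the 32-bit x86 Murmur3 variant); mmh3 is not part of Iceberg, and the helper names are illustrative.

```python
# Sketch: verify Appendix B test values using the mmh3 package (pip install mmh3).
import struct
import mmh3

def hash_long(value: int) -> int:
    # ints and longs are both hashed as 8-byte little-endian values, seeded with 0
    return mmh3.hash(struct.pack("<q", value), 0)

def hash_string(value: str) -> int:
    return mmh3.hash(value.encode("utf-8"), 0)

print(hash_long(34))           # expected: 2017239379
print(hash_string("iceberg"))  # expected: 1210000089

# The bucket[N] partition transform described earlier in the spec applies a
# positive modulus to this hash; for example, with 16 buckets:
def bucket16(value: str) -> int:
    return (hash_string(value) & 0x7FFFFFFF) % 16
```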
Appendix C: JSON serialization Schemas Schemas are serialized as a JSON object with the same fields as a struct in the table below, and the following additional fields:\nv1 v2 Field JSON representation Example optional required schema-id JSON int 0 optional optional identifier-field-ids JSON list of ints [1, 2] Types are serialized according to this table:\nType JSON representation Example boolean JSON string: \"boolean\" \"boolean\" int JSON string: \"int\" \"int\" long JSON string: \"long\" \"long\" float JSON string: \"float\" \"float\" double JSON string: \"double\" \"double\" date JSON string: \"date\" \"date\" time JSON string: \"time\" \"time\" timestamp without zone JSON string: \"timestamp\" \"timestamp\" timestamp with zone JSON string: \"timestamptz\" \"timestamptz\" string JSON string: \"string\" \"string\" uuid JSON string: \"uuid\" \"uuid\" fixed(L) JSON string: \"fixed[<L>]\" \"fixed[16]\" binary JSON string: \"binary\" \"binary\" decimal(P, S) JSON string: \"decimal(<P>,<S>)\" \"decimal(9,2)\",\n\"decimal(9, 2)\" struct JSON object: {\n\"type\": \"struct\",\n\"fields\": [ {\n\"id\": <field id int>,\n\"name\": <name string>,\n\"required\": <boolean>,\n\"type\": <type JSON>,\n\"doc\": <comment string>,\n\"initial-default\": <JSON encoding of default value>,\n\"write-default\": <JSON encoding of default value>\n}, ...\n] } {\n\"type\": \"struct\",\n\"fields\": [ {\n\"id\": 1,\n\"name\": \"id\",\n\"required\": true,\n\"type\": \"uuid\",\n\"initial-default\": \"0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb\",\n\"write-default\": \"ec5911be-b0a7-458c-8438-c9a3e53cffae\"\n}, {\n\"id\": 2,\n\"name\": \"data\",\n\"required\": false,\n\"type\": {\n\"type\": \"list\",\n...\n}\n} ]\n} list JSON object: {\n\"type\": \"list\",\n\"element-id\": <id int>,\n\"element-required\": <bool>\n\"element\": <type JSON>\n} {\n\"type\": \"list\",\n\"element-id\": 3,\n\"element-required\": true,\n\"element\": \"string\"\n} map JSON object: {\n\"type\": \"map\",\n\"key-id\": <key id int>,\n\"key\": <type JSON>,\n\"value-id\": <val id int>,\n\"value-required\": <bool>\n\"value\": <type JSON>\n} {\n\"type\": \"map\",\n\"key-id\": 4,\n\"key\": \"string\",\n\"value-id\": 5,\n\"value-required\": false,\n\"value\": \"double\"\n} Note that default values are serialized using the JSON single-value serialization in Appendix D.\nPartition Specs Partition specs are serialized as a JSON object with the following fields:\nField JSON representation Example spec-id JSON int 0 fields JSON list: [\n<partition field JSON>,\n...\n] [ {\n\"source-id\": 4,\n\"field-id\": 1000,\n\"name\": \"ts_day\",\n\"transform\": \"day\"\n}, {\n\"source-id\": 1,\n\"field-id\": 1001,\n\"name\": \"id_bucket\",\n\"transform\": \"bucket[16]\"\n} ] Each partition field in the fields list is stored as an object. 
See the table for more detail:\nTransform or Field JSON representation Example identity JSON string: \"identity\" \"identity\" bucket[N] JSON string: \"bucket[<N>]\" \"bucket[16]\" truncate[W] JSON string: \"truncate[<W>]\" \"truncate[20]\" year JSON string: \"year\" \"year\" month JSON string: \"month\" \"month\" day JSON string: \"day\" \"day\" hour JSON string: \"hour\" \"hour\" Partition Field JSON object: {\n\"source-id\": <id int>,\n\"field-id\": <field id int>,\n\"name\": <name string>,\n\"transform\": <transform JSON>\n} {\n\"source-id\": 1,\n\"field-id\": 1000,\n\"name\": \"id_bucket\",\n\"transform\": \"bucket[16]\"\n} In some cases partition specs are stored using only the field list instead of the object format that includes the spec ID, like the deprecated partition-spec field in table metadata. The object format should be used unless otherwise noted in this spec.\nThe field-id property was added for each partition field in v2. In v1, the reference implementation assigned field ids sequentially in each spec starting at 1,000. See Partition Evolution for more details.\nSort Orders Sort orders are serialized as a list of JSON object, each of which contains the following fields:\nField JSON representation Example order-id JSON int 1 fields JSON list: [\n<sort field JSON>,\n...\n] [ {\n\"transform\": \"identity\",\n\"source-id\": 2,\n\"direction\": \"asc\",\n\"null-order\": \"nulls-first\"\n}, {\n\"transform\": \"bucket[4]\",\n\"source-id\": 3,\n\"direction\": \"desc\",\n\"null-order\": \"nulls-last\"\n} ] Each sort field in the fields list is stored as an object with the following properties:\nField JSON representation Example Sort Field JSON object: {\n\"transform\": <transform JSON>,\n\"source-id\": <source id int>,\n\"direction\": <direction string>,\n\"null-order\": <null-order string>\n} {\n\"transform\": \"bucket[4]\",\n\"source-id\": 3,\n\"direction\": \"desc\",\n\"null-order\": \"nulls-last\"\n} The following table describes the possible values for the some of the field within sort field:\nField JSON representation Possible values direction JSON string \"asc\", \"desc\" null-order JSON string \"nulls-first\", \"nulls-last\" Table Metadata and Snapshots Table metadata is serialized as a JSON object according to the following table. Snapshots are not serialized separately. Instead, they are stored in the table metadata JSON.\nMetadata field JSON representation Example format-version JSON int 1 table-uuid JSON string \"fb072c92-a02b-11e9-ae9c-1bb7bc9eca94\" location JSON string \"s3://b/wh/data.db/table\" last-updated-ms JSON long 1515100955770 last-column-id JSON int 22 schema JSON schema (object) See above, read schemas instead schemas JSON schemas (list of objects) See above current-schema-id JSON int 0 partition-spec JSON partition fields (list) See above, read partition-specs instead partition-specs JSON partition specs (list of objects) See above default-spec-id JSON int 0 last-partition-id JSON int 1000 properties JSON object: {\n\"<key>\": \"<val>\",\n...\n} {\n\"write.format.default\": \"avro\",\n\"commit.retry.num-retries\": \"4\"\n} current-snapshot-id JSON long 3051729675574597004 snapshots JSON list of objects: [ {\n\"snapshot-id\": <id>,\n\"timestamp-ms\": <timestamp-in-ms>,\n\"summary\": {\n\"operation\": <operation>,\n... 
},\n\"manifest-list\": \"<location>\",\n\"schema-id\": \"<id>\"\n},\n...\n] [ {\n\"snapshot-id\": 3051729675574597004,\n\"timestamp-ms\": 1515100955770,\n\"summary\": {\n\"operation\": \"append\"\n},\n\"manifest-list\": \"s3://b/wh/.../s1.avro\"\n\"schema-id\": 0\n} ] snapshot-log JSON list of objects: [\n{\n\"snapshot-id\": ,\n\"timestamp-ms\": },\n...\n] [ {\n\"snapshot-id\": 30517296...,\n\"timestamp-ms\": 1515100...\n} ] metadata-log JSON list of objects: [\n{\n\"metadata-file\": ,\n\"timestamp-ms\": },\n...\n] [ {\n\"metadata-file\": \"s3://bucket/.../v1.json\",\n\"timestamp-ms\": 1515100...\n} ] sort-orders JSON sort orders (list of sort field object) See above default-sort-order-id JSON int 0 refs JSON map with string key and object value:\n{\n\"<name>\": {\n\"snapshot-id\": <id>,\n\"type\": <type>,\n\"max-ref-age-ms\": <long>,\n...\n}\n...\n} {\n\"test\": {\n\"snapshot-id\": 123456789000,\n\"type\": \"tag\",\n\"max-ref-age-ms\": 10000000\n}\n} Name Mapping Serialization Name mapping is serialized as a list of field mapping JSON Objects which are serialized as follows\nField mapping field JSON representation Example names JSON list of strings [\"latitude\", \"lat\"] field_id JSON int 1 fields JSON field mappings (list of objects) [{ \"field-id\": 4,\n\"names\": [\"latitude\", \"lat\"]\n}, {\n\"field-id\": 5,\n\"names\": [\"longitude\", \"long\"]\n}] Example\n[ { \"field-id\": 1, \"names\": [\"id\", \"record_id\"] }, { \"field-id\": 2, \"names\": [\"data\"] }, { \"field-id\": 3, \"names\": [\"location\"], \"fields\": [ { \"field-id\": 4, \"names\": [\"latitude\", \"lat\"] }, { \"field-id\": 5, \"names\": [\"longitude\", \"long\"] } ] } ] Appendix D: Single-value serialization Binary single-value serialization This serialization scheme is for storing single values as individual binary values in the lower and upper bounds maps of manifest files.\nType Binary serialization boolean 0x00 for false, non-zero byte for true int Stored as 4-byte little-endian long Stored as 8-byte little-endian float Stored as 4-byte little-endian double Stored as 8-byte little-endian date Stores days from the 1970-01-01 in an 4-byte little-endian int time Stores microseconds from midnight in an 8-byte little-endian long timestamp without zone Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long timestamp with zone Stores microseconds from 1970-01-01 00:00:00.000000 UTC in an 8-byte little-endian long string UTF-8 bytes (without length) uuid 16-byte big-endian value, see example in Appendix B fixed(L) Binary value binary Binary value (without length) decimal(P, S) Stores unscaled value as two\u2019s-complement big-endian binary, using the minimum number of bytes for the value struct Not supported list Not supported map Not supported JSON single-value serialization Single values are serialized as JSON by type according to the following table:\nType JSON representation Example Description boolean JSON boolean true int JSON int 34 long JSON long 34 float JSON number 1.0 double JSON number 1.0 decimal(P,S) JSON number 14.20 Stores the decimal as a number with S places after the decimal date JSON string \"2017-11-16\" Stores ISO-8601 standard date time JSON string \"22:31:08.123456\" Stores ISO-8601 standard time with microsecond precision timestamp JSON string \"2017-11-16T22:31:08.123456\" Stores ISO-8601 standard timestamp with microsecond precision; must not include a zone offset timestamptz JSON string \"2017-11-16T22:31:08.123456-07:00\" Stores ISO-8601 standard timestamp with 
microsecond precision; must include a zone offset string JSON string \"iceberg\" uuid JSON string \"f79c3e09-677c-4bbd-a479-3f349cb785e7\" Stores the lowercase uuid string fixed(L) JSON string \"0x00010203\" Stored as a hexadecimal string, prefixed by 0x binary JSON string \"0x00010203\" Stored as a hexadecimal string, prefixed by 0x struct JSON object by field ID {\"1\": 1, \"2\": \"bar\"} Stores struct fields using the field ID as the JSON field name; field values are stored using this JSON single-value format list JSON array of values [1, 2, 3] Stores a JSON array of values that are serialized using this JSON single-value format map JSON object of key and value arrays { \"keys\": [\"a\", \"b\"], \"values\": [1, 2] } Stores arrays of keys and values; individual keys and values are serialized using this JSON single-value format Appendix E: Format version changes Version 3 Default values are added to struct fields in v3.\nThe write-default is a forward-compatible change because it is only used at write time. Old writers will fail because the field is missing. Tables with initial-default will be read correctly by older readers if initial-default is always null for optional fields. Otherwise, old readers will default optional columns with null. Old readers will fail to read required fields which are populated by initial-default because that default is not supported. Version 2 Writing v1 metadata:\nTable metadata field last-sequence-number should not be written Snapshot field sequence-number should not be written Manifest list field sequence-number should not be written Manifest list field min-sequence-number should not be written Manifest list field content must be 0 (data) or omitted Manifest entry field sequence_number should not be written Data file field content must be 0 (data) or omitted Reading v1 metadata for v2:\nTable metadata field last-sequence-number must default to 0 Snapshot field sequence-number must default to 0 Manifest list field sequence-number must default to 0 Manifest list field min-sequence-number must default to 0 Manifest list field content must default to 0 (data) Manifest entry field sequence_number must default to 0 Data file field content must default to 0 (data) Writing v2 metadata:\nTable metadata JSON: last-sequence-number was added and is required; default to 0 when reading v1 metadata table-uuid is now required current-schema-id is now required schemas is now required partition-specs is now required default-spec-id is now required last-partition-id is now required sort-orders is now required default-sort-order-id is now required schema is no longer required and should be omitted; use schemas and current-schema-id instead partition-spec is no longer required and should be omitted; use partition-specs and default-spec-id instead Snapshot JSON: sequence-number was added and is required; default to 0 when reading v1 metadata manifest-list is now required manifests is no longer required and should be omitted; always use manifest-list instead Manifest list manifest_file: content was added and is required; 0=data, 1=deletes; default to 0 when reading v1 manifest lists sequence_number was added and is required min_sequence_number was added and is required added_files_count is now required existing_files_count is now required deleted_files_count is now required added_rows_count is now required existing_rows_count is now required deleted_rows_count is now required Manifest key-value metadata: schema-id is now required partition-spec-id is now required format-version 
is now required content was added and is required (must be \u201cdata\u201d or \u201cdeletes\u201d) Manifest manifest_entry: snapshot_id is now optional to support inheritance sequence_number was added and is optional, to support inheritance Manifest data_file: content was added and is required; 0=data, 1=position deletes, 2=equality deletes; default to 0 when reading v1 manifests equality_ids was added, to be used for equality deletes only block_size_in_bytes was removed (breaks v1 reader compatibility) file_ordinal was removed sort_columns was removed Note that these requirements apply when writing data to a v2 table. Tables that are upgraded from v1 may contain metadata that does not follow these requirements. Implementations should remain backward-compatible with v1 metadata requirements.\n", "description": "", "title": "Spec", "uri": "/spec/"}, {"categories": null, "content": " Iceberg Talks Here is a list of talks and other videos related to Iceberg.\nData architecture in 2022 Date: May 5, 2022, Authors: Ryan Blue\nWhy You Shouldn\u2019t Care About Iceberg | Tabular Date: March 24, 2022, Authors: Ryan Blue\nExpert Roundtable: The Future of Metadata After Hive Metastore Date: November 15, 2021, Authors: Lior Ebel, Seshu Adunuthula, Ryan Blue & Oz Katz\nPresto and Apache Iceberg: Building out Modern Open Data Lakes Date: November 10, 2021, Authors: Daniel Weeks, Chunxu Tang\nIceberg Case Studies Date: September 29, 2021, Authors: Ryan Blue\nSpark and Iceberg at Apple\u2019s Scale - Leveraging differential files for efficient upserts and deletes Date: October 21, 2020, Author: Anton\nApache Iceberg - A Table Format for Huge Analytic Datasets Date: October 21, 2020, Author: Ryan Blue\n", "description": "", "title": "Talks", "uri": "/talks/"}, {"categories": null, "content": " Terms Snapshot A snapshot is the state of a table at some time.\nEach snapshot lists all of the data files that make up the table\u2019s contents at the time of the snapshot. Data files are stored across multiple manifest files, and the manifests for a snapshot are listed in a single manifest list file.\nManifest list A manifest list is a metadata file that lists the manifests that make up a table snapshot.\nEach manifest file in the manifest list is stored with information about its contents, like partition value ranges, used to speed up metadata operations.\nManifest file A manifest file is a metadata file that lists a subset of data files that make up a snapshot.\nEach data file in a manifest is stored with a partition tuple, column-level stats, and summary information used to prune splits during scan planning.\nPartition spec A partition spec is a description of how to partition data in a table.\nA spec consists of a list of source columns and transforms. A transform produces a partition value from a source value. For example, date(ts) produces the date associated with a timestamp column named ts.\nPartition tuple A partition tuple is a tuple or struct of partition data stored with each data file.\nAll values in a partition tuple are the same for all rows stored in a data file. 
Partition tuples are produced by transforming values from row data using a partition spec.\nIceberg stores partition values unmodified, unlike Hive tables that convert values to and from strings in file system paths and keys.\nSnapshot log (history table) The snapshot log is a metadata log of how the table\u2019s current snapshot has changed over time.\nThe log is a list of timestamp and ID pairs: when the current snapshot changed and the snapshot ID the current snapshot was changed to.\nThe snapshot log is stored in table metadata as snapshot-log.\n", "description": "", "title": "Terms", "uri": "/terms/"}, {"categories": null, "content": " Trademarks Apache Iceberg, Iceberg, Apache, the Apache feather logo, and the Apache Iceberg project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.\n", "description": "", "title": "Trademarks", "uri": "/trademarks/"}, {"categories": null, "content": " Iceberg View Spec Background and Motivation Most compute engines (e.g. Trino and Apache Spark) support views. A view is a logical table that can be referenced by future queries. Views do not contain any data. Instead, the query stored by the view is executed every time the view is referenced by another query.\nEach compute engine stores the metadata of the view in its proprietary format in the metastore of choice. Thus, views created from one engine can not be read or altered easily from another engine even when engines share the metastore as well as the storage system. This document standardizes the view metadata for ease of sharing the views across engines.\nGoals A common metadata format for view metadata, similar to how Iceberg supports a common table format for tables. The view metadata format specification Includes storage format as well as APIs to write/read the metadata. Supports versioning of views to track how a view evolved over time. Overview View metadata storage mirrors how Iceberg table metadata is stored and retrieved. View metadata is maintained in metadata files. All changes to view state create a new view metadata file and completely replace the old metadata using an atomic swap. Like Iceberg tables, this atomic swap is delegated to the metastore that tracks tables and/or views by name. The view metadata file tracks the view schema, custom properties, current and past versions, as well as other metadata. Each metadata file is self-sufficient. It contains the history of the last few operations performed on the view and can be used to roll back the view to a previous version.\nMetadata Location An atomic swap of one view metadata file for another provides the basis for making atomic changes. Readers use the version of the view that was current when they loaded the view metadata and are not affected by changes until they refresh and pick up a new metadata location.\nWriters create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer\u2019s commit. Once a writer has created an update, it commits by swapping the view\u2019s metadata file pointer from the base location to the new location.\nSpecification Terms Schema \u2013 Names and types of fields in a view. Version \u2013 The state of a view at some point in time. View Metadata The view version metadata file has the following fields:\nRequired/Optional Field Name Description Required format-version An integer version number for the view format. Currently, this must be 1. 
Implementations must throw an exception if the view\u2019s version is higher than the supported version. Required location The view\u2019s base location. This is used to determine where to store manifest files and view metadata files. Required current-version-id Current version of the view. Set to \u20181\u2019 when the view is first created. Optional properties A string to string map of view properties. This is used for metadata such as \u201ccomment\u201d and for settings that affect view maintenance. This is not intended to be used for arbitrary metadata. Required versions An array of structs describing the last known versions of the view. Controlled by the table property: \u201cversion.history.num-entries\u201d. See section Versions. Required version-log A list of timestamp and version ID pairs that encodes changes to the current version for the view. Each time the current-version-id is changed, a new entry should be added with the last-updated-ms and the new current-version-id. Optional schemas A list of schemas, the same as the \u2018schemas\u2019 field from Iceberg table spec. Optional current-schema-id ID of the current schema of the view Versions Field \u201cversions\u201d is an array of structs with the following fields:\nRequired/Optional Field Name Description Required version-id Monotonically increasing id indicating the version of the view. Starts with 1. Required timestamp-ms Timestamp expressed in ms since epoch at which the version of the view was created. Required summary A string map summarizes the version changes, including operation, described in Summary. Required representations A list of \u201crepresentations\u201d as described in Representations. Version Log Field \u201cversion-log\u201d is an array of structs that describe when each version was considered \u201ccurrent\u201d. Creation time is different and is stored in each version\u2019s metadata. This allows you to reconstruct what someone would have seen at some point in time. If the view has been updated and rolled back, this will show it. The struct has the following fields:\nRequired/Optional Field Name Description Required timestamp-ms The timestamp when the referenced version was made the current version Required version-id Version id of the view Summary Field \u201csummary\u201d is a string map with the following keys. Only operation is required. Engines may store additional key-value pairs in this map.\nRequired/Optional Key Value Required operation A string value indicating the view operation that caused this metadata to be created. Allowed values are \u201ccreate\u201d and \u201creplace\u201d. Optional engine-version A string value indicating the version of the engine that performed the operation Representations Each representation is stored as an object with only one common field \u201ctype\u201d. The rest of the fields are interpreted based on the type. There is only one type of representation defined in the spec.\nOriginal View Definition in SQL This type of representation stores the original view definition in SQL and its SQL dialect.\nRequired/Optional Field Name Description Required type A string indicating the type of representation. It is set to \u201csql\u201d for this type. Required sql A string representing the original view definition in SQL Required dialect A string specifying the dialect of the \u2018sql\u2019 field. It can be used by the engines to detect the SQL dialect. 
Optional schema-id ID of the view\u2019s schema when the version was created Optional default-catalog A string specifying the catalog to use when the table or view references in the view definition do not contain an explicit catalog. Optional default-namespace The namespace to use when the table or view references in the view definition do not contain an explicit namespace. Since the namespace may contain multiple parts, it is serialized as a list of strings. Optional field-aliases A list of strings of field aliases optionally specified in the create view statement. The list should have the same length as the schema\u2019s top level fields. See the example below. Optional field-docs A list of strings of field comments optionally specified in the create view statement. The list should have the same length as the schema\u2019s top level fields. See the example below. For CREATE VIEW v (alias_name COMMENT 'docs', alias_name2, ...) AS SELECT col1, col2, ..., the field aliases are \u2018alias_name\u2019, \u2018alias_name2\u2019, and etc., and the field docs are \u2018docs\u2019, null, and etc.\nAppendix A: An Example The JSON metadata file format is described using an example below.\nImagine the following sequence of operations:\nCREATE TABLE base_tab(c1 int, c2 varchar); INSERT INTO base_tab VALUES (1,\u2019one\u2019), (2,\u2019two\u2019); CREATE VIEW common_view AS SELECT * FROM base_tab; CREATE OR REPLACE VIEW common_view AS SELECT count(*) AS my_cnt FROM base_tab; The metadata JSON file created at the end of step 3 looks as follows. The file path looks like: s3://my_company/my/warehouse/anorwood.db/common_view\nThe path is intentionally similar to the path for iceberg tables and contains a \u2018metadata\u2019 directory. (METASTORE_WAREHOUSE_DIR/<dbname>.db/<viewname>/metadata)\nThe metadata directory contains View Version Metadata files. The text after \u2018=>\u2019 symbols describes the fields.\n{ \"format-version\" : 1, => JSON format. Will change as format evolves. \"location\" : \"s3n://my_company/my/warehouse/anorwood.db/common_view\", \"current-version-id\" : 1, => current / latest version of the view. \u20181\u2019 here since this metadata was created when the view was created. \"properties\" : { => shows properties of the view \"comment\" : \"View captures all the data from the table\" => View comment }, \"versions\" : [ { => Last few versions of the view. \"version-id\" : 1, \"parent-version-id\" : -1, \"timestamp-ms\" : 1573518431292, \"summary\" : { \"operation\" : \"create\", => View operation that caused this metadata to be created \"engineVersion\" : \"presto-350\", => Version of the engine that performed the operation (create / replace) }, \"representations\" : [ { => SQL metadata of the view \"type\" : \"sql\", \"sql\" : \"SELECT *\\nFROM\\n base_tab\\n\", => original view SQL \"dialect\" : \"presto\", \"schema-id\" : 1, \"default-catalog\" : \"iceberg\", \"default-namespace\" : [ \"anorwood\" ] } ], } ], \"version-log\" : [ { => Log of the created versions \"timestamp-ms\" : 1573518431292, \"version-id\" : 1 } ], \"schemas\": [ { => Schema of the view expressed in Iceberg types \"schema-id\": 1, \"type\" : \"struct\", \"fields\" : [ { \"id\" : 0, \"name\" : \"c1\", \"required\" : false, \"type\" : \"int\", \"doc\" : \"\" => Column comment }, { \"id\" : 1, \"name\" : \"c2\", \"required\" : false, \"type\" : \"string\", \"doc\" : \"\" } ] } ], \"current-schema-id\": 1 } The Iceberg / view library creates a new metadata JSON file every time the view undergoes a DDL change. 
This way the history of how the view evolved can be maintained. Following metadata JSON file was created at the end of Step 4.\n{ \"format-version\" : 1, \"location\" : \"s3n://my_company/my/warehouse/anorwood.db/common_view\", \"current-version-id\" : 2, \"properties\" : { => shows properties of the view \"comment\" : \"View captures count of the data from the table\" }, \"versions\" : [ { \"version-id\" : 1, \"parent-version-id\" : -1, \"timestamp-ms\" : 1573518431292, \"summary\" : { \"operation\" : \"create\", \"engineVersion\" : \"presto-350\", }, \"representations\" : [ { \"type\" : \"sql\", \"sql\" : \"SELECT *\\nFROM\\n base_tab\\n\", \"dialect\" : \"presto\", \"schema-id\" : 1, \"default-catalog\" : \"iceberg\", \"default-namespace\" : [ \"anorwood\" ] } ], \"properties\" : { } }, { \"version-id\" : 2, \"parent-version-id\" : 1, => Version 2 was created on top of version 1, making parent-version-id 1 \"timestamp-ms\" : 1573518440265, \"summary\" : { \"operation\" : \"replace\", => The \u2018replace\u2019 operation caused this latest version creation \"engineVersion\" : \"spark-2.4.4\", }, \"representations\" : [ { \"type\" : \"sql\", \"sql\" : \"SELECT \\\"count\\\"(*) my_cnt\\nFROM\\n base_tab\\n\", => Note the updated text from the \u2018replace\u2019 view statement \"dialect\" : \"spark\", \"schema-id\" : 2, \"default-catalog\" : \"iceberg\", \"default-namespace\" : [ \"anorwood\" ] }, } ], \"version-log\" : [ { \"timestamp-ms\" : 1573518431292, \"version-id\" : 1 }, { \"timestamp-ms\" : 1573518440265, \"version-id\" : 2 } ], \"schemas\": [ { => Schema of the view expressed in Iceberg types \"schema-id\": 1, \"type\" : \"struct\", \"fields\" : [ { \"id\" : 0, \"name\" : \"c1\", \"required\" : false, \"type\" : \"int\", \"doc\" : \"\" => Column comment }, { \"id\" : 1, \"name\" : \"c2\", \"required\" : false, \"type\" : \"string\", \"doc\" : \"\" } ] }, { => Schema change is reflected here \"schema-id\": 2, \"type\" : \"struct\", \"fields\" : [ { \"id\" : 0, \"name\" : \"my_cnt\", \"required\" : false, \"type\" : \"long\", \"doc\" : \"\" } ] } ], \"current-schema-id\": 2 } ", "description": "", "title": "View Spec", "uri": "/view-spec/"}, {"categories": null, "content": " Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Learn More ", "description": "", "title": "What is Iceberg?", "uri": "/about/about/"}, {"categories": null, "content": " Getting Started The latest version of Iceberg is 0.14.0.\nSpark is currently the most feature-rich compute engine for Iceberg operations. We recommend you to get started with Spark to understand Iceberg concepts and features with examples. You can also view documentations of using Iceberg with other compute engine under the Engines tab.\nUsing Iceberg in Spark 3 To use Iceberg in a Spark shell, use the --packages option:\nspark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0 If you want to include Iceberg in your Spark installation, add the iceberg-spark-runtime-3.2_2.12 Jar to Spark\u2019s jars folder. Adding catalogs Iceberg comes with catalogs that enable SQL commands to manage tables and load them by name. 
Catalogs are configured using properties under spark.sql.catalog.(catalog_name).\nThis command creates a path-based catalog named local for tables under $PWD/warehouse and adds support for Iceberg tables to Spark\u2019s built-in catalog:\nspark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0\\ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\ --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \\ --conf spark.sql.catalog.spark_catalog.type=hive \\ --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.local.type=hadoop \\ --conf spark.sql.catalog.local.warehouse=$PWD/warehouse Creating a table To create your first Iceberg table in Spark, use the spark-sql shell or spark.sql(...) to run a CREATE TABLE command:\n-- local is the path-based catalog defined above CREATE TABLE local.db.table (id bigint, data string) USING iceberg Iceberg catalogs support the full range of SQL DDL commands, including:\nCREATE TABLE ... PARTITIONED BY CREATE TABLE ... AS SELECT ALTER TABLE DROP TABLE Writing Once your table is created, insert data using INSERT INTO:\nINSERT INTO local.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c'); INSERT INTO local.db.table SELECT id, data FROM source WHERE length(data) = 1; Iceberg also adds row-level SQL updates to Spark, MERGE INTO and DELETE FROM:\nMERGE INTO local.db.target t USING (SELECT * FROM updates) u ON t.id = u.id WHEN MATCHED THEN UPDATE SET t.count = t.count + u.count WHEN NOT MATCHED THEN INSERT * Iceberg supports writing DataFrames using the new v2 DataFrame write API:\nspark.table(\"source\").select(\"id\", \"data\") .writeTo(\"local.db.table\").append() The old write API is supported, but not recommended.\nReading To read with SQL, use the an Iceberg table name in a SELECT query:\nSELECT count(1) as count, data FROM local.db.table GROUP BY data SQL is also the recommended way to inspect tables. To view all of the snapshots in a table, use the snapshots metadata table:\nSELECT * FROM local.db.table.snapshots +-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+ | committed_at | snapshot_id | parent_id | operation | manifest_list | ... | +-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+ | 2019-02-08 03:29:51.215 | 57897183625154 | null | append | s3://.../table/metadata/snap-57897183625154-1.avro | ... | | | | | | | ... | | | | | | | ... | | ... | ... | ... | ... | ... | ... 
| +-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+ DataFrame reads are supported and can now reference tables by name using spark.table:\nval df = spark.table(\"local.db.table\") df.count() Next steps Next, you can learn more about Iceberg tables in Spark:\nDDL commands: CREATE, ALTER, and DROP Querying data: SELECT queries and metadata tables Writing data: INSERT INTO and MERGE INTO Maintaining tables with stored procedures ", "description": "", "title": "Getting Started", "uri": "/docs/latest/getting-started/"}, {"categories": null, "content": " Hive Iceberg supports reading and writing Iceberg tables through Hive by using a StorageHandler.\nFeature support Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following features:\nCreating a table Dropping a table Reading a table Inserting into a table (INSERT INTO) DML operations work only with MapReduce execution engine. With Hive version 4.0.0-alpha-1 and above, the Iceberg integration when using HiveCatalog supports the following additional features:\nCreating an Iceberg identity-partitioned table Creating an Iceberg table with any partition spec, including the various transforms supported by Iceberg Creating a table from an existing table (CTAS table) Altering a table while keeping Iceberg and Hive schemas in sync Altering the partition schema (updating columns) Altering the partition schema by specifying partition transforms Truncating a table Migrating tables in Avro, Parquet, or ORC (Non-ACID) format to Iceberg Reading the schema of a table Querying Iceberg metadata tables Time travel applications Inserting into a table (INSERT INTO) Inserting data overwriting existing data (INSERT OVERWRITE) DML operations work only with Tez execution engine. Enabling Iceberg support in Hive Hive 4.0.0-alpha-1 Hive 4.0.0-alpha-1 comes with the Iceberg 0.13.1 included. No additional downloads or jars are needed.\nHive 2.3.x, Hive 3.1.x In order to use Hive 2.3.x or Hive 3.1.x, you must load the Iceberg-Hive runtime jar and enable Iceberg support, either globally or for an individual table using a table property.\nLoading runtime jar To enable Iceberg support in Hive, the HiveIcebergStorageHandler and supporting classes need to be made available on Hive\u2019s classpath. These are provided by the iceberg-hive-runtime jar file. For example, if using the Hive shell, this can be achieved by issuing a statement like so:\nadd jar /path/to/iceberg-hive-runtime.jar; There are many others ways to achieve this including adding the jar file to Hive\u2019s auxiliary classpath so it is available by default. Please refer to Hive\u2019s documentation for more information.\nEnabling support If the Iceberg storage handler is not in Hive\u2019s classpath, then Hive cannot load or update the metadata for an Iceberg table when the storage handler is set. To avoid the appearance of broken tables in Hive, Iceberg will not add the storage handler to a table unless Hive support is enabled. The storage handler is kept in sync (added or removed) every time Hive engine support for the table is updated, i.e. turned on or off in the table properties. There are two ways to enable Hive support: globally in Hadoop Configuration and per-table using a table property.\nHadoop configuration To enable Hive support globally for an application, set iceberg.engine.hive.enabled=true in its Hadoop configuration. 
For example, setting this in the hive-site.xml loaded by Spark will enable the storage handler for all tables created by Spark.\nStarting with Apache Iceberg 0.11.0, when using Hive with Tez you also have to disable vectorization (hive.vectorized.execution.enabled=false). Table property configuration Alternatively, the property engine.hive.enabled can be set to true and added to the table properties when creating the Iceberg table. Here is an example of doing it programmatically:\nCatalog catalog=...; Map<String, String> tableProperties=Maps.newHashMap(); tableProperties.put(TableProperties.ENGINE_HIVE_ENABLED,\"true\"); // engine.hive.enabled=true catalog.createTable(tableId,schema,spec,tableProperties); The table level configuration overrides the global Hadoop configuration.\nHive on Tez configuration To use the Tez engine on Hive 3.1.2 or later, Tez needs to be upgraded to >= 0.10.1 which contains a necessary fix TEZ-4248.\nTo use the Tez engine on Hive 2.3.x, you will need to manually build Tez from the branch-0.9 branch due to a backwards incompatibility issue with Tez 0.10.1.\nYou will also need to set the following property in the Hive configuration: tez.mrreader.config.update.properties=hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids.\nCatalog Management Global Hive catalog From the Hive engine\u2019s perspective, there is only one global data catalog that is defined in the Hadoop configuration in the runtime environment. In contrast, Iceberg supports multiple different data catalog types such as Hive, Hadoop, AWS Glue, or custom catalog implementations. Iceberg also allows loading a table directly based on its path in the file system. Those tables do not belong to any catalog. Users might want to read these cross-catalog and path-based tables through the Hive engine for use cases like join.\nTo support this, a table in the Hive metastore can represent three different ways of loading an Iceberg table, depending on the table\u2019s iceberg.catalog property:\nThe table will be loaded using a HiveCatalog that corresponds to the metastore configured in the Hive environment if no iceberg.catalog is set The table will be loaded using a custom catalog if iceberg.catalog is set to a catalog name (see below) The table can be loaded directly using the table\u2019s root location if iceberg.catalog is set to location_based_table For cases 2 and 3 above, users can create an overlay of an Iceberg table in the Hive metastore, so that different table types can work together in the same Hive environment. 
See CREATE EXTERNAL TABLE and CREATE TABLE for more details.\nCustom Iceberg catalogs To globally register different catalogs, set the following Hadoop configurations:\nConfig Key Description iceberg.catalog.<catalog_name>.type type of catalog: hive, hadoop, or left unset if using a custom catalog iceberg.catalog.<catalog_name>.catalog-impl catalog implementation, must not be null if type is empty iceberg.catalog.<catalog_name>.<key> any config key and value pairs for the catalog Here are some examples using Hive CLI:\nRegister a HiveCatalog called another_hive:\nSET iceberg.catalog.another_hive.type=hive; SET iceberg.catalog.another_hive.uri=thrift://example.com:9083; SET iceberg.catalog.another_hive.clients=10; SET iceberg.catalog.another_hive.warehouse=hdfs://example.com:8020/warehouse; Register a HadoopCatalog called hadoop:\nSET iceberg.catalog.hadoop.type=hadoop; SET iceberg.catalog.hadoop.warehouse=hdfs://example.com:8020/warehouse; Register an AWS GlueCatalog called glue:\nSET iceberg.catalog.glue.catalog-impl=org.apache.iceberg.aws.GlueCatalog; SET iceberg.catalog.glue.warehouse=s3://my-bucket/my/key/prefix; SET iceberg.catalog.glue.lock-impl=org.apache.iceberg.aws.glue.DynamoLockManager; SET iceberg.catalog.glue.lock.table=myGlueLockTable; DDL Commands Not all the features below are supported with Hive 2.3.x and Hive 3.1.x. Please refer to the Feature support paragraph for further details.\nOne generally applicable difference is that Hive 4.0.0-alpha-1 provides the possibility to use STORED BY ICEBERG instead of the old STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'\nCREATE TABLE Non partitioned tables The Hive CREATE EXTERNAL TABLE command creates an Iceberg table when you specify the storage handler as follows:\nCREATE EXTERNAL TABLE x (i int) STORED BY ICEBERG; If you want to create external tables using CREATE TABLE, configure the MetaStoreMetadataTransformer on the cluster, and CREATE TABLE commands are transformed to create external tables. For example:\nCREATE TABLE x (i int) STORED BY ICEBERG; You can specify the default file format (Avro, Parquet, ORC) at the time of the table creation. The default is Parquet:\nCREATE TABLE x (i int) STORED BY ICEBERG STORED AS ORC; Partitioned tables You can create Iceberg partitioned tables using a command familiar to those who create non-Iceberg tables:\nCREATE TABLE x (i int) PARTITIONED BY (j int) STORED BY ICEBERG; The resulting table does not create partitions in HMS, but instead, converts partition data into Iceberg identity partitions. 
Use the DESCRIBE command to get information about the Iceberg identity partitions:\nDESCRIBE x; The result is:\ncol_name data_type comment i int j int NULL NULL # Partition Transform Information NULL NULL # col_name transform_type NULL j IDENTITY NULL You can create Iceberg partitions using the following Iceberg partition specification syntax (supported only in Hive 4.0.0-alpha-1):\nCREATE TABLE x (i int, ts timestamp) PARTITIONED BY SPEC (month(ts), bucket(2, i)) STORED AS ICEBERG; DESCRIBE x; The result is:\ncol_name data_type comment i int ts timestamp NULL NULL # Partition Transform Information NULL NULL # col_name transform_type NULL ts MONTH NULL i BUCKET[2] NULL The supported transformations for Hive are the same as for Spark:\nyears(ts): partition by year months(ts): partition by month days(ts) or date(ts): equivalent to dateint partitioning hours(ts) or date_hour(ts): equivalent to dateint and hour partitioning bucket(N, col): partition by hashed value mod N buckets truncate(L, col): partition by value truncated to L Strings are truncated to the given length Integers and longs truncate to bins: truncate(10, i) produces partitions 0, 10, 20, 30, The resulting table does not create partitions in HMS, but instead, converts partition data into Iceberg partitions. CREATE TABLE AS SELECT CREATE TABLE AS SELECT operation resembles the native Hive operation with a single important difference. The Iceberg table and the corresponding Hive table are created at the beginning of the query execution. The data is inserted / committed when the query finishes. So for a transient period the table already exists but contains no data.\nCREATE TABLE target PARTITIONED BY SPEC (year(year_field), identity_field) STORED BY ICEBERG AS SELECT * FROM source; CREATE EXTERNAL TABLE overlaying an existing Iceberg table The CREATE EXTERNAL TABLE command is used to overlay a Hive table \u201con top of\u201d an existing Iceberg table. Iceberg tables are created using either a Catalog, or an implementation of the Tables interface, and Hive needs to be configured accordingly to operate on these different types of table.\nHive catalog tables As described before, tables created by the HiveCatalog with Hive engine feature enabled are directly visible by the Hive engine, so there is no need to create an overlay.\nCustom catalog tables For a table in a registered catalog, specify the catalog name in the statement using table property iceberg.catalog. For example, the SQL below creates an overlay for a table in a hadoop type catalog named hadoop_cat:\nSET iceberg.catalog.hadoop_cat.type=hadoop; SET iceberg.catalog.hadoop_cat.warehouse=hdfs://example.com:8020/hadoop_cat; CREATE EXTERNAL TABLE database_a.table_a STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' TBLPROPERTIES ('iceberg.catalog'='hadoop_cat'); When iceberg.catalog is missing from both table properties and the global Hadoop configuration, HiveCatalog will be used as default.\nPath-based Hadoop tables Iceberg tables created using HadoopTables are stored entirely in a directory in a filesystem like HDFS. These tables are considered to have no catalog. To indicate that, set iceberg.catalog property to location_based_table. 
For example:\nCREATE EXTERNAL TABLE table_a STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://some_bucket/some_path/table_a' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); CREATE TABLE overlaying an existing Iceberg table You can also create a new table that is managed by a custom catalog. For example, the following code creates a table in a custom Hadoop catalog:\nSET iceberg.catalog.hadoop_cat.type=hadoop; SET iceberg.catalog.hadoop_cat.warehouse=hdfs://example.com:8020/hadoop_cat; CREATE TABLE database_a.table_a ( id bigint, name string ) PARTITIONED BY ( dept string ) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' TBLPROPERTIES ('iceberg.catalog'='hadoop_cat'); If the table to create already exists in the custom catalog, this will create a managed overlay table. This means technically you can omit the EXTERNAL keyword when creating an overlay table. However, this is not recommended because creating managed overlay tables could pose a risk to the shared data files in case of accidental drop table commands from the Hive side, which would unintentionally remove all the data in the table. ALTER TABLE Table properties For HiveCatalog tables the Iceberg table properties and the Hive table properties stored in HMS are kept in sync.\nIMPORTANT: This feature is not available for other Catalog implementations. ALTER TABLE t SET TBLPROPERTIES('...'='...'); Schema evolution The Hive table schema is kept in sync with the Iceberg table. If an outside source (Impala/Spark/Java API/etc) changes the schema, the Hive table immediately reflects the changes. You alter the table schema using Hive commands:\nAdd a column ALTER TABLE orders ADD COLUMNS (nickname string); Rename a column ALTER TABLE orders CHANGE COLUMN item fruit string; Reorder columns ALTER TABLE orders CHANGE COLUMN quantity quantity int AFTER price; Change a column type - only if Iceberg defines the column type change as safe ALTER TABLE orders CHANGE COLUMN price price long; Drop a column by using REPLACE COLUMNS to remove the old column ALTER TABLE orders REPLACE COLUMNS (remaining string); Note that dropping columns is the only thing REPLACE COLUMNS can be used for, i.e. if columns are specified out-of-order an error will be thrown signalling this limitation. Partition evolution You change the partitioning schema using the following commands:\nChange the partitioning schema to new identity partitions: ALTER TABLE default.customers SET PARTITION SPEC (last_name); Alternatively, provide a partition specification: ALTER TABLE order SET PARTITION SPEC (month(ts)); Table migration You can migrate Avro / Parquet / ORC external tables to Iceberg tables using the following command:\nALTER TABLE t SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'); During the migration the data files are not changed, only the appropriate Iceberg metadata files are created. After the migration, handle the table as a normal Iceberg table.\nTRUNCATE TABLE The following command truncates the Iceberg table:\nTRUNCATE TABLE t; Using a partition specification is not allowed.\nDROP TABLE Tables can be dropped using the DROP TABLE command:\nDROP TABLE [IF EXISTS] table_name [PURGE]; DML Commands SELECT Select statements work the same on Iceberg tables in Hive. 
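For instance, reusing the orders table from the schema evolution examples above (the table and column names here are purely illustrative), an ordinary filtered query needs no Iceberg-specific syntax:\nSELECT fruit, price FROM orders WHERE quantity > 1; 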
You will see the Iceberg benefits over Hive in compilation and execution:\nNo file system listings - especially important on blob stores, like S3 No partition listing from the Metastore Advanced partition filtering - the partition keys are not needed in the queries when they could be calculated Can handle a higher number of partitions than normal Hive tables Here are the feature highlights for Iceberg Hive read support:\nPredicate pushdown: Pushdown of the Hive SQL WHERE clause has been implemented so that these filters are used at the Iceberg TableScan level as well as by the Parquet and ORC Readers. Column projection: Columns from the Hive SQL SELECT clause are projected down to the Iceberg readers to reduce the number of columns read. Hive query engines: With Hive 2.3.x and 3.1.x, both the MapReduce and Tez query execution engines are supported. With Hive 4.0.0-alpha-1, the Tez query execution engine is supported. Some of the advanced / little-used optimizations are not yet implemented for Iceberg tables, so you should check your individual queries. Also, currently the statistics stored in the MetaStore are used for query planning. This is something we are planning to improve in the future.\nINSERT INTO Hive supports the standard single-table INSERT INTO operation:\nINSERT INTO table_a VALUES ('a', 1); INSERT INTO table_a SELECT...; Multi-table insert is also supported, but it will not be atomic. Commits occur one table at a time. Partial changes will be visible during the commit process and failures can leave partial changes committed. Changes within a single table will remain atomic.\nHere is an example of inserting into multiple tables at once in Hive SQL:\nFROM customers INSERT INTO target1 SELECT customer_id, first_name INSERT INTO target2 SELECT last_name, customer_id; INSERT OVERWRITE INSERT OVERWRITE can replace data in the table with the result of a query. Overwrites are atomic operations for Iceberg tables. For nonpartitioned tables the content of the table is always removed. For partitioned tables the partitions that have rows produced by the SELECT query will be replaced.\nINSERT OVERWRITE TABLE target SELECT * FROM source; QUERYING METADATA TABLES Hive supports querying of the Iceberg Metadata tables. The tables can be used as normal Hive tables, so it is possible to use projections / joins / filters / etc. To reference a metadata table the full name of the table should be used, like: <DB_NAME>.<TABLE_NAME>.<METADATA_TABLE_NAME>.\nCurrently the following metadata tables are available in Hive:\nfiles entries snapshots manifests partitions SELECT * FROM default.table_a.files; TIMETRAVEL Hive supports snapshot-id-based and time-based time travel queries. For these queries it is possible to use projections / joins / filters / etc. The function is available with the following syntax:\nSELECT * FROM table_a FOR SYSTEM_TIME AS OF '2021-08-09 10:35:57'; SELECT * FROM table_a FOR SYSTEM_VERSION AS OF 1234567; Type compatibility Hive and Iceberg support different sets of types. Iceberg can perform type conversion automatically, but not for all combinations, so you may want to understand the type conversion in Iceberg prior to designing the types of columns in your tables. You can enable auto-conversion through Hadoop configuration (not enabled by default):\nConfig key Default Description iceberg.mr.schema.auto.conversion false if Hive should perform type auto-conversion Hive type to Iceberg type This type conversion table describes how Hive types are converted to the Iceberg types. 
The conversion applies to both creating an Iceberg table and writing to an Iceberg table via Hive.\nHive Iceberg Notes boolean boolean short integer auto-conversion byte integer auto-conversion integer integer long long float float double double date date timestamp timestamp without timezone timestamplocaltz timestamp with timezone Hive 3 only interval_year_month not supported interval_day_time not supported char string auto-conversion varchar string auto-conversion string string binary binary decimal decimal struct struct list list map map union not supported ", "description": "", "title": "Hive", "uri": "/docs/latest/hive/"}, {"categories": null, "content": " Iceberg AWS Integrations Iceberg provides integration with different AWS services through the iceberg-aws module. This section describes how to use Iceberg with AWS.\nEnabling AWS Integration The iceberg-aws module is bundled with Spark and Flink engine runtimes for all versions from 0.11.0 onwards. However, the AWS clients are not bundled so that you can use the same client version as your application. You will need to provide the AWS v2 SDK because that is what Iceberg depends on. You can choose to use the AWS SDK bundle, or individual AWS client packages (Glue, S3, DynamoDB, KMS, STS) if you would like to have a minimal dependency footprint.\nAll the default AWS clients use the URL Connection HTTP Client for HTTP connection management. This dependency is not part of the AWS SDK bundle and needs to be added separately. To choose a different HTTP client library such as Apache HTTP Client, see the section client customization for more details.\nAll the AWS module features can be loaded through custom catalog properties; you can go to the documentation of each engine to see how to load a custom catalog. 
Here are some examples.\nSpark For example, to use AWS features with Spark 3.0 and AWS clients version 2.17.131, you can start the Spark SQL shell with:\n# add Iceberg dependency ICEBERG_VERSION=0.14.0 DEPENDENCIES=\"org.apache.iceberg:iceberg-spark3-runtime:$ICEBERG_VERSION\" # add AWS dependency AWS_SDK_VERSION=2.17.131 AWS_MAVEN_GROUP=software.amazon.awssdk AWS_PACKAGES=( \"bundle\" \"url-connection-client\" ) for pkg in \"${AWS_PACKAGES[@]}\"; do DEPENDENCIES+=\",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION\" done # start Spark SQL client shell spark-sql --packages $DEPENDENCIES \\ --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \\ --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \\ --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO As you can see, in the shell command we use --packages to specify the additional AWS bundle and HTTP client dependencies with their version as 2.17.131.\nFlink To use the AWS module with Flink, you can download the necessary dependencies and specify them when starting the Flink SQL client:\n# download Iceberg dependency ICEBERG_VERSION=0.14.0 MAVEN_URL=https://repo1.maven.org/maven2 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg wget $ICEBERG_MAVEN_URL/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar # download AWS dependency AWS_SDK_VERSION=2.17.131 AWS_MAVEN_URL=$MAVEN_URL/software/amazon/awssdk AWS_PACKAGES=( \"bundle\" \"url-connection-client\" ) for pkg in \"${AWS_PACKAGES[@]}\"; do wget $AWS_MAVEN_URL/$pkg/$AWS_SDK_VERSION/$pkg-$AWS_SDK_VERSION.jar done # start Flink SQL client shell /path/to/bin/sql-client.sh embedded \\ -j iceberg-flink-runtime-$ICEBERG_VERSION.jar \\ -j bundle-$AWS_SDK_VERSION.jar \\ -j url-connection-client-$AWS_SDK_VERSION.jar \\ shell With those dependencies, you can create a Flink catalog like the following:\nCREATE CATALOG my_catalog WITH ( 'type'='iceberg', 'warehouse'='s3://my-bucket/my/key/prefix', 'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog', 'io-impl'='org.apache.iceberg.aws.s3.S3FileIO' ); You can also specify the catalog configurations in sql-client-defaults.yaml to preload it:\ncatalogs: - name: my_catalog type: iceberg warehouse: s3://my-bucket/my/key/prefix catalog-impl: org.apache.iceberg.aws.glue.GlueCatalog io-impl: org.apache.iceberg.aws.s3.S3FileIO Hive To use the AWS module with Hive, you can download the necessary dependencies similar to the Flink example, and then add them to the Hive classpath or add the jars at runtime in CLI:\nadd jar /my/path/to/iceberg-hive-runtime.jar; add jar /my/path/to/aws/bundle.jar; add jar /my/path/to/aws/url-connection-client.jar; With those dependencies, you can register a Glue catalog and create external tables in Hive at runtime in CLI by:\nSET iceberg.engine.hive.enabled=true; SET hive.vectorized.execution.enabled=false; SET iceberg.catalog.glue.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog; SET iceberg.catalog.glue.warehouse=s3://my-bucket/my/key/prefix; -- suppose you have an Iceberg table database_a.table_a created by GlueCatalog CREATE EXTERNAL TABLE database_a.table_a STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' TBLPROPERTIES ('iceberg.catalog'='glue'); You can also preload the catalog by setting the configurations above in hive-site.xml.\nCatalogs There are multiple options that users can choose from to build an Iceberg catalog with 
AWS.\nGlue Catalog Iceberg enables the use of AWS Glue as the Catalog implementation. When used, an Iceberg namespace is stored as a Glue Database, an Iceberg table is stored as a Glue Table, and every Iceberg table version is stored as a Glue TableVersion. You can start using Glue catalog by specifying the catalog-impl as org.apache.iceberg.aws.glue.GlueCatalog, just like what is shown in the enabling AWS integration section above. More details about loading the catalog can be found in individual engine pages, such as Spark and Flink.\nGlue Catalog ID There is a unique Glue metastore in each AWS account and each AWS region. By default, GlueCatalog chooses the Glue metastore to use based on the user\u2019s default AWS client credential and region setup. You can specify the Glue catalog ID through glue.id catalog property to point to a Glue catalog in a different AWS account. The Glue catalog ID is your numeric AWS account ID. If the Glue catalog is in a different region, you should configure you AWS client to point to the correct region, see more details in AWS client customization.\nSkip Archive By default, Glue stores all the table versions created and user can rollback a table to any historical version if needed. However, if you are streaming data to Iceberg, this will easily create a lot of Glue table versions. Therefore, it is recommended to turn off the archive feature in Glue by setting glue.skip-archive to true. For more details, please read Glue Quotas and the UpdateTable API.\nSkip Name Validation Allow user to skip name validation for table name and namespaces. It is recommended to stick to Glue best practice in https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html to make sure operations are Hive compatible. This is only added for users that have existing conventions using non-standard characters. When database name and table name validation are skipped, there is no guarantee that downstream systems would all support the names.\nOptimistic Locking By default, Iceberg uses Glue\u2019s optimistic locking for concurrent updates to a table. With optimistic locking, each table has a version id. If users retrieve the table metadata, Iceberg records the version id of that table. Users can update the table, but only if the version id on the server side has not changed. If there is a version mismatch, it means that someone else has modified the table before you did. The update attempt fails, because you have a stale version of the table. If this happens, Iceberg refreshes the metadata and checks if there might be potential conflict. If there is no commit conflict, the operation will be retried. Optimistic locking guarantees atomic transaction of Iceberg tables in Glue. It also prevents others from accidentally overwriting your changes.\nPlease use AWS SDK version >= 2.17.131 to leverage Glue\u2019s Optimistic Locking. If the AWS SDK version is below 2.17.131, only in-memory lock is used. To ensure atomic transaction, you need to set up a DynamoDb Lock Manager. Warehouse Location Similar to all other catalog implementations, warehouse is a required catalog property to determine the root path of the data warehouse in storage. By default, Glue only allows a warehouse location in S3 because of the use of S3FileIO. To store data in a different local or cloud store, Glue catalog can switch to use HadoopFileIO or any custom FileIO by setting the io-impl catalog property. 
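As a minimal sketch (the my_catalog name and the warehouse path are placeholders, not values from this guide), pointing a Glue-backed Spark catalog at HadoopFileIO could look like the following when starting the Spark SQL shell:\nspark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \\ --conf spark.sql.catalog.my_catalog.warehouse=hdfs://namenode:8020/warehouse/path \\ --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO 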
Details about this feature can be found in the custom FileIO section.\nTable Location By default, the root location for a table my_table of namespace my_ns is at my-warehouse-location/my-ns.db/my-table. This default root location can be changed at both namespace and table level.\nTo use a different path prefix for all tables under a namespace, use AWS console or any AWS Glue client SDK you like to update the locationUri attribute of the corresponding Glue database. For example, you can update the locationUri of my_ns to s3://my-ns-bucket, then any newly created table will have a default root location under the new prefix. For instance, a new table my_table_2 will have its root location at s3://my-ns-bucket/my_table_2.\nTo use a completely different root path for a specific table, set the location table property to the desired root path value you want. For example, in Spark SQL you can do:\nCREATE TABLE my_catalog.my_ns.my_table ( id bigint, data string, category string) USING iceberg OPTIONS ('location'='s3://my-special-table-bucket') PARTITIONED BY (category); For engines like Spark that supports the LOCATION keyword, the above SQL statement is equivalent to:\nCREATE TABLE my_catalog.my_ns.my_table ( id bigint, data string, category string) USING iceberg LOCATION 's3://my-special-table-bucket' PARTITIONED BY (category); DynamoDB Catalog Iceberg supports using a DynamoDB table to record and manage database and table information.\nConfigurations The DynamoDB catalog supports the following configurations:\nProperty Default Description dynamodb.table-name iceberg name of the DynamoDB table used by DynamoDbCatalog Internal Table Design The DynamoDB table is designed with the following columns:\nColumn Key Type Description identifier partition key string table identifier such as db1.table1, or string NAMESPACE for namespaces namespace sort key string namespace name. A global secondary index (GSI) is created with namespace as partition key, identifier as sort key, no other projected columns v string row version, used for optimistic locking updated_at number timestamp (millis) of the last update created_at number timestamp (millis) of the table creation p.<property_key> string Iceberg-defined table properties including table_type, metadata_location and previous_metadata_location or namespace properties This design has the following benefits:\nit avoids potential hot partition issue if there are heavy write traffic to the tables within the same namespace, because the partition key is at the table level namespace operations are clustered in a single partition to avoid affecting table commit operations a sort key to partition key reverse GSI is used for list table operation, and all other operations are single row ops or single partition query. No full table scan is needed for any operation in the catalog. a string UUID version field v is used instead of updated_at to avoid 2 processes committing at the same millisecond multi-row transaction is used for catalog.renameTable to ensure idempotency properties are flattened as top level columns so that user can add custom GSI on any property field to customize the catalog. For example, users can store owner information as table property owner, and search tables by owner by adding a GSI on the p.owner column. RDS JDBC Catalog Iceberg also supports JDBC catalog which uses a table in a relational database to manage Iceberg tables. You can configure to use JDBC catalog with relational database services like AWS RDS. 
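As a rough sketch (the JDBC endpoint, credentials, and catalog name below are placeholders, and the matching JDBC driver jar must also be on the classpath), a JDBC catalog backed by an RDS PostgreSQL database could be configured in Spark like this:\nspark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.jdbc.JdbcCatalog \\ --conf spark.sql.catalog.my_catalog.uri=jdbc:postgresql://my-rds-endpoint:5432/demo_catalog \\ --conf spark.sql.catalog.my_catalog.jdbc.user=admin \\ --conf spark.sql.catalog.my_catalog.jdbc.password=password \\ --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \\ --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO 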
Read the JDBC integration page for guides and examples about using the JDBC catalog. Read this AWS documentation for more details about configuring JDBC catalog with IAM authentication.\nWhich catalog to choose? With all the available options, we offer the following guidance when choosing the right catalog to use for your application:\nif your organization has an existing Glue metastore or plans to use the AWS analytics ecosystem including Glue, Athena, EMR, Redshift and LakeFormation, Glue catalog provides the easiest integration. if your application requires frequent updates to table or high read and write throughput (e.g. streaming write), Glue and DynamoDB catalog provides the best performance through optimistic locking. if you would like to enforce access control for tables in a catalog, Glue tables can be managed as an IAM resource, whereas DynamoDB catalog tables can only be managed through item-level permission which is much more complicated. if you would like to query tables based on table property information without the need to scan the entire catalog, DynamoDB catalog allows you to build secondary indexes for any arbitrary property field and provide efficient query performance. if you would like to have the benefit of DynamoDB catalog while also connect to Glue, you can enable DynamoDB stream with Lambda trigger to asynchronously update your Glue metastore with table information in the DynamoDB catalog. if your organization already maintains an existing relational database in RDS or uses serverless Aurora to manage tables, JDBC catalog provides the easiest integration. DynamoDb Lock Manager Amazon DynamoDB can be used by HadoopCatalog or HadoopTables, so that for every commit, the catalog first obtains a lock using a helper DynamoDB table and then try to safely modify the Iceberg table. This is necessary for a file system-based catalog to ensure atomic transaction in storages like S3 that do not provide file write mutual exclusion.\nThis feature requires the following lock related catalog properties:\nSet lock-impl as org.apache.iceberg.aws.dynamodb.DynamoDbLockManager. Set lock.table as the DynamoDB table name you would like to use. If the lock table with the given name does not exist in DynamoDB, a new table is created with billing mode set as pay-per-request. Other lock related catalog properties can also be used to adjust locking behaviors such as heartbeat interval. For more details, please refer to Lock catalog properties.\nS3 FileIO Iceberg allows users to write data to S3 through S3FileIO. GlueCatalog by default uses this FileIO, and other catalogs can load this FileIO using the io-impl catalog property.\nProgressive Multipart Upload S3FileIO implements a customized progressive multipart upload algorithm to upload data. Data files are uploaded by parts in parallel as soon as each part is ready, and each file part is deleted as soon as its upload process completes. This provides maximized upload speed and minimized local disk usage during uploads. 
Here are the configurations that users can tune related to this feature:\nProperty Default Description s3.multipart.num-threads the available number of processors in the system number of threads to use for uploading parts to S3 (shared across all output streams) s3.multipart.part-size-bytes 32MB the size of a single part for multipart upload requests s3.multipart.threshold 1.5 the threshold expressed as a factor times the multipart size at which to switch from uploading using a single put object request to uploading using multipart upload s3.staging-dir java.io.tmpdir property value the directory to hold temporary files S3 Server Side Encryption S3FileIO supports all 3 S3 server side encryption modes:\nSSE-S3: When you use Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3), each object is encrypted with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it regularly rotates. Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data. SSE-KMS: Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key Management Service (SSE-KMS) is similar to SSE-S3, but with some additional benefits and charges for using this service. There are separate permissions for the use of a CMK that provides added protection against unauthorized access of your objects in Amazon S3. SSE-KMS also provides you with an audit trail that shows when your CMK was used and by whom. Additionally, you can create and manage customer managed CMKs or use AWS managed CMKs that are unique to you, your service, and your Region. SSE-C: With Server-Side Encryption with Customer-Provided Keys (SSE-C), you manage the encryption keys and Amazon S3 manages the encryption, as it writes to disks, and decryption, when you access your objects. To enable server side encryption, use the following configuration properties:\nProperty Default Description s3.sse.type none none, s3, kms or custom s3.sse.key aws/s3 for kms type, null otherwise A KMS Key ID or ARN for kms type, or a custom base-64 AES256 symmetric key for custom type. s3.sse.md5 null If SSE type is custom, this value must be set as the base-64 MD5 digest of the symmetric key to ensure integrity. S3 Access Control List S3FileIO supports S3 access control list (ACL) for detailed access control. User can choose the ACL level by setting the s3.acl property. For more details, please read S3 ACL Documentation.\nObject Store File Layout S3 and many other cloud storage services throttle requests based on object prefix. Data stored in S3 with a traditional Hive storage layout can face S3 request throttling as objects are stored under the same filepath prefix.\nIceberg by default uses the Hive storage layout, but can be switched to use the ObjectStoreLocationProvider. With ObjectStoreLocationProvider, a determenistic hash is generated for each stored file, with the hash appended directly after the write.data.path. This ensures files written to s3 are equally distributed across multiple prefixes in the S3 bucket. Resulting in minimized throttling and maximized throughput for S3-related IO operations. When using ObjectStoreLocationProvider having a shared and short write.data.path across your Iceberg tables will improve performance.\nFor more information on how S3 scales API QPS, checkout the 2018 re:Invent session on Best Practices for Amazon S3 and Amazon S3 Glacier. 
At 53:39 it covers how S3 scales/partitions & at 54:50 it discusses the 30-60 minute wait time before new partitions are created.\nTo use the ObjectStorageLocationProvider add 'write.object-storage.enabled'=true in the table\u2019s properties. Below is an example Spark SQL command to create a table using the ObjectStorageLocationProvider:\nCREATE TABLE my_catalog.my_ns.my_table ( id bigint, data string, category string) USING iceberg OPTIONS ( 'write.object-storage.enabled'=true, 'write.data.path'='s3://my-table-data-bucket') PARTITIONED BY (category); We can then insert a single row into this new table\nINSERT INTO my_catalog.my_ns.my_table VALUES (1, \"Pizza\", \"orders\"); Which will write the data to S3 with a hash (2d3905f8) appended directly after the write.object-storage.path, ensuring reads to the table are spread evenly across S3 bucket prefixes, and improving performance.\ns3://my-table-data-bucket/2d3905f8/my_ns.db/my_table/category=orders/00000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet Note, the path resolution logic for ObjectStoreLocationProvider is write.data.path then <tableLocation>/data. However, for the older versions up to 0.12.0, the logic is as follows:\nbefore 0.12.0, write.object-storage.path must be set. at 0.12.0, write.object-storage.path then write.folder-storage.path then <tableLocation>/data. For more details, please refer to the LocationProvider Configuration section.\nS3 Strong Consistency In November 2020, S3 announced strong consistency for all read operations, and Iceberg is updated to fully leverage this feature. There is no redundant consistency wait and check which might negatively impact performance during IO operations.\nHadoop S3A FileSystem Before S3FileIO was introduced, many Iceberg users choose to use HadoopFileIO to write data to S3 through the S3A FileSystem. As introduced in the previous sections, S3FileIO adopts latest AWS clients and S3 features for optimized security and performance, and is thus recommend for S3 use cases rather than the S3A FileSystem.\nS3FileIO writes data with s3:// URI scheme, but it is also compatible with schemes written by the S3A FileSystem. This means for any table manifests containing s3a:// or s3n:// file paths, S3FileIO is still able to read them. This feature allows people to easily switch from S3A to S3FileIO.\nIf for any reason you have to use S3A, here are the instructions:\nTo store data using S3A, specify the warehouse catalog property to be an S3A path, e.g. s3a://my-bucket/my-warehouse For HiveCatalog, to also store metadata using S3A, specify the Hadoop config property hive.metastore.warehouse.dir to be an S3A path. Add hadoop-aws as a runtime dependency of your compute engine. Configure AWS settings based on hadoop-aws documentation (make sure you check the version, S3A configuration varies a lot based on the version you use). S3 Write Checksum Verification To ensure integrity of uploaded objects, checksum validations for S3 writes can be turned on by setting catalog property s3.checksum-enabled to true. This is turned off by default.\nS3 Tags Custom tags can be added to S3 objects while writing and deleting. 
For example, to write S3 tags with Spark 3.0, you can start the Spark SQL shell with:\nspark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \\ --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \\ --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \\ --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key1=my_val1 \\ --conf spark.sql.catalog.my_catalog.s3.write.tags.my_key2=my_val2 For the above example, the objects in S3 will be saved with tags: my_key1=my_val1 and my_key2=my_val2. Do note that the specified write tags will be saved only while object creation.\nWhen the catalog property s3.delete-enabled is set to false, the objects are not hard-deleted from S3. This is expected to be used in combination with S3 delete tagging, so objects are tagged and removed using S3 lifecycle policy. The property is set to true by default.\nWith the s3.delete.tags config, objects are tagged with the configured key-value pairs before deletion. Users can configure tag-based object lifecycle policy at bucket level to transition objects to different tiers. For example, to add S3 delete tags with Spark 3.0, you can start the Spark SQL shell with:\nsh spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.my_catalog.warehouse=s3://iceberg-warehouse/s3-tagging \\ --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \\ --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \\ --conf spark.sql.catalog.my_catalog.s3.delete.tags.my_key3=my_val3 \\ --conf spark.sql.catalog.my_catalog.s3.delete-enabled=false For the above example, the objects in S3 will be saved with tags: my_key3=my_val3 before deletion. Users can also use the catalog property s3.delete.num-threads to mention the number of threads to be used for adding delete tags to the S3 objects.\nFor more details on tag restrictions, please refer User-Defined Tag Restrictions.\nS3 Access Points Access Points can be used to perform S3 operations by specifying a mapping of bucket to access points. 
This is useful for multi-region access, cross-region access, disaster recovery, etc.\nFor using cross-region access points, we need to additionally set use-arn-region-enabled catalog property to true to enable S3FileIO to make cross-region calls, it\u2019s not required for same / multi-region access points.\nFor example, to use S3 access-point with Spark 3.0, you can start the Spark SQL shell with:\nspark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket2/my/key/prefix \\ --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \\ --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \\ --conf spark.sql.catalog.my_catalog.s3.use-arn-region-enabled=false \\ --conf spark.sql.catalog.test.s3.access-points.my-bucket1=arn:aws:s3::123456789012:accesspoint:mfzwi23gnjvgw.mrap \\ --conf spark.sql.catalog.test.s3.access-points.my-bucket2=arn:aws:s3::123456789012:accesspoint:mfzwi23gnjvgw.mrap For the above example, the objects in S3 on my-bucket1 and my-bucket2 buckets will use arn:aws:s3::123456789012:accesspoint:mfzwi23gnjvgw.mrap access-point for all S3 operations.\nFor more details on using access-points, please refer Using access points with compatible Amazon S3 operations.\nAWS Client Customization Many organizations have customized their way of configuring AWS clients with their own credential provider, access proxy, retry strategy, etc. Iceberg allows users to plug in their own implementation of org.apache.iceberg.aws.AwsClientFactory by setting the client.factory catalog property.\nCross-Account and Cross-Region Access It is a common use case for organizations to have a centralized AWS account for Glue metastore and S3 buckets, and use different AWS accounts and regions for different teams to access those resources. In this case, a cross-account IAM role is needed to access those centralized resources. Iceberg provides an AWS client factory AssumeRoleAwsClientFactory to support this common use case. This also serves as an example for users who would like to implement their own AWS client factory.\nThis client factory has the following configurable catalog properties:\nProperty Default Description client.assume-role.arn null, requires user input ARN of the role to assume, e.g. arn:aws:iam::123456789:role/myRoleToAssume client.assume-role.region null, requires user input All AWS clients except the STS client will use the given region instead of the default region chain client.assume-role.external-id null An optional external ID client.assume-role.timeout-sec 1 hour Timeout of each assume role session. At the end of the timeout, a new set of role session credentials will be fetched through a STS client. By using this client factory, an STS client is initialized with the default credential and region to assume the specified role. The Glue, S3 and DynamoDB clients are then initialized with the assume-role credential and region to access resources. 
Here is an example to start Spark shell with this client factory:\nspark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.14.0,software.amazon.awssdk:bundle:2.17.131 \\ --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \\ --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \\ --conf spark.sql.catalog.my_catalog.client.factory=org.apache.iceberg.aws.AssumeRoleAwsClientFactory \\ --conf spark.sql.catalog.my_catalog.client.assume-role.arn=arn:aws:iam::123456789:role/myRoleToAssume \\ --conf spark.sql.catalog.my_catalog.client.assume-role.region=ap-northeast-1 Run Iceberg on AWS Amazon Athena Amazon Athena provides a serverless query engine that could be used to perform read, write, update and optimization tasks against Iceberg tables. More details could be found here.\nAmazon EMR Amazon EMR can provision clusters with Spark (EMR 6 for Spark 3, EMR 5 for Spark 2), Hive, Flink, Trino that can run Iceberg.\nStarting with EMR version 6.5.0, EMR clusters can be configured to have the necessary Apache Iceberg dependencies installed without requiring bootstrap actions. Please refer to the official documentation on how to create a cluster with Iceberg installed.\nFor versions before 6.5.0, you can use a bootstrap action similar to the following to pre-install all necessary dependencies:\n#!/bin/bash AWS_SDK_VERSION=2.17.131 ICEBERG_VERSION=0.14.0 MAVEN_URL=https://repo1.maven.org/maven2 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg AWS_MAVEN_URL=$MAVEN_URL/software/amazon/awssdk # NOTE: this is just an example shared class path between Spark and Flink, # please choose a proper class path for production. LIB_PATH=/usr/share/aws/aws-java-sdk/ AWS_PACKAGES=( \"bundle\" \"url-connection-client\" ) ICEBERG_PACKAGES=( \"iceberg-spark3-runtime\" \"iceberg-flink-runtime\" ) install_dependencies () { install_path=$1 download_url=$2 version=$3 shift pkgs=(\"$@\") for pkg in \"${pkgs[@]}\"; do sudo wget -P $install_path $download_url/$pkg/$version/$pkg-$version.jar done } install_dependencies $LIB_PATH $ICEBERG_MAVEN_URL $ICEBERG_VERSION \"${ICEBERG_PACKAGES[@]}\" install_dependencies $LIB_PATH $AWS_MAVEN_URL $AWS_SDK_VERSION \"${AWS_PACKAGES[@]}\" AWS EKS AWS Elastic Kubernetes Service (EKS) can be used to start any Spark, Flink, Hive, Presto or Trino clusters to work with Iceberg. Search the Iceberg blogs page for tutorials around running Iceberg with Docker and Kubernetes.\nAmazon Kinesis Amazon Kinesis Data Analytics provides a platform to run fully managed Apache Flink applications. You can include Iceberg in your application Jar and run it in the platform.\n", "description": "", "title": "AWS", "uri": "/docs/latest/aws/"}, {"categories": null, "content": " Configuration Table properties Iceberg tables support table properties to configure table behavior, like the default split size for readers.\nRead properties Property Default Description read.split.target-size 134217728 (128 MB) Target size when combining data input splits read.split.metadata-target-size 33554432 (32 MB) Target size when combining metadata input splits read.split.planning-lookback 10 Number of bins to consider when combining input splits read.split.open-file-cost 4194304 (4 MB) The estimated cost to open a file, used as a minimum weight when combining splits. 
read.parquet.vectorization.enabled false Enables parquet vectorized reads read.parquet.vectorization.batch-size 5000 The batch size for parquet vectorized reads read.orc.vectorization.enabled false Enables orc vectorized reads read.orc.vectorization.batch-size 5000 The batch size for orc vectorized reads Write properties Property Default Description write.format.default parquet Default file format for the table; parquet, avro, or orc write.delete.format.default data file format Default delete file format for the table; parquet, avro, or orc write.parquet.row-group-size-bytes 134217728 (128 MB) Parquet row group size write.parquet.page-size-bytes 1048576 (1 MB) Parquet page size write.parquet.dict-size-bytes 2097152 (2 MB) Parquet dictionary page size write.parquet.compression-codec gzip Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed write.parquet.compression-level null Parquet compression level write.parquet.bloom-filter-enabled.column.col1 (not set) Enables writing a bloom filter for the column: col1 write.parquet.bloom-filter-max-bytes 1048576 (1 MB) The maximum number of bytes for a bloom filter bitset write.avro.compression-codec gzip Avro compression codec: gzip(deflate with 9 level), zstd, snappy, uncompressed write.avro.compression-level null Avro compression level write.orc.stripe-size-bytes 67108864 (64 MB) Define the default ORC stripe size, in bytes write.orc.block-size-bytes 268435456 (256 MB) Define the default file system block size for ORC files write.orc.compression-codec zlib ORC compression codec: zstd, lz4, lzo, zlib, snappy, none write.orc.compression-strategy speed ORC compression strategy: speed, compression write.location-provider.impl null Optional custom implementation for LocationProvider write.metadata.compression-codec none Metadata compression codec; none or gzip write.metadata.metrics.default truncate(16) Default metrics mode for all columns in the table; none, counts, truncate(length), or full write.metadata.metrics.column.col1 (not set) Metrics mode for column \u2018col1\u2019 to allow per-column tuning; none, counts, truncate(length), or full write.target-file-size-bytes 536870912 (512 MB) Controls the size of files generated to target about this many bytes write.delete.target-file-size-bytes 67108864 (64 MB) Controls the size of delete files generated to target about this many bytes write.distribution-mode none Defines distribution of write data: none: don\u2019t shuffle rows; hash: hash distribute by partition key ; range: range distribute by partition key or sort key if table has an SortOrder write.delete.distribution-mode hash Defines distribution of write delete data write.wap.enabled false Enables write-audit-publish writes write.summary.partition-limit 0 Includes partition-level summary stats in snapshot summaries if the changed partition count is less than this limit write.metadata.delete-after-commit.enabled false Controls whether to delete the oldest version metadata files after commit write.metadata.previous-versions-max 100 The max number of previous version metadata files to keep before deleting after commit write.spark.fanout.enabled false Enables the fanout writer in Spark that does not require data to be clustered; uses more memory write.object-storage.enabled false Enables the object storage location provider that adds a hash component to file paths write.data.path table location + /data Base location for data files write.metadata.path table location + /metadata Base location for metadata files write.delete.mode 
copy-on-write Mode used for delete commands: copy-on-write or merge-on-read (v2 only) write.delete.isolation-level serializable Isolation level for delete commands: serializable or snapshot write.update.mode copy-on-write Mode used for update commands: copy-on-write or merge-on-read (v2 only) write.update.isolation-level serializable Isolation level for update commands: serializable or snapshot write.merge.mode copy-on-write Mode used for merge commands: copy-on-write or merge-on-read (v2 only) write.merge.isolation-level serializable Isolation level for merge commands: serializable or snapshot Table behavior properties Property Default Description commit.retry.num-retries 4 Number of times to retry a commit before failing commit.retry.min-wait-ms 100 Minimum time in milliseconds to wait before retrying a commit commit.retry.max-wait-ms 60000 (1 min) Maximum time in milliseconds to wait before retrying a commit commit.retry.total-timeout-ms 1800000 (30 min) Total retry timeout period in milliseconds for a commit commit.status-check.num-retries 3 Number of times to check whether a commit succeeded after a connection is lost before failing due to an unknown commit state commit.status-check.min-wait-ms 1000 (1s) Minimum time in milliseconds to wait before retrying a status-check commit.status-check.max-wait-ms 60000 (1 min) Maximum time in milliseconds to wait before retrying a status-check commit.status-check.total-timeout-ms 1800000 (30 min) Total timeout period in which the commit status-check must succeed, in milliseconds commit.manifest.target-size-bytes 8388608 (8 MB) Target size when merging manifest files commit.manifest.min-count-to-merge 100 Minimum number of manifests to accumulate before merging commit.manifest-merge.enabled true Controls whether to automatically merge manifests on writes history.expire.max-snapshot-age-ms 432000000 (5 days) Default max age of snapshots to keep while expiring snapshots history.expire.min-snapshots-to-keep 1 Default min number of snapshots to keep while expiring snapshots history.expire.max-ref-age-ms Long.MAX_VALUE (forever) For snapshot references except the main branch, default max age of snapshot references to keep while expiring snapshots. The main branch never expires. Reserved table properties Reserved table properties are only used to control behaviors when creating or updating a table. The value of these properties are not persisted as a part of the table metadata.\nProperty Default Description format-version 1 Table\u2019s format version (can be 1 or 2) as defined in the Spec. Compatibility flags Property Default Description compatibility.snapshot-id-inheritance.enabled false Enables committing snapshots without explicit snapshot IDs Catalog properties Iceberg catalogs support using catalog properties to configure catalog behaviors. Here is a list of commonly used catalog properties:\nProperty Default Description catalog-impl null a custom Catalog implementation to use by an engine io-impl null a custom FileIO implementation to use in a catalog warehouse null the root path of the data warehouse uri null a URI string, such as Hive metastore URI clients 2 client pool size cache-enabled true Whether to cache catalog entries cache.expiration-interval-ms 30000 How long catalog entries are locally cached, in milliseconds; 0 disables caching, negative values disable expiration HadoopCatalog and HiveCatalog can access the properties in their constructors. 
Any other custom catalog can access the properties by implementing Catalog.initialize(catalogName, catalogProperties). The properties can be manually constructed or passed in from a compute engine like Spark or Flink. Spark uses its session properties as catalog properties, see more details in the Spark configuration section. Flink passes in catalog properties through CREATE CATALOG statement, see more details in the Flink section.\nLock catalog properties Here are the catalog properties related to locking. They are used by some catalog implementations to control the locking behavior during commits.\nProperty Default Description lock-impl null a custom implementation of the lock manager, the actual interface depends on the catalog used lock.table null an auxiliary table for locking, such as in AWS DynamoDB lock manager lock.acquire-interval-ms 5 seconds the interval to wait between each attempt to acquire a lock lock.acquire-timeout-ms 3 minutes the maximum time to try acquiring a lock lock.heartbeat-interval-ms 3 seconds the interval to wait between each heartbeat after acquiring a lock lock.heartbeat-timeout-ms 15 seconds the maximum time without a heartbeat to consider a lock expired Hadoop configuration The following properties from the Hadoop configuration are used by the Hive Metastore connector.\nProperty Default Description iceberg.hive.client-pool-size 5 The size of the Hive client pool when tracking tables in HMS iceberg.hive.lock-timeout-ms 180000 (3 min) Maximum time in milliseconds to acquire a lock iceberg.hive.lock-check-min-wait-ms 50 Minimum time in milliseconds to check back on the status of lock acquisition iceberg.hive.lock-check-max-wait-ms 5000 Maximum time in milliseconds to check back on the status of lock acquisition Note: iceberg.hive.lock-check-max-wait-ms should be less than the transaction timeout of the Hive Metastore (hive.txn.timeout or metastore.txn.timeout in the newer versions). Otherwise, the heartbeats on the lock (which happens during the lock checks) would end up expiring in the Hive Metastore before the lock is retried from Iceberg.\n", "description": "", "title": "Configuration", "uri": "/docs/latest/configuration/"}, {"categories": null, "content": " Spark Configuration Catalogs Spark 3.0 adds an API to plug in table catalogs that are used to load, create, and manage Iceberg tables. Spark catalogs are configured by setting Spark properties under spark.sql.catalog.\nThis creates an Iceberg catalog named hive_prod that loads tables from a Hive metastore:\nspark.sql.catalog.hive_prod = org.apache.iceberg.spark.SparkCatalog spark.sql.catalog.hive_prod.type = hive spark.sql.catalog.hive_prod.uri = thrift://metastore-host:port # omit uri to use the same URI as Spark: hive.metastore.uris in hive-site.xml Iceberg also supports a directory-based catalog in HDFS that can be configured using type=hadoop:\nspark.sql.catalog.hadoop_prod = org.apache.iceberg.spark.SparkCatalog spark.sql.catalog.hadoop_prod.type = hadoop spark.sql.catalog.hadoop_prod.warehouse = hdfs://nn:8020/warehouse/path The Hive-based catalog only loads Iceberg tables. To load non-Iceberg tables in the same Hive metastore, use a session catalog. 
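The catalog properties shown above can also be supplied programmatically when the Spark session is created. Here is a minimal Java sketch, assuming the Iceberg Spark runtime jar is on the classpath; the metastore URI and warehouse path are placeholders.

```java
import org.apache.spark.sql.SparkSession;

public class IcebergCatalogSetup {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-catalogs")
        // Hive metastore backed catalog, mirroring the hive_prod properties above
        .config("spark.sql.catalog.hive_prod", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.hive_prod.type", "hive")
        .config("spark.sql.catalog.hive_prod.uri", "thrift://metastore-host:9083")
        // Directory based catalog in HDFS, mirroring the hadoop_prod properties above
        .config("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.hadoop_prod.type", "hadoop")
        .config("spark.sql.catalog.hadoop_prod.warehouse", "hdfs://nn:8020/warehouse/path")
        .getOrCreate();

    // Tables are then addressed with the catalog name as a prefix, e.g. hive_prod.db.table
    spark.sql("SHOW NAMESPACES IN hadoop_prod").show();
  }
}
```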
Catalog configuration A catalog is created and named by adding a property spark.sql.catalog.(catalog-name) with an implementation class for its value.\nIceberg supplies two implementations:\norg.apache.iceberg.spark.SparkCatalog supports a Hive Metastore or a Hadoop warehouse as a catalog org.apache.iceberg.spark.SparkSessionCatalog adds support for Iceberg tables to Spark\u2019s built-in catalog, and delegates to the built-in catalog for non-Iceberg tables Both catalogs are configured using properties nested under the catalog name. Common configuration properties for Hive and Hadoop are:\nProperty Values Description spark.sql.catalog.catalog-name.type hive or hadoop The underlying Iceberg catalog implementation, HiveCatalog, HadoopCatalog or left unset if using a custom catalog spark.sql.catalog.catalog-name.catalog-impl The underlying Iceberg catalog implementation. spark.sql.catalog.catalog-name.default-namespace default The default current namespace for the catalog spark.sql.catalog.catalog-name.uri thrift://host:port Metastore connect URI; default from hive-site.xml spark.sql.catalog.catalog-name.warehouse hdfs://nn:8020/warehouse/path Base path for the warehouse directory spark.sql.catalog.catalog-name.cache-enabled true or false Whether to enable catalog cache, default value is true spark.sql.catalog.catalog-name.cache.expiration-interval-ms 30000 (30 seconds) Duration after which cached catalog entries are expired; Only effective if cache-enabled is true. -1 disables cache expiration and 0 disables caching entirely, irrespective of cache-enabled. Default is 30000 (30 seconds) Additional properties can be found in common catalog configuration.\nUsing catalogs Catalog names are used in SQL queries to identify a table. In the examples above, hive_prod and hadoop_prod can be used to prefix database and table names that will be loaded from those catalogs.\nSELECT * FROM hive_prod.db.table -- load db.table from catalog hive_prod Spark 3 keeps track of the current catalog and namespace, which can be omitted from table names.\nUSE hive_prod.db; SELECT * FROM table -- load db.table from catalog hive_prod To see the current catalog and namespace, run SHOW CURRENT NAMESPACE.\nReplacing the session catalog To add Iceberg table support to Spark\u2019s built-in catalog, configure spark_catalog to use Iceberg\u2019s SparkSessionCatalog.\nspark.sql.catalog.spark_catalog = org.apache.iceberg.spark.SparkSessionCatalog spark.sql.catalog.spark_catalog.type = hive Spark\u2019s built-in catalog supports existing v1 and v2 tables tracked in a Hive Metastore. This configures Spark to use Iceberg\u2019s SparkSessionCatalog as a wrapper around that session catalog. When a table is not an Iceberg table, the built-in catalog will be used to load it instead.\nThis configuration can use same Hive Metastore for both Iceberg and non-Iceberg tables.\nUsing catalog specific Hadoop configuration values Similar to configuring Hadoop properties by using spark.hadoop.*, it\u2019s possible to set per-catalog Hadoop configuration values when using Spark by adding the property for the catalog with the prefix spark.sql.catalog.(catalog-name).hadoop.*. These properties will take precedence over values configured globally using spark.hadoop.* and will only affect Iceberg tables.\nspark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint = http://aws-local:9000 Loading a custom catalog Spark supports loading a custom Iceberg Catalog implementation by specifying the catalog-impl property. 
Here is an example:\nspark.sql.catalog.custom_prod = org.apache.iceberg.spark.SparkCatalog spark.sql.catalog.custom_prod.catalog-impl = com.my.custom.CatalogImpl spark.sql.catalog.custom_prod.my-additional-catalog-config = my-value Catalogs in Spark 2.4 When using Iceberg 0.11.0 and later, Spark 2.4 can load tables from multiple Iceberg catalogs or from table locations.\nCatalogs in 2.4 are configured just like catalogs in 3.0, but only Iceberg catalogs are supported.\nSQL Extensions Iceberg 0.11.0 and later add an extension module to Spark to add new SQL commands, like CALL for stored procedures or ALTER TABLE ... WRITE ORDERED BY.\nUsing those SQL commands requires adding Iceberg extensions to your Spark environment using the following Spark property:\nSpark extensions property Iceberg extensions implementation spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions SQL extensions are not available for Spark 2.4.\nRuntime configuration Read options Spark read options are passed when configuring the DataFrameReader, like this:\n// time travel spark.read .option(\"snapshot-id\", 10963874102873L) .table(\"catalog.db.table\") Spark option Default Description snapshot-id (latest) Snapshot ID of the table snapshot to read as-of-timestamp (latest) A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. split-size As per table property Overrides this table\u2019s read.split.target-size and read.split.metadata-target-size lookback As per table property Overrides this table\u2019s read.split.planning-lookback file-open-cost As per table property Overrides this table\u2019s read.split.open-file-cost vectorization-enabled As per table property Overrides this table\u2019s read.parquet.vectorization.enabled batch-size As per table property Overrides this table\u2019s read.parquet.vectorization.batch-size stream-from-timestamp (none) A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used Write options Spark write options are passed when configuring the DataFrameWriter, like this:\n// write with Avro instead of Parquet df.write .option(\"write-format\", \"avro\") .option(\"snapshot-property.key\", \"value\") .insertInto(\"catalog.db.table\") Spark option Default Description write-format Table write.format.default File format to use for this write operation; parquet, avro, or orc target-file-size-bytes As per table property Overrides this table\u2019s write.target-file-size-bytes check-nullability true Sets the nullable check on fields snapshot-property.custom-key null Adds an entry with custom-key and corresponding value in the snapshot summary fanout-enabled false Overrides this table\u2019s write.spark.fanout.enabled check-ordering true Checks if input schema and table schema are same isolation-level null Desired isolation level for Dataframe overwrite operations. null => no checks (for idempotent writes), serializable => check for concurrent inserts or deletes in destination partitions, snapshot => checks for concurrent deletes in destination partitions. validate-from-snapshot-id null If isolation level is set, id of base snapshot from which to check concurrent write conflicts into a table. Should be the snapshot before any reads from the table. Can be obtained via Table API or Snapshots table. If null, the table\u2019s oldest known snapshot is used. 
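To tie the read and write option tables above together, here is a minimal Java sketch of passing these options through the DataFrameReader and DataFrameWriterV2 APIs; it assumes an already configured SparkSession, an existing Iceberg table catalog.db.table, and uses a placeholder snapshot ID.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IcebergReadWriteOptions {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder().appName("iceberg-options").getOrCreate();

    // Time travel: read the table as of a specific snapshot (placeholder ID)
    Dataset<Row> asOfSnapshot = spark.read()
        .option("snapshot-id", "10963874102873")
        .table("catalog.db.table");

    // Append with Avro instead of the table default format and attach a custom snapshot property
    asOfSnapshot.writeTo("catalog.db.table")
        .option("write-format", "avro")
        .option("snapshot-property.custom-key", "value")
        .append();
  }
}
```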
", "description": "", "title": "Configuration", "uri": "/docs/latest/spark-configuration/"}, {"categories": null, "content": " Spark DDL To use Iceberg in Spark, first configure Spark catalogs.\nIceberg uses Apache Spark\u2019s DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions. Spark 2.4 does not support SQL DDL.\nSpark 2.4 can\u2019t create Iceberg tables with DDL, instead use Spark 3.x or the Iceberg API. CREATE TABLE Spark 3.0 can create tables in any Iceberg catalog with the clause USING iceberg:\nCREATE TABLE prod.db.sample ( id bigint COMMENT 'unique id', data string) USING iceberg Iceberg will convert the column type in Spark to corresponding Iceberg type. Please check the section of type compatibility on creating table for details.\nTable create commands, including CTAS and RTAS, support the full range of Spark create clauses, including:\nPARTITIONED BY (partition-expressions) to configure partitioning LOCATION '(fully-qualified-uri)' to set the table location COMMENT 'table documentation' to set a table description TBLPROPERTIES ('key'='value', ...) to set table configuration Create commands may also set the default format with the USING clause. This is only supported for SparkCatalog because Spark handles the USING clause differently for the built-in catalog.\nPARTITIONED BY To create a partitioned table, use PARTITIONED BY:\nCREATE TABLE prod.db.sample ( id bigint, data string, category string) USING iceberg PARTITIONED BY (category) The PARTITIONED BY clause supports transform expressions to create hidden partitions.\nCREATE TABLE prod.db.sample ( id bigint, data string, category string, ts timestamp) USING iceberg PARTITIONED BY (bucket(16, id), days(ts), category) Supported transformations are:\nyears(ts): partition by year months(ts): partition by month days(ts) or date(ts): equivalent to dateint partitioning hours(ts) or date_hour(ts): equivalent to dateint and hour partitioning bucket(N, col): partition by hashed value mod N buckets truncate(L, col): partition by value truncated to L Strings are truncated to the given length Integers and longs truncate to bins: truncate(10, i) produces partitions 0, 10, 20, 30, \u2026 CREATE TABLE ... AS SELECT Iceberg supports CTAS as an atomic operation when using a SparkCatalog. CTAS is supported, but is not atomic when using SparkSessionCatalog.\nCREATE TABLE prod.db.sample USING iceberg AS SELECT ... REPLACE TABLE ... AS SELECT Iceberg supports RTAS as an atomic operation when using a SparkCatalog. RTAS is supported, but is not atomic when using SparkSessionCatalog.\nAtomic table replacement creates a new snapshot with the results of the SELECT query, but keeps table history.\nREPLACE TABLE prod.db.sample USING iceberg AS SELECT ... REPLACE TABLE prod.db.sample USING iceberg PARTITIONED BY (part) TBLPROPERTIES ('key'='value') AS SELECT ... CREATE OR REPLACE TABLE prod.db.sample USING iceberg AS SELECT ... The schema and partition spec will be replaced if changed. To avoid modifying the table\u2019s schema and partitioning, use INSERT OVERWRITE instead of REPLACE TABLE. The new table properties in the REPLACE TABLE command will be merged with any existing table properties. 
The existing table properties will be updated if changed else they are preserved.\nDROP TABLE To delete a table, run:\nDROP TABLE prod.db.sample ALTER TABLE Iceberg has full ALTER TABLE support in Spark 3, including:\nRenaming a table Setting or removing table properties Adding, deleting, and renaming columns Adding, deleting, and renaming nested fields Reordering top-level columns and nested struct fields Widening the type of int, float, and decimal fields Making required columns optional In addition, SQL extensions can be used to add support for partition evolution and setting a table\u2019s write order\nALTER TABLE ... RENAME TO ALTER TABLE prod.db.sample RENAME TO prod.db.new_name ALTER TABLE ... SET TBLPROPERTIES ALTER TABLE prod.db.sample SET TBLPROPERTIES ( 'read.split.target-size'='268435456' ) Iceberg uses table properties to control table behavior. For a list of available properties, see Table configuration.\nUNSET is used to remove properties:\nALTER TABLE prod.db.sample UNSET TBLPROPERTIES ('read.split.target-size') SET TBLPROPERTIES can also be used to set the table comment (description):\nALTER TABLE prod.db.sample SET TBLPROPERTIES ( 'comment' = 'A table comment.' ) ALTER TABLE ... ADD COLUMN To add a column to Iceberg, use the ADD COLUMNS clause with ALTER TABLE:\nALTER TABLE prod.db.sample ADD COLUMNS ( new_column string comment 'new_column docs' ) Multiple columns can be added at the same time, separated by commas.\nNested columns should be identified using the full column name:\n-- create a struct column ALTER TABLE prod.db.sample ADD COLUMN point struct<x: double, y: double>; -- add a field to the struct ALTER TABLE prod.db.sample ADD COLUMN point.z double -- create a nested array column of struct ALTER TABLE prod.db.sample ADD COLUMN points array<struct<x: double, y: double>>; -- add a field to the struct within an array. Using keyword 'element' to access the array's element column. ALTER TABLE prod.db.sample ADD COLUMN points.element.z double -- create a map column of struct key and struct value ALTER TABLE prod.db.sample ADD COLUMN points map<struct<x: int>, struct<a: int>>; -- add a field to the value struct in a map. Using keyword 'value' to access the map's value column. ALTER TABLE prod.db.sample ADD COLUMN points.value.b int Note: Altering a map \u2018key\u2019 column by adding columns is not allowed. Only map values can be updated.\nIn Spark 2.4.4 and later, you can add columns in any position by adding FIRST or AFTER clauses:\nALTER TABLE prod.db.sample ADD COLUMN new_column bigint AFTER other_column ALTER TABLE prod.db.sample ADD COLUMN nested.new_column bigint FIRST ALTER TABLE ... RENAME COLUMN Iceberg allows any field to be renamed. To rename a field, use RENAME COLUMN:\nALTER TABLE prod.db.sample RENAME COLUMN data TO payload ALTER TABLE prod.db.sample RENAME COLUMN location.lat TO latitude Note that nested rename commands only rename the leaf field. The above command renames location.lat to location.latitude\nALTER TABLE ... ALTER COLUMN Alter column is used to widen types, make a field optional, set comments, and reorder fields.\nIceberg allows updating column types if the update is safe. 
Safe updates are:\nint to bigint float to double decimal(P,S) to decimal(P2,S) when P2 > P (scale cannot change) ALTER TABLE prod.db.sample ALTER COLUMN measurement TYPE double To add or remove columns from a struct, use ADD COLUMN or DROP COLUMN with a nested column name.\nColumn comments can also be updated using ALTER COLUMN:\nALTER TABLE prod.db.sample ALTER COLUMN measurement TYPE double COMMENT 'unit is bytes per second' ALTER TABLE prod.db.sample ALTER COLUMN measurement COMMENT 'unit is kilobytes per second' Iceberg allows reordering top-level columns or columns in a struct using FIRST and AFTER clauses:\nALTER TABLE prod.db.sample ALTER COLUMN col FIRST ALTER TABLE prod.db.sample ALTER COLUMN nested.col AFTER other_col Nullability can be changed using SET NOT NULL and DROP NOT NULL:\nALTER TABLE prod.db.sample ALTER COLUMN id DROP NOT NULL ALTER COLUMN is not used to update struct types. Use ADD COLUMN and DROP COLUMN to add or remove struct fields. ALTER TABLE ... DROP COLUMN To drop columns, use ALTER TABLE ... DROP COLUMN:\nALTER TABLE prod.db.sample DROP COLUMN id ALTER TABLE prod.db.sample DROP COLUMN point.z ALTER TABLE SQL extensions These commands are available in Spark 3.x when using Iceberg SQL extensions.\nALTER TABLE ... ADD PARTITION FIELD Iceberg supports adding new partition fields to a spec using ADD PARTITION FIELD:\nALTER TABLE prod.db.sample ADD PARTITION FIELD catalog -- identity transform Partition transforms are also supported:\nALTER TABLE prod.db.sample ADD PARTITION FIELD bucket(16, id) ALTER TABLE prod.db.sample ADD PARTITION FIELD truncate(data, 4) ALTER TABLE prod.db.sample ADD PARTITION FIELD years(ts) -- use optional AS keyword to specify a custom name for the partition field ALTER TABLE prod.db.sample ADD PARTITION FIELD bucket(16, id) AS shard Adding a partition field is a metadata operation and does not change any of the existing table data. New data will be written with the new partitioning, but existing data will remain in the old partition layout. Old data files will have null values for the new partition fields in metadata tables.\nDynamic partition overwrite behavior will change when the table\u2019s partitioning changes because dynamic overwrite replaces partitions implicitly. To overwrite explicitly, use the new DataFrameWriterV2 API.\nTo migrate from daily to hourly partitioning with transforms, it is not necessary to drop the daily partition field. Keeping the field ensures existing metadata table queries continue to work. Dynamic partition overwrite behavior will change when partitioning changes For example, if you partition by days and move to partitioning by hours, overwrites will overwrite hourly partitions but not days anymore. ALTER TABLE ... DROP PARTITION FIELD Partition fields can be removed using DROP PARTITION FIELD:\nALTER TABLE prod.db.sample DROP PARTITION FIELD catalog ALTER TABLE prod.db.sample DROP PARTITION FIELD bucket(16, id) ALTER TABLE prod.db.sample DROP PARTITION FIELD truncate(data, 4) ALTER TABLE prod.db.sample DROP PARTITION FIELD years(ts) ALTER TABLE prod.db.sample DROP PARTITION FIELD shard Note that although the partition is removed, the column will still exist in the table schema.\nDropping a partition field is a metadata operation and does not change any of the existing table data. 
New data will be written with the new partitioning, but existing data will remain in the old partition layout.\nDynamic partition overwrite behavior will change when partitioning changes For example, if you partition by days and move to partitioning by hours, overwrites will overwrite hourly partitions but not days anymore. Be careful when dropping a partition field because it will change the schema of metadata tables, like files, and may cause metadata queries to fail or produce different results. ALTER TABLE ... WRITE ORDERED BY Iceberg tables can be configured with a sort order that is used to automatically sort data that is written to the table in some engines. For example, MERGE INTO in Spark will use the table ordering.\nTo set the write order for a table, use WRITE ORDERED BY:\nALTER TABLE prod.db.sample WRITE ORDERED BY category, id -- use optional ASC/DEC keyword to specify sort order of each field (default ASC) ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC, id DESC -- use optional NULLS FIRST/NULLS LAST keyword to specify null order of each field (default FIRST) ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC NULLS LAST, id DESC NULLS FIRST Table write order does not guarantee data order for queries. It only affects how data is written to the table. WRITE ORDERED BY sets a global ordering where rows are ordered across tasks, like using ORDER BY in an INSERT command:\nINSERT INTO prod.db.sample SELECT id, data, category, ts FROM another_table ORDER BY ts, category To order within each task, not across tasks, use LOCALLY ORDERED BY:\nALTER TABLE prod.db.sample WRITE LOCALLY ORDERED BY category, id ALTER TABLE ... WRITE DISTRIBUTED BY PARTITION WRITE DISTRIBUTED BY PARTITION will request that each partition is handled by one writer, the default implementation is hash distribution.\nALTER TABLE prod.db.sample WRITE DISTRIBUTED BY PARTITION DISTRIBUTED BY PARTITION and LOCALLY ORDERED BY may be used together, to distribute by partition and locally order rows within each task.\nALTER TABLE prod.db.sample WRITE DISTRIBUTED BY PARTITION LOCALLY ORDERED BY category, id ", "description": "", "title": "DDL", "uri": "/docs/latest/spark-ddl/"}, {"categories": null, "content": " Flink Apache Iceberg supports both Apache Flink\u2019s DataStream API and Table API. See the Multi-Engine Support#apache-flink page for the integration of Apache Flink.\nFeature support Flink Notes SQL create catalog \u2714\ufe0f SQL create database \u2714\ufe0f SQL create table \u2714\ufe0f SQL create table like \u2714\ufe0f SQL alter table \u2714\ufe0f Only support altering table properties, column and partition changes are not supported SQL drop_table \u2714\ufe0f SQL select \u2714\ufe0f Support both streaming and batch mode SQL insert into \u2714\ufe0f \ufe0f Support both streaming and batch mode SQL insert overwrite \u2714\ufe0f \ufe0f DataStream read \u2714\ufe0f \ufe0f DataStream append \u2714\ufe0f \ufe0f DataStream overwrite \u2714\ufe0f \ufe0f Metadata tables \ufe0f Support Java API but does not support Flink SQL Rewrite files action \u2714\ufe0f \ufe0f Preparation when using Flink SQL Client To create iceberg table in flink, we recommend to use Flink SQL Client because it\u2019s easier for users to understand the concepts.\nStep.1 Downloading the flink 1.11.x binary package from the apache flink download page. 
We now use scala 2.12 to archive the apache iceberg-flink-runtime jar, so it\u2019s recommended to use flink 1.11 bundled with scala 2.12.\nFLINK_VERSION=1.11.1 SCALA_VERSION=2.12 APACHE_FLINK_URL=archive.apache.org/dist/flink/ wget ${APACHE_FLINK_URL}/flink-${FLINK_VERSION}/flink-${FLINK_VERSION}-bin-scala_${SCALA_VERSION}.tgz tar xzvf flink-${FLINK_VERSION}-bin-scala_${SCALA_VERSION}.tgz Step.2 Start a standalone flink cluster within hadoop environment.\n# HADOOP_HOME is your hadoop root directory after unpack the binary package. export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath` # Start the flink standalone cluster ./bin/start-cluster.sh Step.3 Start the flink SQL client.\nWe\u2019ve created a separate flink-runtime module in iceberg project to generate a bundled jar, which could be loaded by flink SQL client directly.\nIf we want to build the flink-runtime bundled jar manually, please just build the iceberg project and it will generate the jar under <iceberg-root-dir>/flink-runtime/build/libs. Of course, we could also download the flink-runtime jar from the apache official repository.\n# HADOOP_HOME is your hadoop root directory after unpack the binary package. export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath` ./bin/sql-client.sh embedded -j <flink-runtime-directory>/iceberg-flink-runtime-xxx.jar shell By default, iceberg has included hadoop jars for hadoop catalog. If we want to use hive catalog, we will need to load the hive jars when opening the flink sql client. Fortunately, apache flink has provided a bundled hive jar for sql client. So we could open the sql client as the following:\n# HADOOP_HOME is your hadoop root directory after unpack the binary package. export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath` # download Iceberg dependency ICEBERG_VERSION=0.11.1 MAVEN_URL=https://repo1.maven.org/maven2 ICEBERG_MAVEN_URL=${MAVEN_URL}/org/apache/iceberg ICEBERG_PACKAGE=iceberg-flink-runtime wget ${ICEBERG_MAVEN_URL}/${ICEBERG_PACKAGE}/${ICEBERG_VERSION}/${ICEBERG_PACKAGE}-${ICEBERG_VERSION}.jar # download the flink-sql-connector-hive-${HIVE_VERSION}_${SCALA_VERSION}-${FLINK_VERSION}.jar HIVE_VERSION=2.3.6 SCALA_VERSION=2.11 FLINK_VERSION=1.11.0 FLINK_CONNECTOR_URL=${MAVEN_URL}/org/apache/flink FLINK_CONNECTOR_PACKAGE=flink-sql-connector-hive wget ${FLINK_CONNECTOR_URL}/${FLINK_CONNECTOR_PACKAGE}-${HIVE_VERSION}_${SCALA_VERSION}/${FLINK_VERSION}/${FLINK_CONNECTOR_PACKAGE}-${HIVE_VERSION}_${SCALA_VERSION}-${FLINK_VERSION}.jar # open the SQL client. /path/to/bin/sql-client.sh embedded \\ -j ${ICEBERG_PACKAGE}-${ICEBERG_VERSION}.jar \\ -j ${FLINK_CONNECTOR_PACKAGE}-${HIVE_VERSION}_${SCALA_VERSION}-${FLINK_VERSION}.jar \\ shell Preparation when using Flink\u2019s Python API Install the Apache Flink dependency using pip\npip install apache-flink==1.11.1 In order for pyflink to function properly, it needs to have access to all Hadoop jars. For pyflink we need to copy those Hadoop jars to the installation directory of pyflink, which can be found under <PYTHON_ENV_INSTALL_DIR>/site-packages/pyflink/lib/ (see also a mention of this on the Flink ML). 
We can use the following short Python script to copy all Hadoop jars (you need to make sure that HADOOP_HOME points to your Hadoop installation):\nimport os import shutil import site def copy_all_hadoop_jars_to_pyflink(): if not os.getenv(\"HADOOP_HOME\"): raise Exception(\"The HADOOP_HOME env var must be set and point to a valid Hadoop installation\") jar_files = [] def find_pyflink_lib_dir(): for dir in site.getsitepackages(): package_dir = os.path.join(dir, \"pyflink\", \"lib\") if os.path.exists(package_dir): return package_dir return None for root, _, files in os.walk(os.getenv(\"HADOOP_HOME\")): for file in files: if file.endswith(\".jar\"): jar_files.append(os.path.join(root, file)) pyflink_lib_dir = find_pyflink_lib_dir() num_jar_files = len(jar_files) print(f\"Copying {num_jar_files} Hadoop jar files to pyflink's lib directory at {pyflink_lib_dir}\") for jar in jar_files: shutil.copy(jar, pyflink_lib_dir) if __name__ == '__main__': copy_all_hadoop_jars_to_pyflink() Once the script finished, you should see output similar to\nCopying 645 Hadoop jar files to pyflink's lib directory at <PYTHON_DIR>/lib/python3.8/site-packages/pyflink/lib Now we need to provide a file:// path to the iceberg-flink-runtime jar, which we can either get by building the project and looking at <iceberg-root-dir>/flink-runtime/build/libs, or downloading it from the Apache official repository. Third-party libs can be added to pyflink via env.add_jars(\"file:///my/jar/path/connector.jar\") / table_env.get_config().get_configuration().set_string(\"pipeline.jars\", \"file:///my/jar/path/connector.jar\"), which is also mentioned in the official docs. In our example we\u2019re using env.add_jars(..) as shown below:\nimport os from pyflink.datastream import StreamExecutionEnvironment env = StreamExecutionEnvironment.get_execution_environment() iceberg_flink_runtime_jar = os.path.join(os.getcwd(), \"iceberg-flink-runtime-0.14.0.jar\") env.add_jars(\"file://{}\".format(iceberg_flink_runtime_jar)) Once we reached this point, we can then create a StreamTableEnvironment and execute Flink SQL statements. The below example shows how to create a custom catalog via the Python Table API:\nfrom pyflink.table import StreamTableEnvironment table_env = StreamTableEnvironment.create(env) table_env.execute_sql(\"CREATE CATALOG my_catalog WITH (\" \"'type'='iceberg', \" \"'catalog-impl'='com.my.custom.CatalogImpl', \" \"'my-additional-catalog-config'='my-value')\") For more details, please refer to the Python Table API.\nCreating catalogs and using catalogs. Flink 1.11 support to create catalogs by using flink sql.\nCatalog Configuration A catalog is created and named by executing the following query (replace <catalog_name> with your catalog name and <config_key>=<config_value> with catalog implementation config):\nCREATE CATALOG <catalog_name> WITH ( 'type'='iceberg', `<config_key>`=`<config_value>` ); The following properties can be set globally and are not limited to a specific catalog implementation:\ntype: Must be iceberg. (required) catalog-type: hive or hadoop for built-in catalogs, or left unset for custom catalog implementations using catalog-impl. (Optional) catalog-impl: The fully-qualified class name custom catalog implementation, must be set if catalog-type is unset. (Optional) property-version: Version number to describe the property version. This property can be used for backwards compatibility in case the property format changes. The current property version is 1. 
(Optional) cache-enabled: Whether to enable catalog cache, default value is true Hive catalog This creates an iceberg catalog named hive_catalog that can be configured using 'catalog-type'='hive', which loads tables from a hive metastore:\nCREATE CATALOG hive_catalog WITH ( 'type'='iceberg', 'catalog-type'='hive', 'uri'='thrift://localhost:9083', 'clients'='5', 'property-version'='1', 'warehouse'='hdfs://nn:8020/warehouse/path' ); The following properties can be set if using the Hive catalog:\nuri: The Hive metastore\u2019s thrift URI. (Required) clients: The Hive metastore client pool size, default value is 2. (Optional) warehouse: The Hive warehouse location, users should specify this path if neither set the hive-conf-dir to specify a location containing a hive-site.xml configuration file nor add a correct hive-site.xml to classpath. hive-conf-dir: Path to a directory containing a hive-site.xml configuration file which will be used to provide custom Hive configuration values. The value of hive.metastore.warehouse.dir from <hive-conf-dir>/hive-site.xml (or hive configure file from classpath) will be overwrote with the warehouse value if setting both hive-conf-dir and warehouse when creating iceberg catalog. Hadoop catalog Iceberg also supports a directory-based catalog in HDFS that can be configured using 'catalog-type'='hadoop':\nCREATE CATALOG hadoop_catalog WITH ( 'type'='iceberg', 'catalog-type'='hadoop', 'warehouse'='hdfs://nn:8020/warehouse/path', 'property-version'='1' ); The following properties can be set if using the Hadoop catalog:\nwarehouse: The HDFS directory to store metadata files and data files. (Required) We could execute the sql command USE CATALOG hive_catalog to set the current catalog.\nCustom catalog Flink also supports loading a custom Iceberg Catalog implementation by specifying the catalog-impl property. Here is an example:\nCREATE CATALOG my_catalog WITH ( 'type'='iceberg', 'catalog-impl'='com.my.custom.CatalogImpl', 'my-additional-catalog-config'='my-value' ); Create through YAML config Catalogs can be registered in sql-client-defaults.yaml before starting the SQL client. Here is an example:\ncatalogs: - name: my_catalog type: iceberg catalog-type: hadoop warehouse: hdfs://nn:8020/warehouse/path Create through SQL Files Since the sql-client-defaults.yaml file was removed in flink 1.14, SQL Client supports the -i startup option to execute an initialization SQL file to setup environment when starting up the SQL Client. An example of such a file is presented below.\n-- define available catalogs CREATE CATALOG hive_catalog WITH ( 'type'='iceberg', 'catalog-type'='hive', 'uri'='thrift://localhost:9083', 'warehouse'='hdfs://nn:8020/warehouse/path' ); USE CATALOG hive_catalog; using -i <init.sql> option to initialize SQL Client session\n/path/to/bin/sql-client.sh -i /path/to/init.sql DDL commands CREATE DATABASE By default, iceberg will use the default database in flink. Using the following example to create a separate database if we don\u2019t want to create tables under the default database:\nCREATE DATABASE iceberg_db; USE iceberg_db; CREATE TABLE CREATE TABLE `hive_catalog`.`default`.`sample` ( id BIGINT COMMENT 'unique id', data STRING ); Table create commands support the most commonly used flink create clauses now, including:\nPARTITION BY (column1, column2, ...) to configure partitioning, apache flink does not yet support hidden partitioning. COMMENT 'table document' to set a table description. WITH ('key'='value', ...) 
to set table configuration which will be stored in apache iceberg table properties. Currently, it does not support computed columns, primary keys, or watermark definitions.\nPARTITIONED BY To create a partitioned table, use PARTITIONED BY:\nCREATE TABLE `hive_catalog`.`default`.`sample` ( id BIGINT COMMENT 'unique id', data STRING ) PARTITIONED BY (data); Apache Iceberg supports hidden partitioning, but Apache Flink does not support partitioning by a function on columns, so there is currently no way to define hidden partitions through Flink DDL; this may be improved in a future Flink release.\nCREATE TABLE LIKE To create a table with the same schema, partitioning, and table properties as another table, use CREATE TABLE LIKE.\nCREATE TABLE `hive_catalog`.`default`.`sample` ( id BIGINT COMMENT 'unique id', data STRING ); CREATE TABLE `hive_catalog`.`default`.`sample_like` LIKE `hive_catalog`.`default`.`sample`; For more details, refer to the Flink CREATE TABLE documentation.\nALTER TABLE Iceberg only supports altering table properties in Flink 1.11.\nALTER TABLE `hive_catalog`.`default`.`sample` SET ('write.format.default'='avro') ALTER TABLE .. RENAME TO ALTER TABLE `hive_catalog`.`default`.`sample` RENAME TO `hive_catalog`.`default`.`new_sample`; DROP TABLE To delete a table, run:\nDROP TABLE `hive_catalog`.`default`.`sample`; Querying with SQL Iceberg supports both streaming and batch reads in Flink. Execute the following SQL command to switch the execution mode from 'streaming' to 'batch', and vice versa:\n-- Execute the flink job in streaming mode for current session context SET execution.runtime-mode = streaming; -- Execute the flink job in batch mode for current session context SET execution.runtime-mode = batch; Flink batch read To scan all the rows of an Iceberg table with a Flink batch job, execute the following statements:\n-- Execute the flink job in batch mode for current session context SET execution.runtime-mode = batch; SELECT * FROM sample; Flink streaming read Iceberg supports processing incremental data in Flink streaming jobs, starting from a historical snapshot-id:\n-- Submit the flink job in streaming mode for current session. SET execution.runtime-mode = streaming; -- Enable this switch because the streaming read SQL below passes job options through Flink SQL hint options. SET table.dynamic-table-options.enabled=true; -- Read all the records from the iceberg current snapshot, and then read incremental data starting from that snapshot. SELECT * FROM sample /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/ ; -- Read all incremental data starting from the snapshot-id '3821550127947089987' (records from this snapshot will be excluded). SELECT * FROM sample /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s', 'start-snapshot-id'='3821550127947089987')*/ ; These options can be set in Flink SQL hint options for a streaming job:\nmonitor-interval: the time interval for continuously monitoring newly committed data files (default: '10s'). start-snapshot-id: the snapshot id that the streaming job starts from.
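The same hint-based streaming read can also be submitted from the Java Table API. Here is a minimal sketch, assuming the iceberg-flink-runtime jar is on the classpath, a table named sample exists in the current catalog, and the snapshot ID is a placeholder.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class IcebergStreamingRead {
  public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

    // Allow the OPTIONS(...) hint used in the query below
    tEnv.getConfig().getConfiguration()
        .setString("table.dynamic-table-options.enabled", "true");

    // Continuously read data committed after the given snapshot, checking for new files every second
    tEnv.executeSql(
        "SELECT * FROM sample /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s', "
            + "'start-snapshot-id'='3821550127947089987') */")
        .print();
  }
}
```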
Writing with SQL Iceberg supports both INSERT INTO and INSERT OVERWRITE in Flink 1.11.\nINSERT INTO To append new data to a table with a flink streaming job, use INSERT INTO:\nINSERT INTO `hive_catalog`.`default`.`sample` VALUES (1, 'a'); INSERT INTO `hive_catalog`.`default`.`sample` SELECT id, data from other_kafka_table; INSERT OVERWRITE To replace data in the table with the result of a query, use INSERT OVERWRITE in a batch job (Flink streaming jobs do not support INSERT OVERWRITE). Overwrites are atomic operations for Iceberg tables.\nPartitions that have rows produced by the SELECT query will be replaced, for example:\nINSERT OVERWRITE sample VALUES (1, 'a'); Iceberg also supports overwriting a given partition with the selected values:\nINSERT OVERWRITE `hive_catalog`.`default`.`sample` PARTITION(data='a') SELECT 6; For a partitioned iceberg table, when all the partition columns are given a value in the PARTITION clause, the insert goes into a static partition; otherwise, if only a prefix of the partition columns is given a value in the PARTITION clause, the query result is written into a dynamic partition. For an unpartitioned iceberg table, its data will be completely overwritten by INSERT OVERWRITE.\nReading with DataStream Iceberg supports both streaming and batch reads through the Java API.\nBatch Read This example reads all records from an Iceberg table and prints them to stdout in a Flink batch job:\nStreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(); TableLoader tableLoader = TableLoader.fromHadoopTable(\"hdfs://nn:8020/warehouse/path\"); DataStream<RowData> batch = FlinkSource.forRowData() .env(env) .tableLoader(tableLoader) .streaming(false) .build(); // Print all records to stdout. batch.print(); // Submit and execute this batch read job. env.execute(\"Test Iceberg Batch Read\"); Streaming read This example reads incremental records starting from snapshot-id '3821550127947089987' and prints them to stdout in a Flink streaming job:\nStreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(); TableLoader tableLoader = TableLoader.fromHadoopTable(\"hdfs://nn:8020/warehouse/path\"); DataStream<RowData> stream = FlinkSource.forRowData() .env(env) .tableLoader(tableLoader) .streaming(true) .startSnapshotId(3821550127947089987L) .build(); // Print all records to stdout. stream.print(); // Submit and execute this streaming read job. env.execute(\"Test Iceberg Streaming Read\"); There are other options that can be set through the Java API; please see FlinkSource#Builder.\nWriting with DataStream Iceberg supports writing to an iceberg table from different DataStream inputs.\nAppending data. Writing DataStream<RowData> and DataStream<Row> to a sink iceberg table is supported natively.\nStreamExecutionEnvironment env = ...; DataStream<RowData> input = ... 
; Configuration hadoopConf = new Configuration(); TableLoader tableLoader = TableLoader.fromHadoopTable(\"hdfs://nn:8020/warehouse/path\", hadoopConf); FlinkSink.forRowData(input) .tableLoader(tableLoader) .overwrite(true) .build(); env.execute(\"Test Iceberg DataStream\"); Write options Flink write options are passed when configuring the FlinkSink, like this:\nFlinkSink.Builder builder = FlinkSink.forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA) .table(table) .tableLoader(tableLoader) .set(\"write-format\", \"orc\") .set(FlinkWriteOptions.OVERWRITE_MODE, \"true\"); Flink option Default Description write-format Table write.format.default File format to use for this write operation; parquet, avro, or orc target-file-size-bytes As per table property Overrides this table\u2019s write.target-file-size-bytes upsert-enabled Table write.upsert.enabled Overrides this table\u2019s write.upsert.enabled overwrite-enabled false Overwrite the table\u2019s data, overwrite mode shouldn\u2019t be enable when configuring to use UPSERT data stream. distribution-mode Table write.distribution-mode Overrides this table\u2019s write.distribution-mode Inspecting tables. Iceberg does not support inspecting table in flink sql now, we need to use iceberg\u2019s Java API to read iceberg\u2019s meta data to get those table information.\nRewrite files action. Iceberg provides API to rewrite small files into large files by submitting flink batch job. The behavior of this flink action is the same as the spark\u2019s rewriteDataFiles.\nimport org.apache.iceberg.flink.actions.Actions; TableLoader tableLoader = TableLoader.fromHadoopTable(\"hdfs://nn:8020/warehouse/path\"); Table table = tableLoader.loadTable(); RewriteDataFilesActionResult result = Actions.forTable(table) .rewriteDataFiles() .execute(); For more doc about options of the rewrite files action, please see RewriteDataFilesAction\nType conversion Iceberg\u2019s integration for Flink automatically converts between Flink and Iceberg types. When writing to a table with types that are not supported by Flink, like UUID, Iceberg will accept and convert values from the Flink type.\nFlink to Iceberg Flink types are converted to Iceberg types according to the following table:\nFlink Iceberg Notes boolean boolean tinyint integer smallint integer integer integer bigint long float float double double char string varchar string string string binary binary varbinary fixed decimal decimal date date time time timestamp timestamp without timezone timestamp_ltz timestamp with timezone array list map map multiset map row struct raw Not supported interval Not supported structured Not supported timestamp with zone Not supported distinct Not supported null Not supported symbol Not supported logical Not supported Iceberg to Flink Iceberg types are converted to Flink types according to the following table:\nIceberg Flink boolean boolean struct row list array map map integer integer long bigint float float double double date date time time timestamp without timezone timestamp(6) timestamp with timezone timestamp_ltz(6) string varchar(2147483647) uuid binary(16) fixed(N) binary(N) binary varbinary(2147483647) decimal(P, S) decimal(P, S) Future improvement. There are some features that we do not yet support in the current flink iceberg integration work:\nDon\u2019t support creating iceberg table with hidden partitioning. Discussion in flink mail list. Don\u2019t support creating iceberg table with computed column. Don\u2019t support creating iceberg table with watermark. 
Don\u2019t support adding columns, removing columns, renaming columns, changing columns. FLINK-19062 is tracking this. ", "description": "", "title": "Enabling Iceberg in Flink", "uri": "/docs/latest/flink/"}, {"categories": null, "content": " Evolution Iceberg supports in-place table evolution. You can evolve a table schema just like SQL \u2013 even in nested structures \u2013 or change partition layout when data volume changes. Iceberg does not require costly distractions, like rewriting table data or migrating to a new table.\nFor example, Hive table partitioning cannot change so moving from a daily partition layout to an hourly partition layout requires a new table. And because queries are dependent on partitions, queries must be rewritten for the new table. In some cases, even changes as simple as renaming a column are either not supported, or can cause data correctness problems.\nSchema evolution Iceberg supports the following schema evolution changes:\nAdd \u2013 add a new column to the table or to a nested struct Drop \u2013 remove an existing column from the table or a nested struct Rename \u2013 rename an existing column or field in a nested struct Update \u2013 widen the type of a column, struct field, map key, map value, or list element Reorder \u2013 change the order of columns or fields in a nested struct Iceberg schema updates are metadata changes, so no data files need to be rewritten to perform the update.\nNote that map keys do not support adding or dropping struct fields that would change equality.\nCorrectness Iceberg guarantees that schema evolution changes are independent and free of side-effects, without rewriting files:\nAdded columns never read existing values from another column. Dropping a column or field does not change the values in any other column. Updating a column or field does not change values in any other column. Changing the order of columns or fields in a struct does not change the values associated with a column or field name. Iceberg uses unique IDs to track each column in a table. When you add a column, it is assigned a new ID so existing data is never used by mistake.\nFormats that track columns by name can inadvertently un-delete a column if a name is reused, which violates #1. Formats that track columns by position cannot delete columns without changing the names that are used for each column, which violates #2. Partition evolution Iceberg table partitioning can be updated in an existing table because queries do not reference partition values directly.\nWhen you evolve a partition spec, the old data written with an earlier spec remains unchanged. New data is written using the new spec in a new layout. Metadata for each of the partition versions is kept separately. Because of this, when you start writing queries, you get split planning. This is where each partition layout plans files separately using the filter it derives for that specific partition layout. Here\u2019s a visual representation of a contrived example:\nThe data for 2008 is partitioned by month. Starting from 2009 the table is updated so that the data is instead partitioned by day. Both partitioning layouts are able to coexist in the same table.\nIceberg uses hidden partitioning, so you don\u2019t need to write queries for a specific partition layout to be fast. 
Instead, you can write queries that select the data you need, and Iceberg automatically prunes out files that don\u2019t contain matching data.\nPartition evolution is a metadata operation and does not eagerly rewrite files.\nIceberg\u2019s Java table API provides the updateSpec API to update the partition spec. For example, the following code could be used to update the partition spec to add a new partition field that places id column values into 8 buckets and remove an existing partition field category:\nTable sampleTable = ...; sampleTable.updateSpec() .addField(bucket(\"id\", 8)) .removeField(\"category\") .commit(); Spark supports updating the partition spec through its ALTER TABLE SQL statement; see more details in Spark SQL.\nSort order evolution Similar to the partition spec, the Iceberg sort order can also be updated in an existing table. When you evolve a sort order, the old data written with an earlier order remains unchanged. Engines can always choose to write data in the latest sort order or unsorted when sorting is prohibitively expensive.\nIceberg\u2019s Java table API provides the replaceSortOrder API to update the sort order. For example, the following code could be used to create a new sort order with the id column sorted in ascending order with nulls last, and the category column sorted in descending order with nulls first:\nTable sampleTable = ...; sampleTable.replaceSortOrder() .asc(\"id\", NullOrder.NULLS_LAST) .desc(\"category\", NullOrder.NULLS_FIRST) .commit(); Spark supports updating the sort order through its ALTER TABLE SQL statement; see more details in Spark SQL.\n", "description": "", "title": "Evolution", "uri": "/docs/latest/evolution/"}, {"categories": null, "content": " Flink Connector Apache Flink supports creating an Iceberg table directly in Flink SQL, without creating an explicit Flink catalog. That means you can create an Iceberg table by specifying the 'connector'='iceberg' table option in Flink SQL, similar to the usage in the Flink official document.\nIn Flink, the SQL CREATE TABLE test (..) WITH ('connector'='iceberg', ...) will create a Flink table in the current Flink catalog (GenericInMemoryCatalog by default), which simply maps to the underlying Iceberg table instead of maintaining the Iceberg table directly in the current Flink catalog.\nTo create the table in Flink SQL using the syntax CREATE TABLE test (..) WITH ('connector'='iceberg', ...), the Flink Iceberg connector provides the following table properties:\nconnector: Use the constant iceberg. catalog-name: User-specified catalog name. It\u2019s required because the connector doesn\u2019t have a default value. catalog-type: Defaults to hive if no value is specified. The optional values are: hive: The Hive metastore catalog. hadoop: The hadoop catalog. custom: The customized catalog, see custom catalog for more details. catalog-database: The Iceberg database name in the backend catalog; defaults to the current Flink database name. catalog-table: The Iceberg table name in the backend catalog. Defaults to the table name in the Flink CREATE TABLE statement. Table managed in Hive catalog. 
Before executing the following SQL, please make sure you\u2019ve configured the Flink SQL client correctly according to the quick start document.\nThe following SQL will create a Flink table in the current Flink catalog, which maps to the iceberg table default_database.iceberg_table managed in iceberg catalog.\nCREATE TABLE flink_table ( id BIGINT, data STRING ) WITH ( 'connector'='iceberg', 'catalog-name'='hive_prod', 'uri'='thrift://localhost:9083', 'warehouse'='hdfs://nn:8020/path/to/warehouse' ); If you want to create a Flink table mapping to a different iceberg table managed in Hive catalog (such as hive_db.hive_iceberg_table in Hive), then you can create Flink table as following:\nCREATE TABLE flink_table ( id BIGINT, data STRING ) WITH ( 'connector'='iceberg', 'catalog-name'='hive_prod', 'catalog-database'='hive_db', 'catalog-table'='hive_iceberg_table', 'uri'='thrift://localhost:9083', 'warehouse'='hdfs://nn:8020/path/to/warehouse' ); The underlying catalog database (hive_db in the above example) will be created automatically if it does not exist when writing records into the Flink table. Table managed in hadoop catalog The following SQL will create a Flink table in current Flink catalog, which maps to the iceberg table default_database.flink_table managed in hadoop catalog.\nCREATE TABLE flink_table ( id BIGINT, data STRING ) WITH ( 'connector'='iceberg', 'catalog-name'='hadoop_prod', 'catalog-type'='hadoop', 'warehouse'='hdfs://nn:8020/path/to/warehouse' ); Table managed in custom catalog The following SQL will create a Flink table in current Flink catalog, which maps to the iceberg table default_database.flink_table managed in custom catalog.\nCREATE TABLE flink_table ( id BIGINT, data STRING ) WITH ( 'connector'='iceberg', 'catalog-name'='custom_prod', 'catalog-type'='custom', 'catalog-impl'='com.my.custom.CatalogImpl', -- More table properties for the customized catalog 'my-additional-catalog-config'='my-value', ... ); Please check sections under the Integrations tab for all custom catalogs.\nA complete example. Take the Hive catalog as an example:\nCREATE TABLE flink_table ( id BIGINT, data STRING ) WITH ( 'connector'='iceberg', 'catalog-name'='hive_prod', 'uri'='thrift://localhost:9083', 'warehouse'='file:///path/to/warehouse' ); INSERT INTO flink_table VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC'); SET execution.result-mode=tableau; SELECT * FROM flink_table; +----+------+ | id | data | +----+------+ | 1 | AAA | | 2 | BBB | | 3 | CCC | +----+------+ 3 rows in set For more details, please refer to the Iceberg Flink document.\n", "description": "", "title": "Flink Connector", "uri": "/docs/latest/flink-connector/"}, {"categories": null, "content": " Iceberg Java API Tables The main purpose of the Iceberg API is to manage table metadata, like schema, partition spec, metadata, and data files that store table data.\nTable metadata and operations are accessed through the Table interface. 
This interface will return table information.\nTable metadata The Table interface provides access to the table metadata:\nschema returns the current table schema spec returns the current table partition spec properties returns a map of key-value properties currentSnapshot returns the current table snapshot snapshots returns all valid snapshots for the table snapshot(id) returns a specific snapshot by ID location returns the table\u2019s base location Tables also provide refresh to update the table to the latest version, and expose helpers:\nio returns the FileIO used to read and write table files locationProvider returns a LocationProvider used to create paths for data and metadata files Scanning File level Iceberg table scans start by creating a TableScan object with newScan.\nTableScan scan = table.newScan(); To configure a scan, call filter and select on the TableScan to get a new TableScan with those changes.\nTableScan filteredScan = scan.filter(Expressions.equal(\"id\", 5)); Calls to configuration methods create a new TableScan so that each TableScan is immutable and won\u2019t change unexpectedly if shared across threads.\nWhen a scan is configured, planFiles, planTasks, and schema are used to return files, tasks, and the read projection.\nTableScan scan = table.newScan() .filter(Expressions.equal(\"id\", 5)) .select(\"id\", \"data\"); Schema projection = scan.schema(); Iterable<CombinedScanTask> tasks = scan.planTasks(); Use asOfTime or useSnapshot to configure the table snapshot for time travel queries.\nRow level Iceberg table scans start by creating a ScanBuilder object with IcebergGenerics.read.\nScanBuilder scanBuilder = IcebergGenerics.read(table); To configure a scan, call where and select on the ScanBuilder to get a new ScanBuilder with those changes.\nscanBuilder.where(Expressions.equal(\"id\", 5)); When the scan is configured, call build to execute it. build returns a CloseableIterable<Record>:\nCloseableIterable<Record> result = IcebergGenerics.read(table) .where(Expressions.lessThan(\"id\", 5)) .build(); where Record is the Iceberg generic record from the iceberg-data module, org.apache.iceberg.data.Record.\nUpdate operations Table also exposes operations that update the table. These operations use a builder pattern, PendingUpdate, that commits when PendingUpdate#commit is called.\nFor example, updating the table schema is done by calling updateSchema, adding updates to the builder, and finally calling commit to commit the pending changes to the table:\ntable.updateSchema() .addColumn(\"count\", Types.LongType.get()) .commit(); Available operations to update a table are:\nupdateSchema \u2013 update the table schema updateProperties \u2013 update table properties updateLocation \u2013 update the table\u2019s base location newAppend \u2013 used to append data files newFastAppend \u2013 used to append data files, will not compact metadata newOverwrite \u2013 used to append data files and remove files that are overwritten newDelete \u2013 used to delete data files newRewrite \u2013 used to rewrite data files; will replace existing files with new versions newTransaction \u2013 create a new table-level transaction rewriteManifests \u2013 rewrite manifest data by clustering files, for faster scan planning rollback \u2013 rollback the table state to a specific snapshot Transactions Transactions are used to commit multiple table changes in a single atomic operation. A transaction is used to create individual operations using factory methods, like newAppend, just like working with a Table. 
Operations created by a transaction are committed as a group when commitTransaction is called.\nFor example, deleting and appending a file in the same transaction:\nTransaction t = table.newTransaction(); // commit operations to the transaction t.newDelete().deleteFromRowFilter(filter).commit(); t.newAppend().appendFile(data).commit(); // commit all the changes to the table t.commitTransaction(); Types Iceberg data types are located in the org.apache.iceberg.types package.\nPrimitives Primitive type instances are available from static methods in each type class. Types without parameters use get, and types like decimal use factory methods:\nTypes.IntegerType.get() // int Types.DoubleType.get() // double Types.DecimalType.of(9, 2) // decimal(9, 2) Nested types Structs, maps, and lists are created using factory methods in type classes.\nLike struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track field IDs and nullability.\nStruct fields are created using NestedField.optional or NestedField.required. Map value and list element nullability is set in the map and list factory methods.\n// struct<1 id: int, 2 data: optional string> StructType struct = StructType.of( Types.NestedField.required(1, \"id\", Types.IntegerType.get()), Types.NestedField.optional(2, \"data\", Types.StringType.get()) ); // map<1 key: int, 2 value: optional string> MapType map = MapType.ofOptional( 1, 2, Types.IntegerType.get(), Types.StringType.get() ); // array<1 element: int> ListType list = ListType.ofRequired(1, Types.IntegerType.get()); Expressions Iceberg\u2019s expressions are used to configure table scans. To create expressions, use the factory methods in Expressions.\nSupported predicate expressions are:\nisNull notNull equal notEqual lessThan lessThanOrEqual greaterThan greaterThanOrEqual in notIn startsWith notStartsWith Supported expression operations are:\nand or not Constant expressions are:\nalwaysTrue alwaysFalse Expression binding When created, expressions are unbound. 
Before an expression is used, it will be bound to a data type to find the field ID the expression name represents, and to convert predicate literals.\nFor example, before using the expression lessThan(\"x\", 10), Iceberg needs to determine which column \"x\" refers to and convert 10 to that column\u2019s data type.\nIf the expression could be bound to the type struct<1 x: long, 2 y: long> or to struct<11 x: int, 12 y: int>.\nExpression example table.newScan() .filter(Expressions.greaterThanOrEqual(\"x\", 5)) .filter(Expressions.lessThan(\"x\", 10)) Modules Iceberg table support is organized in library modules:\niceberg-common contains utility classes used in other modules iceberg-api contains the public Iceberg API, including expressions, types, tables, and operations iceberg-arrow is an implementation of the Iceberg type system for reading and writing data stored in Iceberg tables using Apache Arrow as the in-memory data format iceberg-aws contains implementations of the Iceberg API to be used with tables stored on AWS S3 and/or for tables defined using the AWS Glue data catalog iceberg-core contains implementations of the Iceberg API and support for Avro data files, this is what processing engines should depend on iceberg-parquet is an optional module for working with tables backed by Parquet files iceberg-orc is an optional module for working with tables backed by ORC files (experimental) iceberg-hive-metastore is an implementation of Iceberg tables backed by the Hive metastore Thrift client This project Iceberg also has modules for adding Iceberg support to processing engines and associated tooling:\niceberg-spark2 is an implementation of Spark\u2019s Datasource V2 API in 2.4 for Iceberg (use iceberg-spark-runtime for a shaded version) iceberg-spark3 is an implementation of Spark\u2019s Datasource V2 API in 3.0 for Iceberg (use iceberg-spark3-runtime for a shaded version) iceberg-spark-3.1 is an implementation of Spark\u2019s Datasource V2 API in 3.1 for Iceberg (use iceberg-spark-runtime-3.1 for a shaded version) iceberg-spark-3.2 is an implementation of Spark\u2019s Datasource V2 API in 3.2 for Iceberg (use iceberg-spark-runtime-3.2 for a shaded version) iceberg-flink is an implementation of Flink\u2019s Table and DataStream API for Iceberg (use iceberg-flink-runtime for a shaded version) iceberg-hive3 is an implementation of Hive 3 specific SerDe\u2019s for Timestamp, TimestampWithZone, and Date object inspectors (use iceberg-hive-runtime for a shaded version). iceberg-mr is an implementation of MapReduce and Hive InputFormats and SerDes for Iceberg (use iceberg-hive-runtime for a shaded version for use with Hive) iceberg-nessie is a module used to integrate Iceberg table metadata history and operations with Project Nessie iceberg-data is a client library used to read Iceberg tables from JVM applications iceberg-pig is an implementation of Pig\u2019s LoadFunc API for Iceberg iceberg-runtime generates a shaded runtime jar for Spark to integrate with iceberg tables ", "description": "", "title": "Java API", "uri": "/docs/latest/api/"}, {"categories": null, "content": " Custom Catalog Implementation It\u2019s possible to read an iceberg table either from an hdfs path or from a hive table. It\u2019s also possible to use a custom metastore in place of hive. 
The steps to do that are as follows.\nCustom TableOperations Custom Catalog Custom FileIO Custom LocationProvider Custom IcebergSource Custom table operations implementation Extend BaseMetastoreTableOperations to provide implementation on how to read and write metadata\nExample:\nclass CustomTableOperations extends BaseMetastoreTableOperations { private String dbName; private String tableName; private Configuration conf; private FileIO fileIO; protected CustomTableOperations(Configuration conf, String dbName, String tableName) { this.conf = conf; this.dbName = dbName; this.tableName = tableName; } // The doRefresh method should provide implementation on how to get the metadata location @Override public void doRefresh() { // Example custom service which returns the metadata location given a dbName and tableName String metadataLocation = CustomService.getMetadataForTable(conf, dbName, tableName); // When updating from a metadata file location, call the helper method refreshFromMetadataLocation(metadataLocation); } // The doCommit method should provide implementation on how to update with metadata location atomically @Override public void doCommit(TableMetadata base, TableMetadata metadata) { String oldMetadataLocation = base.location(); // Write new metadata using helper method String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1); // Example custom service which updates the metadata location for the given db and table atomically CustomService.updateMetadataLocation(dbName, tableName, oldMetadataLocation, newMetadataLocation); } // The io method provides a FileIO which is used to read and write the table metadata files @Override public FileIO io() { if (fileIO == null) { fileIO = new HadoopFileIO(conf); } return fileIO; } } A TableOperations instance is usually obtained by calling Catalog.newTableOps(TableIdentifier). 
See the next section about implementing and loading a custom catalog.\nCustom catalog implementation Extend BaseMetastoreCatalog to provide default warehouse locations and instantiate CustomTableOperations.\nExample:\npublic class CustomCatalog extends BaseMetastoreCatalog { private Configuration configuration; // must have a no-arg constructor to be dynamically loaded // initialize(String name, Map<String, String> properties) will be called to complete initialization public CustomCatalog() { } public CustomCatalog(Configuration configuration) { this.configuration = configuration; } @Override protected TableOperations newTableOps(TableIdentifier tableIdentifier) { String dbName = tableIdentifier.namespace().level(0); String tableName = tableIdentifier.name(); // instantiate the CustomTableOperations return new CustomTableOperations(configuration, dbName, tableName); } @Override protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) { // Can choose to use any other configuration name String tableLocation = configuration.get(\"custom.iceberg.warehouse.location\"); // Can be an s3 or hdfs path if (tableLocation == null) { throw new RuntimeException(\"custom.iceberg.warehouse.location configuration not set!\"); } return String.format( \"%s/%s.db/%s\", tableLocation, tableIdentifier.namespace().levels()[0], tableIdentifier.name()); } @Override public boolean dropTable(TableIdentifier identifier, boolean purge) { // Example service to delete table CustomService.deleteTable(identifier.namespace().level(0), identifier.name()); return true; } @Override public void renameTable(TableIdentifier from, TableIdentifier to) { Preconditions.checkArgument(from.namespace().level(0).equals(to.namespace().level(0)), \"Cannot move table between databases\"); // Example service to rename table CustomService.renameTable(from.namespace().level(0), from.name(), to.name()); } // implement this method to read catalog name and properties during initialization public void initialize(String name, Map<String, String> properties) { } } Catalog implementations can be dynamically loaded in most compute engines. For Spark and Flink, you can specify the catalog-impl catalog property to load it. Read the Configuration section for more details. For MapReduce, implement org.apache.iceberg.mr.CatalogLoader and set Hadoop property iceberg.mr.catalog.loader.class to load it. 
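For example, in Spark the custom catalog can be registered through catalog properties. This is a minimal sketch, not a complete configuration; the catalog name custom_prod and the fully qualified class name com.my.custom.CustomCatalog are placeholders for your own implementation:
import org.apache.spark.SparkConf;
SparkConf conf = new SparkConf()
    // register an Iceberg catalog named custom_prod backed by the custom implementation
    .set(\"spark.sql.catalog.custom_prod\", \"org.apache.iceberg.spark.SparkCatalog\")
    .set(\"spark.sql.catalog.custom_prod.catalog-impl\", \"com.my.custom.CustomCatalog\")
    // additional properties under this prefix are passed to CustomCatalog.initialize(name, properties)
    .set(\"spark.sql.catalog.custom_prod.my-additional-catalog-config\", \"my-value\");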
If your catalog must read Hadoop configuration to access certain environment properties, make your catalog implement org.apache.hadoop.conf.Configurable.\nCustom file IO implementation Extend FileIO and provide implementation to read and write data files\nExample:\npublic class CustomFileIO implements FileIO { // must have a no-arg constructor to be dynamically loaded // initialize(Map<String, String> properties) will be called to complete initialization public CustomFileIO() { } @Override public InputFile newInputFile(String s) { // you also need to implement the InputFile interface for a custom input file return new CustomInputFile(s); } @Override public OutputFile newOutputFile(String s) { // you also need to implement the OutputFile interface for a custom output file return new CustomOutputFile(s); } @Override public void deleteFile(String path) { Path toDelete = new Path(path); FileSystem fs = Util.getFs(toDelete); try { fs.delete(toDelete, false /* not recursive */); } catch (IOException e) { throw new RuntimeIOException(e, \"Failed to delete file: %s\", path); } } // implement this method to read catalog properties during initialization public void initialize(Map<String, String> properties) { } } If you are already implementing your own catalog, you can implement TableOperations.io() to use your custom FileIO. In addition, custom FileIO implementations can also be dynamically loaded in HadoopCatalog and HiveCatalog by specifying the io-impl catalog property. Read the Configuration section for more details. If your FileIO must read Hadoop configuration to access certain environment properties, make your FileIO implement org.apache.hadoop.conf.Configurable.\nCustom location provider implementation Extend LocationProvider and provide implementation to determine the file path to write data\nExample:\npublic class CustomLocationProvider implements LocationProvider { private String tableLocation; // must have a 2-arg constructor like this, or a no-arg constructor public CustomLocationProvider(String tableLocation, Map<String, String> properties) { this.tableLocation = tableLocation; } @Override public String newDataLocation(String filename) { // can use any custom method to generate a file path given a file name return String.format(\"%s/%s/%s\", tableLocation, UUID.randomUUID().toString(), filename); } @Override public String newDataLocation(PartitionSpec spec, StructLike partitionData, String filename) { // can use any custom method to generate a file path given a partition info and file name return newDataLocation(filename); } } If you are already implementing your own catalog, you can override TableOperations.locationProvider() to use your custom default LocationProvider. 
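For example, a minimal sketch that wires the CustomLocationProvider above into the CustomTableOperations from the earlier section (class and constructor names follow the examples on this page and are otherwise assumptions):
// inside CustomTableOperations
@Override
public LocationProvider locationProvider() {
  // hand out the custom provider instead of the default provider resolved from table properties
  return new CustomLocationProvider(current().location(), current().properties());
}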
To use a different custom location provider for a specific table, specify the implementation when creating the table using table property write.location-provider.impl\nExample:\nCREATE TABLE hive.default.my_table ( id bigint, data string, category string) USING iceberg OPTIONS ( 'write.location-provider.impl'='com.my.CustomLocationProvider' ) PARTITIONED BY (category); Custom IcebergSource Extend IcebergSource and provide implementation to read from CustomCatalog\nExample:\npublic class CustomIcebergSource extends IcebergSource { @Override protected Table findTable(DataSourceOptions options, Configuration conf) { Optional<String> path = options.get(\"path\"); Preconditions.checkArgument(path.isPresent(), \"Cannot open table: path is not set\"); // Read table from CustomCatalog CustomCatalog catalog = new CustomCatalog(conf); TableIdentifier tableIdentifier = TableIdentifier.parse(path.get()); return catalog.loadTable(tableIdentifier); } } Register the CustomIcebergSource by updating META-INF/services/org.apache.spark.sql.sources.DataSourceRegister with its fully qualified name\n", "description": "", "title": "Java Custom Catalog", "uri": "/docs/latest/custom-catalog/"}, {"categories": null, "content": " Java API Quickstart Create a table Tables are created using either a Catalog or an implementation of the Tables interface.\nUsing a Hive catalog The Hive catalog connects to a Hive metastore to keep track of Iceberg tables. You can initialize a Hive catalog with a name and some properties. (see: Catalog properties)\nNote: Currently, setConf is always required for hive catalogs, but this will change in the future.\nimport org.apache.iceberg.hive.HiveCatalog; HiveCatalog catalog = new HiveCatalog(); catalog.setConf(spark.sparkContext().hadoopConfiguration()); // Configure using Spark's Hadoop configuration Map <String, String> properties = new HashMap<String, String>(); properties.put(\"warehouse\", \"...\"); properties.put(\"uri\", \"...\"); catalog.initialize(\"hive\", properties); The Catalog interface defines methods for working with tables, like createTable, loadTable, renameTable, and dropTable. HiveCatalog implements the Catalog interface.\nTo create a table, pass an Identifier and a Schema along with other initial metadata:\nimport org.apache.iceberg.Table; import org.apache.iceberg.catalog.TableIdentifier; TableIdentifier name = TableIdentifier.of(\"logging\", \"logs\"); Table table = catalog.createTable(name, schema, spec); // or to load an existing table, use the following line // Table table = catalog.loadTable(name); The logs schema and partition spec are created below.\nUsing a Hadoop catalog A Hadoop catalog doesn\u2019t need to connect to a Hive MetaStore, but can only be used with HDFS or similar file systems that support atomic rename. Concurrent writes with a Hadoop catalog are not safe with a local FS or S3. 
To create a Hadoop catalog:\nimport org.apache.hadoop.conf.Configuration; import org.apache.iceberg.hadoop.HadoopCatalog; Configuration conf = new Configuration(); String warehousePath = \"hdfs://host:8020/warehouse_path\"; HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath); Like the Hive catalog, HadoopCatalog implements Catalog, so it also has methods for working with tables, like createTable, loadTable, and dropTable.\nThis example creates a table with Hadoop catalog:\nimport org.apache.iceberg.Table; import org.apache.iceberg.catalog.TableIdentifier; TableIdentifier name = TableIdentifier.of(\"logging\", \"logs\"); Table table = catalog.createTable(name, schema, spec); // or to load an existing table, use the following line // Table table = catalog.loadTable(name); The logs schema and partition spec are created below.\nUsing Hadoop tables Iceberg also supports tables that are stored in a directory in HDFS. Concurrent writes with a Hadoop tables are not safe when stored in the local FS or S3. Directory tables don\u2019t support all catalog operations, like rename, so they use the Tables interface instead of Catalog.\nTo create a table in HDFS, use HadoopTables:\nimport org.apache.hadoop.conf.Configuration; import org.apache.iceberg.hadoop.HadoopTables; import org.apache.iceberg.Table; Configuration conf = new Configuration(); HadoopTables tables = new HadoopTables(conf); Table table = tables.create(schema, spec, table_location); // or to load an existing table, use the following line // Table table = tables.load(table_location); Hadoop tables shouldn\u2019t be used with file systems that do not support atomic rename. Iceberg relies on rename to synchronize concurrent commits for directory tables. Tables in Spark Spark uses both HiveCatalog and HadoopTables to load tables. Hive is used when the identifier passed to load or save is not a path, otherwise Spark assumes it is a path-based table.\nTo read and write to tables from Spark see:\nSQL queries in Spark INSERT INTO in Spark MERGE INTO in Spark Schemas Create a schema This example creates a schema for a logs table:\nimport org.apache.iceberg.Schema; import org.apache.iceberg.types.Types; Schema schema = new Schema( Types.NestedField.required(1, \"level\", Types.StringType.get()), Types.NestedField.required(2, \"event_time\", Types.TimestampType.withZone()), Types.NestedField.required(3, \"message\", Types.StringType.get()), Types.NestedField.optional(4, \"call_stack\", Types.ListType.ofRequired(5, Types.StringType.get())) ); When using the Iceberg API directly, type IDs are required. Conversions from other schema formats, like Spark, Avro, and Parquet will automatically assign new IDs.\nWhen a table is created, all IDs in the schema are re-assigned to ensure uniqueness.\nConvert a schema from Avro To create an Iceberg schema from an existing Avro schema, use converters in AvroSchemaUtil:\nimport org.apache.avro.Schema; import org.apache.avro.Schema.Parser; import org.apache.iceberg.avro.AvroSchemaUtil; Schema avroSchema = new Parser().parse(\"{\\\"type\\\": \\\"record\\\" , ... }\"); Schema icebergSchema = AvroSchemaUtil.toIceberg(avroSchema); Convert a schema from Spark To create an Iceberg schema from an existing table, use converters in SparkSchemaUtil:\nimport org.apache.iceberg.spark.SparkSchemaUtil; Schema schema = SparkSchemaUtil.schemaForTable(sparkSession, table_name); Partitioning Create a partition spec Partition specs describe how Iceberg should group records into data files. 
Partition specs are created for a table\u2019s schema using a builder.\nThis example creates a partition spec for the logs table that partitions records by the hour of the log event\u2019s timestamp and by log level:\nimport org.apache.iceberg.PartitionSpec; PartitionSpec spec = PartitionSpec.builderFor(schema) .hour(\"event_time\") .identity(\"level\") .build(); For more information on the different partition transforms that Iceberg offers, visit this page.\n", "description": "", "title": "Java Quickstart", "uri": "/docs/latest/java-api-quickstart/"}, {"categories": null, "content": " Iceberg JDBC Integration JDBC Catalog Iceberg supports using a table in a relational database to manage Iceberg tables through JDBC. The database that JDBC connects to must support atomic transaction to allow the JDBC catalog implementation to properly support atomic Iceberg table commits and read serializable isolation.\nConfigurations Because each database and database service provider might require different configurations, the JDBC catalog allows arbitrary configurations through:\nProperty Default Description uri the JDBC connection string jdbc.<property_key> any key value pairs to configure the JDBC connection Examples Spark You can start a Spark session with a MySQL JDBC connection using the following configurations:\nspark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0 \\ --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \\ --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.jdbc.JdbcCatalog \\ --conf spark.sql.catalog.my_catalog.uri=jdbc:mysql://test.1234567890.us-west-2.rds.amazonaws.com:3306/default \\ --conf spark.sql.catalog.my_catalog.jdbc.verifyServerCertificate=true \\ --conf spark.sql.catalog.my_catalog.jdbc.useSSL=true \\ --conf spark.sql.catalog.my_catalog.jdbc.user=admin \\ --conf spark.sql.catalog.my_catalog.jdbc.password=pass Java API Class.forName(\"com.mysql.cj.jdbc.Driver\"); // ensure JDBC driver is at runtime classpath Map<String, String> properties = new HashMap<>(); properties.put(CatalogProperties.CATALOG_IMPL, JdbcCatalog.class.getName()); properties.put(CatalogProperties.URI, \"jdbc:mysql://localhost:3306/test\"); properties.put(JdbcCatalog.PROPERTY_PREFIX + \"user\", \"admin\"); properties.put(JdbcCatalog.PROPERTY_PREFIX + \"password\", \"pass\"); properties.put(CatalogProperties.WAREHOUSE_LOCATION, \"s3://warehouse/path\"); Configuration hadoopConf = new Configuration(); // configs if you use HadoopFileIO JdbcCatalog catalog = CatalogUtil.buildIcebergCatalog(\"test_jdbc_catalog\", properties, hadoopConf); ", "description": "", "title": "JDBC", "uri": "/docs/latest/jdbc/"}, {"categories": null, "content": " Maintenance Maintenance operations require the Table instance. Please refer Java API quickstart page to refer how to load an existing table. Recommended Maintenance Expire Snapshots Each write to an Iceberg table creates a new snapshot, or version, of a table. Snapshots can be used for time-travel queries, or the table can be rolled back to any valid snapshot.\nSnapshots accumulate until they are expired by the expireSnapshots operation. Regularly expiring snapshots is recommended to delete data files that are no longer needed, and to keep the size of table metadata small.\nThis example expires snapshots that are older than 1 day:\nTable table = ... 
long tsToExpire = System.currentTimeMillis() - (1000 * 60 * 60 * 24); // 1 day table.expireSnapshots() .expireOlderThan(tsToExpire) .commit(); See the ExpireSnapshots Javadoc to see more configuration options.\nThere is also a Spark action that can run table expiration in parallel for large tables:\nTable table = ... SparkActions .get() .expireSnapshots(table) .expireOlderThan(tsToExpire) .execute(); Expiring old snapshots removes them from metadata, so they are no longer available for time travel queries.\nData files are not deleted until they are no longer referenced by a snapshot that may be used for time travel or rollback. Regularly expiring snapshots deletes unused data files. Remove old metadata files Iceberg keeps track of table metadata using JSON files. Each change to a table produces a new metadata file to provide atomicity.\nOld metadata files are kept for history by default. Tables with frequent commits, like those written by streaming jobs, may need to regularly clean metadata files.\nTo automatically clean metadata files, set write.metadata.delete-after-commit.enabled=true in table properties. This will keep some metadata files (up to write.metadata.previous-versions-max) and will delete the oldest metadata file after each new one is created.\nProperty Description write.metadata.delete-after-commit.enabled Whether to delete old metadata files after each table commit write.metadata.previous-versions-max The number of old metadata files to keep See table write properties for more details.\nDelete orphan files In Spark and other distributed processing engines, task or job failures can leave files that are not referenced by table metadata, and in some cases normal snapshot expiration may not be able to determine a file is no longer needed and delete it.\nTo clean up these \u201corphan\u201d files under a table location, use the deleteOrphanFiles action.\nTable table = ... SparkActions .get() .deleteOrphanFiles(table) .execute(); See the DeleteOrphanFiles Javadoc to see more configuration options.\nThis action may take a long time to finish if you have lots of files in data and metadata directories. It is recommended to execute this periodically, but you may not need to execute this often.\nIt is dangerous to remove orphan files with a retention interval shorter than the time expected for any write to complete because it might corrupt the table if in-progress files are considered orphaned and are deleted. The default interval is 3 days. Iceberg uses the string representations of paths when determining which files need to be removed. On some file systems, the path can change over time, but it still represents the same file. For example, if you change authorities for an HDFS cluster, none of the old path urls used during creation will match those that appear in a current listing. This will lead to data loss when RemoveOrphanFiles is run. Please be sure the entries in your MetadataTables match those listed by the Hadoop FileSystem API to avoid unintentional deletion. Optional Maintenance Some tables require additional maintenance. For example, streaming queries may produce small data files that should be compacted into larger files. And some tables can benefit from rewriting manifest files to make locating data for queries much faster.\nCompact data files Iceberg tracks each data file in a table. 
More data files leads to more metadata stored in manifest files, and small data files causes an unnecessary amount of metadata and less efficient queries from file open costs.\nIceberg can compact data files in parallel using Spark with the rewriteDataFiles action. This will combine small files into larger files to reduce metadata overhead and runtime file open cost.\nTable table = ... SparkActions .get() .rewriteDataFiles(table) .filter(Expressions.equal(\"date\", \"2020-08-18\")) .option(\"target-file-size-bytes\", Long.toString(500 * 1024 * 1024)) // 500 MB .execute(); The files metadata table is useful for inspecting data file sizes and determining when to compact partitions.\nSee the RewriteDataFiles Javadoc to see more configuration options.\nRewrite manifests Iceberg uses metadata in its manifest list and manifest files speed up query planning and to prune unnecessary data files. The metadata tree functions as an index over a table\u2019s data.\nManifests in the metadata tree are automatically compacted in the order they are added, which makes queries faster when the write pattern aligns with read filters. For example, writing hourly-partitioned data as it arrives is aligned with time range query filters.\nWhen a table\u2019s write pattern doesn\u2019t align with the query pattern, metadata can be rewritten to re-group data files into manifests using rewriteManifests or the rewriteManifests action (for parallel rewrites using Spark).\nThis example rewrites small manifests and groups data files by the first partition field.\nTable table = ... SparkActions .get() .rewriteManifests(table) .rewriteIf(file -> file.length() < 10 * 1024 * 1024) // 10 MB .execute(); See the RewriteManifests Javadoc to see more configuration options.\n", "description": "", "title": "Maintenance", "uri": "/docs/latest/maintenance/"}, {"categories": null, "content": " Iceberg Nessie Integration Iceberg provides integration with Nessie through the iceberg-nessie module. This section describes how to use Iceberg with Nessie. Nessie provides several key features on top of Iceberg:\nmulti-table transactions git-like operations (eg branches, tags, commits) hive-like metastore capabilities See Project Nessie for more information on Nessie. Nessie requires a server to run, see Getting Started to start a Nessie server.\nEnabling Nessie Catalog The iceberg-nessie module is bundled with Spark and Flink runtimes for all versions from 0.11.0. To get started with Nessie and Iceberg simply add the Iceberg runtime to your process. Eg: spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.14.0.\nSpark SQL Extensions From Spark 3.0, Nessie SQL extensions can be used to manage the Nessie repo as shown below.\nbin/spark-sql --packages \"org.apache.iceberg:iceberg-spark3-runtime:0.14.0,org.projectnessie:nessie-spark-extensions:0.20.0\" --conf spark.sql.extensions=\"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions\" --conf <other settings> Please refer Nessie SQL extension document to learn more about it.\nNessie Catalog One major feature introduced in release 0.11.0 is the ability to easily interact with a Custom Catalog from Spark and Flink. See Spark Configuration and Flink Configuration for instructions for adding a custom catalog to Iceberg.\nTo use the Nessie Catalog the following properties are required:\nwarehouse. Like most other catalogs the warehouse property is a file path to where this catalog should store tables. uri. 
This is the Nessie server base uri. Eg http://localhost:19120/api/v1. ref (optional). This is the Nessie branch or tag you want to work in. To run directly in Java this looks like:\nMap<String, String> options = new HashMap<>(); options.put(\"warehouse\", \"/path/to/warehouse\"); options.put(\"ref\", \"main\"); options.put(\"uri\", \"https://localhost:19120/api/v1\"); Catalog nessieCatalog = CatalogUtil.loadCatalog(\"org.apache.iceberg.nessie.NessieCatalog\", \"nessie\", options, hadoopConfig); and in Spark:\nconf.set(\"spark.sql.catalog.nessie.warehouse\", \"/path/to/warehouse\"); conf.set(\"spark.sql.catalog.nessie.uri\", \"http://localhost:19120/api/v1\") conf.set(\"spark.sql.catalog.nessie.ref\", \"main\") conf.set(\"spark.sql.catalog.nessie.catalog-impl\", \"org.apache.iceberg.nessie.NessieCatalog\") conf.set(\"spark.sql.catalog.nessie\", \"org.apache.iceberg.spark.SparkCatalog\") conf.set(\"spark.sql.extensions\", \"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions\") This is how it looks in Flink via the Python API (additional details can be found here):\nimport os from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import StreamTableEnvironment env = StreamExecutionEnvironment.get_execution_environment() iceberg_flink_runtime_jar = os.path.join(os.getcwd(), \"iceberg-flink-runtime-0.14.0.jar\") env.add_jars(\"file://{}\".format(iceberg_flink_runtime_jar)) table_env = StreamTableEnvironment.create(env) table_env.execute_sql(\"CREATE CATALOG nessie_catalog WITH (\" \"'type'='iceberg', \" \"'catalog-impl'='org.apache.iceberg.nessie.NessieCatalog', \" \"'uri'='http://localhost:19120/api/v1', \" \"'ref'='main', \" \"'warehouse'='/path/to/warehouse')\") There is nothing special above about the nessie name. A spark catalog can have any name, the important parts are the settings for the catalog-impl and the required config to start Nessie correctly. Once you have a Nessie catalog you have access to your entire Nessie repo. You can then perform create/delete/merge operations on branches and perform commits on branches. Each Iceberg table in a Nessie Catalog is identified by an arbitrary length namespace and table name (eg data.base.name.table). These namespaces are implicit and don\u2019t need to be created separately. Any transaction on a Nessie enabled Iceberg table is a single commit in Nessie. Nessie commits can encompass an arbitrary number of actions on an arbitrary number of tables, however in Iceberg this will be limited to the set of single table transactions currently available.\nFurther operations such as merges, viewing the commit log or diffs are performed by direct interaction with the NessieClient in java or by using the python client or cli. See Nessie CLI for more details on the CLI and Spark Guide for a more complete description of Nessie functionality.\nNessie and Iceberg For most cases Nessie acts just like any other Catalog for Iceberg: providing a logical organization of a set of tables and providing atomicity to transactions. However, using Nessie opens up other interesting possibilities. When using Nessie with Iceberg every Iceberg transaction becomes a Nessie commit. This history can be listed, merged or cherry-picked across branches.\nLoosely coupled transactions By creating a branch and performing a set of operations on that branch you can approximate a multi-table transaction. 
A sequence of commits can be performed on the newly created branch and then merged back into the main branch atomically. This gives the appearance of a series of connected changes being exposed to the main branch simultaneously. While downstream consumers will see multiple transactions appear at once this isn\u2019t a true multi-table transaction on the database. It is effectively a fast-forward merge of multiple commits (in git language) and each operation from the branch is its own distinct transaction and commit. This is different from a real multi-table transaction where all changes would be in the same commit. This does allow multiple applications to take part in modifying a branch and for this distributed set of transactions to be exposed to the downstream users simultaneously.\nExperimentation Changes to a table can be tested in a branch before merging back into main. This is particularly useful when performing large changes like schema evolution or partition evolution. A partition evolution could be performed in a branch and you would be able to test out the change (eg performance benchmarks) before merging it. This provides great flexibility in performing on-line table modifications and testing without interrupting downstream use cases. If the changes are incorrect or not performant the branch can be dropped without being merged.\nFurther use cases Please see the Nessie Documentation for further descriptions of Nessie features.\nRegular table maintenance in Iceberg is complicated when using nessie. Please consult Management Services before performing any table maintenance. Example Please have a look at the Nessie Demos repo for different examples of Nessie and Iceberg in action together.\nFuture Improvements Iceberg multi-table transactions. Changes to multiple Iceberg tables in the same transaction, isolation levels etc ", "description": "", "title": "Nessie", "uri": "/docs/latest/nessie/"}, {"categories": null, "content": " Partitioning What is partitioning? Partitioning is a way to make queries faster by grouping similar rows together when writing.\nFor example, queries for log entries from a logs table would usually include a time range, like this query for logs between 10 and 12 AM:\nSELECT level, message FROM logs WHERE event_time BETWEEN '2018-12-01 10:00:00' AND '2018-12-01 12:00:00' Configuring the logs table to partition by the date of event_time will group log events into files with the same event date. Iceberg keeps track of that date and will use it to skip files for other dates that don\u2019t have useful data.\nIceberg can partition timestamps by year, month, day, and hour granularity. It can also use a categorical column, like level in this logs example, to store rows together and speed up queries.\nWhat does Iceberg do differently? Other tables formats like Hive support partitioning, but Iceberg supports hidden partitioning.\nIceberg handles the tedious and error-prone task of producing partition values for rows in a table. Iceberg avoids reading unnecessary partitions automatically. Consumers don\u2019t need to know how the table is partitioned and add extra filters to their queries. Iceberg partition layouts can evolve as needed. Partitioning in Hive To demonstrate the difference, consider how Hive would handle a logs table.\nIn Hive, partitions are explicit and appear as a column, so the logs table would have a column called event_date. 
When writing, an insert needs to supply the data for the event_date column:\nINSERT INTO logs PARTITION (event_date) SELECT level, message, event_time, format_time(event_time, 'YYYY-MM-dd') FROM unstructured_log_source Similarly, queries that search through the logs table must have an event_date filter in addition to an event_time filter.\nSELECT level, count(1) as count FROM logs WHERE event_time BETWEEN '2018-12-01 10:00:00' AND '2018-12-01 12:00:00' AND event_date = '2018-12-01' If the event_date filter were missing, Hive would scan through every file in the table because it doesn\u2019t know that the event_time column is related to the event_date column.\nProblems with Hive partitioning Hive must be given partition values. In the logs example, it doesn\u2019t know the relationship between event_time and event_date.\nThis leads to several problems:\nHive can\u2019t validate partition values \u2013 it is up to the writer to produce the correct value Using the wrong format, 2018-12-01 instead of 20181201, produces silently incorrect results, not query failures Using the wrong source column, like processing_time, or time zone also causes incorrect results, not failures It is up to the user to write queries correctly Using the wrong format also leads to silently incorrect results Users that don\u2019t understand a table\u2019s physical layout get needlessly slow queries \u2013 Hive can\u2019t translate filters automatically Working queries are tied to the table\u2019s partitioning scheme, so partitioning configuration cannot be changed without breaking queries Iceberg\u2019s hidden partitioning Iceberg produces partition values by taking a column value and optionally transforming it. Iceberg is responsible for converting event_time into event_date, and keeps track of the relationship.\nTable partitioning is configured using these relationships. The logs table would be partitioned by date(event_time) and level.\nBecause Iceberg doesn\u2019t require user-maintained partition columns, it can hide partitioning. Partition values are produced correctly every time and always used to speed up queries, when possible. Producers and consumers wouldn\u2019t even see event_date.\nMost importantly, queries no longer depend on a table\u2019s physical layout. With a separation between physical and logical, Iceberg tables can evolve partition schemes over time as data volume changes. Misconfigured tables can be fixed without an expensive migration.\nFor details about all the supported hidden partition transformations, see the Partition Transforms section.\nFor details about updating a table\u2019s partition spec, see the partition evolution section.\n", "description": "", "title": "Partitioning", "uri": "/docs/latest/partitioning/"}, {"categories": null, "content": " Performance Iceberg is designed for huge tables and is used in production where a single table can contain tens of petabytes of data. Even multi-petabyte tables can be read from a single node, without needing a distributed SQL engine to sift through table metadata. 
Scan planning Scan planning is the process of finding the files in a table that are needed for a query.\nPlanning in an Iceberg table fits on a single node because Iceberg\u2019s metadata can be used to prune metadata files that aren\u2019t needed, in addition to filtering data files that don\u2019t contain matching data.\nFast scan planning from a single node enables:\nLower latency SQL queries \u2013 by eliminating a distributed scan to plan a distributed scan Access from any client \u2013 stand-alone processes can read data directly from Iceberg tables Metadata filtering Iceberg uses two levels of metadata to track the files in a snapshot.\nManifest files store a list of data files, along each data file\u2019s partition data and column-level stats A manifest list stores the snapshot\u2019s list of manifests, along with the range of values for each partition field For fast scan planning, Iceberg first filters manifests using the partition value ranges in the manifest list. Then, it reads each manifest to get data files. With this scheme, the manifest list acts as an index over the manifest files, making it possible to plan without reading all manifests.\nIn addition to partition value ranges, a manifest list also stores the number of files added or deleted in a manifest to speed up operations like snapshot expiration.\nData filtering Manifest files include a tuple of partition data and column-level stats for each data file.\nDuring planning, query predicates are automatically converted to predicates on the partition data and applied first to filter data files. Next, column-level value counts, null counts, lower bounds, and upper bounds are used to eliminate files that cannot match the query predicate.\nBy using upper and lower bounds to filter data files at planning time, Iceberg uses clustered data to eliminate splits without running tasks. In some cases, this is a 10x performance improvement.\n", "description": "", "title": "Performance", "uri": "/docs/latest/performance/"}, {"categories": null, "content": " Spark Procedures To use Iceberg in Spark, first configure Spark catalogs. Stored procedures are only available when using Iceberg SQL extensions in Spark 3.x.\nUsage Procedures can be used from any configured Iceberg catalog with CALL. All procedures are in the namespace system.\nCALL supports passing arguments by name (recommended) or by position. Mixing position and named arguments is not supported.\nNamed arguments All procedure arguments are named. When passing arguments by name, arguments can be in any order and any optional argument can be omitted.\nCALL catalog_name.system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1) Positional arguments When passing arguments by position, only the ending arguments may be omitted if they are optional.\nCALL catalog_name.system.procedure_name(arg_1, arg_2, ... arg_n) Snapshot management rollback_to_snapshot Roll back a table to a specific snapshot ID.\nTo roll back to a specific time, use rollback_to_timestamp.\nThis procedure invalidates all cached Spark plans that reference the affected table. Usage Argument Name Required? 
Type Description table \u2714\ufe0f string Name of the table to update snapshot_id \u2714\ufe0f long Snapshot ID to rollback to Output Output Name Type Description previous_snapshot_id long The current snapshot ID before the rollback current_snapshot_id long The new current snapshot ID Example Roll back table db.sample to snapshot ID 1:\nCALL catalog_name.system.rollback_to_snapshot('db.sample', 1) rollback_to_timestamp Roll back a table to the snapshot that was current at some time.\nThis procedure invalidates all cached Spark plans that reference the affected table. Usage Argument Name Required? Type Description table \u2714\ufe0f string Name of the table to update timestamp \u2714\ufe0f timestamp A timestamp to rollback to Output Output Name Type Description previous_snapshot_id long The current snapshot ID before the rollback current_snapshot_id long The new current snapshot ID Example Roll back db.sample to a specific day and time.\nCALL catalog_name.system.rollback_to_timestamp('db.sample', TIMESTAMP '2021-06-30 00:00:00.000') set_current_snapshot Sets the current snapshot ID for a table.\nUnlike rollback, the snapshot is not required to be an ancestor of the current table state.\nThis procedure invalidates all cached Spark plans that reference the affected table. Usage Argument Name Required? Type Description table \u2714\ufe0f string Name of the table to update snapshot_id \u2714\ufe0f long Snapshot ID to set as current Output Output Name Type Description previous_snapshot_id long The current snapshot ID before the rollback current_snapshot_id long The new current snapshot ID Example Set the current snapshot for db.sample to 1:\nCALL catalog_name.system.set_current_snapshot('db.sample', 1) cherrypick_snapshot Cherry-picks changes from a snapshot into the current table state.\nCherry-picking creates a new snapshot from an existing snapshot without altering or removing the original.\nOnly append and dynamic overwrite snapshots can be cherry-picked.\nThis procedure invalidates all cached Spark plans that reference the affected table. Usage Argument Name Required? Type Description table \u2714\ufe0f string Name of the table to update snapshot_id \u2714\ufe0f long The snapshot ID to cherry-pick Output Output Name Type Description source_snapshot_id long The table\u2019s current snapshot before the cherry-pick current_snapshot_id long The snapshot ID created by applying the cherry-pick Examples Cherry-pick snapshot 1\nCALL catalog_name.system.cherrypick_snapshot('my_table', 1) Cherry-pick snapshot 1 with named args\nCALL catalog_name.system.cherrypick_snapshot(snapshot_id => 1, table => 'my_table' ) Metadata management Many maintenance actions can be performed using Iceberg stored procedures.\nexpire_snapshots Each write/update/delete/upsert/compaction in Iceberg produces a new snapshot while keeping the old data and metadata around for snapshot isolation and time travel. The expire_snapshots procedure can be used to remove older snapshots and their files which are no longer needed.\nThis procedure will remove old snapshots and data files which are uniquely required by those old snapshots. This means the expire_snapshots procedure will never remove files which are still required by a non-expired snapshot.\nUsage Argument Name Required? 
Type Description table \u2714\ufe0f string Name of the table to update older_than \ufe0f timestamp Timestamp before which snapshots will be removed (Default: 5 days ago) retain_last int Number of ancestor snapshots to preserve regardless of older_than (defaults to 1) max_concurrent_deletes int Size of the thread pool used for delete file actions (by default, no thread pool is used) stream_results boolean When true, deletion files will be sent to Spark driver by RDD partition (by default, all the files will be sent to Spark driver). This option is recommended to set to true to prevent Spark driver OOM from large file size If older_than and retain_last are omitted, the table\u2019s expiration properties will be used.\nOutput Output Name Type Description deleted_data_files_count long Number of data files deleted by this operation deleted_manifest_files_count long Number of manifest files deleted by this operation deleted_manifest_lists_count long Number of manifest List files deleted by this operation Examples Remove snapshots older than specific day and time, but retain the last 100 snapshots:\nCALL hive_prod.system.expire_snapshots('db.sample', TIMESTAMP '2021-06-30 00:00:00.000', 100) remove_orphan_files Used to remove files which are not referenced in any metadata files of an Iceberg table and can thus be considered \u201corphaned\u201d.\nUsage Argument Name Required? Type Description table \u2714\ufe0f string Name of the table to clean older_than \ufe0f timestamp Remove orphan files created before this timestamp (Defaults to 3 days ago) location string Directory to look for files in (defaults to the table\u2019s location) dry_run boolean When true, don\u2019t actually remove files (defaults to false) max_concurrent_deletes int Size of the thread pool used for delete file actions (by default, no thread pool is used) Output Output Name Type Description orphan_file_location String The path to each file determined to be an orphan by this command Examples List all the files that are candidates for removal by performing a dry run of the remove_orphan_files command on this table without actually removing them:\nCALL catalog_name.system.remove_orphan_files(table => 'db.sample', dry_run => true) Remove any files in the tablelocation/data folder which are not known to the table db.sample.\nCALL catalog_name.system.remove_orphan_files(table => 'db.sample', location => 'tablelocation/data') rewrite_data_files Iceberg tracks each data file in a table. More data files leads to more metadata stored in manifest files, and small data files causes an unnecessary amount of metadata and less efficient queries from file open costs.\nIceberg can compact data files in parallel using Spark with the rewriteDataFiles action. This will combine small files into larger files to reduce metadata overhead and runtime file open cost.\nUsage Argument Name Required? Type Description table \u2714\ufe0f string Name of the table to update strategy string Name of the strategy - binpack or sort. Defaults to binpack strategy sort_order string If Zorder, then comma separated column names within zorder() text. Example: zorder(c1,c2,c3). Else, Comma separated sort_order_column. Where sort_order_column is a space separated sort order info per column (ColumnName SortDirection NullOrder). SortDirection can be ASC or DESC. NullOrder can be NULLS FIRST or NULLS LAST options \ufe0f map<string, string> Options to be used for actions where \ufe0f string predicate as a string used for filtering the files. 
Note that all files that may contain data matching the filter will be selected for rewriting. See the RewriteDataFiles Javadoc, BinPackStrategy Javadoc and SortStrategy Javadoc for a list of all the supported options for this action.\nOutput Output Name Type Description rewritten_data_files_count int Number of data files which were re-written by this command added_data_files_count int Number of new data files which were written by this command Examples Rewrite the data files in table db.sample using the default rewrite algorithm of bin-packing to combine small files and also split large files according to the default write size of the table.\nCALL catalog_name.system.rewrite_data_files('db.sample') Rewrite the data files in table db.sample by sorting all the data on id and name using the same defaults as bin-pack to determine which files to rewrite.\nCALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'id DESC NULLS LAST,name ASC NULLS FIRST') Rewrite the data files in table db.sample by z-ordering on columns c1 and c2, using the same defaults as bin-pack to determine which files to rewrite.\nCALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'zorder(c1,c2)') Rewrite the data files in table db.sample using the bin-pack strategy in any partition where 2 or more files need to be rewritten.\nCALL catalog_name.system.rewrite_data_files(table => 'db.sample', options => map('min-input-files','2')) Rewrite the data files in table db.sample and select the files that may contain data matching the filter (id = 3 and name = \u201cfoo\u201d) to be rewritten.\nCALL catalog_name.system.rewrite_data_files(table => 'db.sample', where => 'id = 3 and name = \"foo\"')
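These procedures are invoked with plain SQL, so they can also be scripted from an application instead of a SQL shell. The snippet below is a minimal PySpark sketch rather than part of the procedure reference; it assumes an already-configured SparkSession named spark, and the catalog demo and table db.sample are placeholders for your own names:
# assumes `spark` is an existing SparkSession with an Iceberg catalog named `demo`
# compact small files with the default bin-pack strategy and inspect the procedure output
result = spark.sql(
    "CALL demo.system.rewrite_data_files(table => 'db.sample', strategy => 'binpack')"
)
result.show()  # columns: rewritten_data_files_count, added_data_files_count
Because spark.sql returns the procedure output as a DataFrame, the same pattern works for any procedure on this page.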
rewrite_manifests Rewrite manifests for a table to optimize scan planning.\nData files in manifests are sorted by fields in the partition spec. This procedure runs in parallel using a Spark job.\nSee the RewriteManifests Javadoc to see more configuration options.\nThis procedure invalidates all cached Spark plans that reference the affected table. Usage Argument Name Required? Type Description table \u2714\ufe0f string Name of the table to update use_caching \ufe0f boolean Use Spark caching during operation (defaults to true) Output Output Name Type Description rewritten_manifests_count int Number of manifests which were re-written by this command added_manifests_count int Number of new manifest files which were written by this command Examples Rewrite the manifests in table db.sample and align manifest files with table partitioning.\nCALL catalog_name.system.rewrite_manifests('db.sample') Rewrite the manifests in table db.sample and disable the use of Spark caching. This could be done to avoid memory issues on executors.\nCALL catalog_name.system.rewrite_manifests('db.sample', false) Table migration The snapshot and migrate procedures help test and migrate existing Hive or Spark tables to Iceberg.\nsnapshot Create a light-weight temporary copy of a table for testing, without changing the source table.\nThe newly created table can be changed or written to without affecting the source table, but the snapshot uses the original table\u2019s data files.\nWhen inserts or overwrites run on the snapshot, new files are placed in the snapshot table\u2019s location rather than the original table location.\nWhen finished testing a snapshot table, clean it up by running DROP TABLE.\nBecause tables created by snapshot are not the sole owners of their data files, they are prohibited from actions like expire_snapshots which would physically delete data files. Iceberg deletes, which only affect metadata, are still allowed. In addition, any operations which affect the original data files will disrupt the Snapshot\u2019s integrity. DELETE statements executed against the original Hive table will remove original data files and the snapshot table will no longer be able to access them. See migrate to replace an existing table with an Iceberg table.\nUsage Argument Name Required? Type Description source_table \u2714\ufe0f string Name of the table to snapshot table \u2714\ufe0f string Name of the new Iceberg table to create location string Table location for the new table (delegated to the catalog by default) properties \ufe0f map<string, string> Properties to add to the newly created table Output Output Name Type Description imported_files_count long Number of files added to the new table Examples Make an isolated Iceberg table which references table db.sample named db.snap at the catalog\u2019s default location for db.snap.\nCALL catalog_name.system.snapshot('db.sample', 'db.snap') Make an isolated Iceberg table which references table db.sample named db.snap at a manually specified location /tmp/temptable/.\nCALL catalog_name.system.snapshot('db.sample', 'db.snap', '/tmp/temptable/') migrate Replace a table with an Iceberg table, loaded with the source\u2019s data files.\nTable schema, partitioning, properties, and location will be copied from the source table.\nMigrate will fail if any table partition uses an unsupported format. Supported formats are Avro, Parquet, and ORC. Existing data files are added to the Iceberg table\u2019s metadata and can be read using a name-to-id mapping created from the original table schema.\nTo leave the original table intact while testing, use snapshot to create a new temporary table that shares source data files and schema.\nUsage Argument Name Required? Type Description table \u2714\ufe0f string Name of the table to migrate properties \ufe0f map<string, string> Properties for the new Iceberg table Output Output Name Type Description migrated_files_count long Number of files appended to the Iceberg table Examples Migrate the table db.sample in Spark\u2019s default catalog to an Iceberg table and add a property \u2018foo\u2019 set to \u2018bar\u2019:\nCALL catalog_name.system.migrate('spark_catalog.db.sample', map('foo', 'bar')) Migrate db.sample in the current catalog to an Iceberg table without adding any additional properties:\nCALL catalog_name.system.migrate('db.sample') add_files Attempts to directly add files from a Hive or file based table into a given Iceberg table.
Unlike migrate or snapshot, add_files can import files from a specific partition or partitions and does not create a new Iceberg table. This command will create metadata for the new files and will not move them. This procedure will not analyze the schema of the files to determine if they actually match the schema of the Iceberg table. Upon completion, the Iceberg table will then treat these files as if they are part of the set of files owned by Iceberg. This means any subsequent expire_snapshot calls will be able to physically delete the added files. This method should not be used if migrate or snapshot are possible.\nUsage Argument Name Required? Type Description table \u2714\ufe0f string Table which will have files added to source_table \u2714\ufe0f string Table where files should come from, paths are also possible in the form of `file_format`.`path` partition_filter \ufe0f map<string, string> A map of partitions in the source table to import from Warning : Schema is not validated, adding files with different schema to the Iceberg table will cause issues.\nWarning : Files added by this method can be physically deleted by Iceberg operations\nExamples Add the files from table db.src_table, a Hive or Spark table registered in the session Catalog, to Iceberg table db.tbl. Only add files that exist within partitions where part_col_1 is equal to A.\nCALL spark_catalog.system.add_files( table => 'db.tbl', source_table => 'db.src_tbl', partition_filter => map('part_col_1', 'A') ) Add files from a parquet file based table at location path/to/table to the Iceberg table db.tbl. Add all files regardless of what partition they belong to.\nCALL spark_catalog.system.add_files( table => 'db.tbl', source_table => '`parquet`.`path/to/table`' ) Metadata information ancestors_of Report the live snapshot IDs of parents of a specified snapshot\nUsage Argument Name Required? Type Description table \u2714\ufe0f string Name of the table to report live snapshot IDs snapshot_id \ufe0f long Use a specified snapshot to get the live snapshot IDs of parents tip : Using snapshot_id\nGiven snapshots history with roll back to B and addition of C\u2019 -> D\u2019\nA -> B - > C -> D \\ -> C' -> (D') Not specifying the snapshot ID would return A -> B -> C\u2019 -> D\u2019, while providing the snapshot ID of D as an argument would return A-> B -> C -> D\nOutput Output Name Type Description snapshot_id long the ancestor snapshot id timestamp long snapshot creation time Examples Get all the snapshot ancestors of current snapshots(default)\nCALL spark_catalog.system.ancestors_of('db.tbl') Get all the snapshot ancestors by a particular snapshot\nCALL spark_catalog.system.ancestors_of('db.tbl', 1) CALL spark_catalog.system.ancestors_of(snapshot_id => 1, table => 'db.tbl') ", "description": "", "title": "Procedures", "uri": "/docs/latest/spark-procedures/"}, {"categories": null, "content": " Iceberg Python API Much of the python api conforms to the java api. 
You can get more info about the java api here.\nCatalog The Catalog interface, like java provides search and management operations for tables.\nTo create a catalog:\nfrom iceberg.hive import HiveTables # instantiate Hive Tables conf = {\"hive.metastore.uris\": 'thrift://{hms_host}:{hms_port}', \"hive.metastore.warehouse.dir\": {tmpdir} } tables = HiveTables(conf) and to create a table from a catalog:\nfrom iceberg.api.schema import Schema\\ from iceberg.api.types import TimestampType, DoubleType, StringType, NestedField from iceberg.api.partition_spec import PartitionSpecBuilder schema = Schema(NestedField.optional(1, \"DateTime\", TimestampType.with_timezone()), NestedField.optional(2, \"Bid\", DoubleType.get()), NestedField.optional(3, \"Ask\", DoubleType.get()), NestedField.optional(4, \"symbol\", StringType.get())) partition_spec = PartitionSpecBuilder(schema).add(1, 1000, \"DateTime_day\", \"day\").build() tables.create(schema, \"test.test_123\", partition_spec) Tables The Table interface provides access to table metadata\nschema returns the current table Schema spec returns the current table PartitonSpec properties returns a map of key-value TableProperties currentSnapshot returns the current table Snapshot snapshots returns all valid snapshots for the table snapshot(id) returns a specific snapshot by ID location returns the table\u2019s base location Tables also provide refresh to update the table to the latest version.\nScanning Iceberg table scans start by creating a TableScan object with newScan.\nscan = table.new_scan(); To configure a scan, call filter and select on the TableScan to get a new TableScan with those changes.\nfiltered_scan = scan.filter(Expressions.equal(\"id\", 5)) String expressions can also be passed to the filter method.\nfiltered_scan = scan.filter(\"id=5\") Schema projections can be applied against a TableScan by passing a list of column names.\nfiltered_scan = scan.select([\"col_1\", \"col_2\", \"col_3\"]) Because some data types cannot be read using the python library, a convenience method for excluding columns from projection is provided.\nfiltered_scan = scan.select_except([\"unsupported_col_1\", \"unsupported_col_2\"]) Calls to configuration methods create a new TableScan so that each TableScan is immutable.\nWhen a scan is configured, planFiles, planTasks, and Schema are used to return files, tasks, and the read projection.\nscan = table.new_scan() \\ .filter(\"id=5\") \\ .select([\"id\", \"data\"]) projection = scan.schema for task in scan.plan_tasks(): print(task) Types Iceberg data types are located in iceberg.api.types.types\nPrimitives Primitive type instances are available from static methods in each type class. Types without parameters use get, and types like DecimalType use factory methods:\nIntegerType.get() # int DoubleType.get() # double DecimalType.of(9, 2) # decimal(9, 2) Nested types Structs, maps, and lists are created using factory methods in type classes.\nLike struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track field IDs and nullability.\nStruct fields are created using NestedField.optional or NestedField.required. 
Map value and list element nullability is set in the map and list factory methods.\n# struct<1 id: int, 2 data: optional string> struct = StructType.of([NestedField.required(1, \"id\", IntegerType.get()), NestedField.optional(2, \"data\", StringType.get())]) # map<1 key: int, 2 value: optional string> map_var = MapType.of_optional(1, IntegerType.get(), 2, StringType.get()) # array<1 element: int> list_var = ListType.of_required(1, IntegerType.get()) Expressions Iceberg\u2019s Expressions are used to configure table scans. To create Expressions, use the factory methods in Expressions.\nSupported predicate expressions are:\nis_null not_null equal not_equal less_than less_than_or_equal greater_than greater_than_or_equal Supported expression operations are:\nand or not Constant expressions are:\nalways_true always_false ", "description": "", "title": "Python API", "uri": "/docs/latest/python-api-intro/"}, {"categories": null, "content": " Feature Support The goal is that the python library will provide a functional, performant subset of the java library. The initial focus has been on reading table metadata as well as providing the capability to both plan and execute a scan.\nFeature Comparison Metadata Operation Java Python Get Schema X X Get Snapshots X X Plan Scan X X Plan Scan for Snapshot X X Update Current Snapshot X Set Table Properties X Create Table X X Drop Table X X Alter Table X Read Support Pyarrow is used for reading parquet files, so read support is limited to what is currently supported in the pyarrow.parquet package.\nPrimitive Types Data Type Java Python BooleanType X X DateType X X DecimalType X X FloatType X X IntegerType X X LongType X X TimeType X X TimestampType X X Nested Types Data Type Java Python ListType of primitives X X MapType of primitives X X StructType of primitives X X ListType of Nested Types X MapType of Nested Types X Write Support The python client does not currently support write capability.\n", "description": "", "title": "Python Feature Support", "uri": "/docs/latest/python-feature-support/"}, {"categories": null, "content": " Python API Quickstart Installation Iceberg python is currently in development. For development and testing purposes, the best way to install the library is to perform the following steps:\ngit clone https://github.com/apache/iceberg.git cd iceberg/python pip install -e . Testing Testing is done using tox. The config can be found in tox.ini within the python directory of the iceberg project.\n# simply run tox from within the python dir tox Examples Inspect Table Metadata from iceberg.hive import HiveTables # instantiate Hive Tables conf = {\"hive.metastore.uris\": 'thrift://{hms_host}:{hms_port}'} tables = HiveTables(conf) # load table tbl = tables.load(\"iceberg_db.iceberg_test_table\") # inspect metadata print(tbl.schema()) print(tbl.spec()) print(tbl.location()) # get table level record count from pprint import pprint pprint(int(tbl.current_snapshot().summary.get(\"total-records\"))) ", "description": "", "title": "Python Quickstart", "uri": "/docs/latest/python-quickstart/"}, {"categories": null, "content": " Spark Queries To use Iceberg in Spark, first configure Spark catalogs.\nIceberg uses Apache Spark\u2019s DataSourceV2 API for data source and catalog implementations.
Spark DSv2 is an evolving API with different levels of support in Spark versions:\nFeature support Spark 3.0 Spark 2.4 Notes SELECT \u2714\ufe0f DataFrame reads \u2714\ufe0f \u2714\ufe0f Metadata table SELECT \u2714\ufe0f History metadata table \u2714\ufe0f \u2714\ufe0f Snapshots metadata table \u2714\ufe0f \u2714\ufe0f Files metadata table \u2714\ufe0f \u2714\ufe0f Manifests metadata table \u2714\ufe0f \u2714\ufe0f Partitions metadata table \u2714\ufe0f \u2714\ufe0f All metadata tables \u2714\ufe0f \u2714\ufe0f Querying with SQL In Spark 3, tables use identifiers that include a catalog name.\nSELECT * FROM prod.db.table -- catalog: prod, namespace: db, table: table Metadata tables, like history and snapshots, can use the Iceberg table name as a namespace.\nFor example, to read from the files metadata table for prod.db.table:\nSELECT * FROM prod.db.table.files content file_path file_format spec_id partition record_count file_size_in_bytes column_sizes value_counts null_value_counts nan_value_counts lower_bounds upper_bounds key_metadata split_offsets equality_ids sort_order_id 0 s3:/\u2026/table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet PARQUET 0 {1999-01-01, 01} 1 597 [1 -> 90, 2 -> 62] [1 -> 1, 2 -> 1] [1 -> 0, 2 -> 0] [] [1 -> , 2 -> c] [1 -> , 2 -> c] null [4] null null 0 s3:/\u2026/table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet PARQUET 0 {1999-01-01, 02} 1 597 [1 -> 90, 2 -> 62] [1 -> 1, 2 -> 1] [1 -> 0, 2 -> 0] [] [1 -> , 2 -> b] [1 -> , 2 -> b] null [4] null null 0 s3:/\u2026/table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet PARQUET 0 {1999-01-01, 03} 1 597 [1 -> 90, 2 -> 62] [1 -> 1, 2 -> 1] [1 -> 0, 2 -> 0] [] [1 -> , 2 -> a] [1 -> , 2 -> a] null [4] null null Querying with DataFrames To load a table as a DataFrame, use table:\nval df = spark.table(\"prod.db.table\") Catalogs with DataFrameReader Iceberg 0.11.0 adds multi-catalog support to DataFrameReader in both Spark 3.x and 2.4.\nPaths and table names can be loaded with Spark\u2019s DataFrameReader interface. How tables are loaded depends on how the identifier is specified. When using spark.read.format(\"iceberg\").path(table) or spark.table(table) the table variable can take a number of forms as listed below:\nfile:/path/to/table: loads a HadoopTable at given path tablename: loads currentCatalog.currentNamespace.tablename catalog.tablename: loads tablename from the specified catalog. namespace.tablename: loads namespace.tablename from current catalog catalog.namespace.tablename: loads namespace.tablename from the specified catalog. namespace1.namespace2.tablename: loads namespace1.namespace2.tablename from current catalog The above list is in order of priority. 
For example: a matching catalog will take priority over any namespace resolution.\nTime travel SQL Spark 3.3 and later supports time travel in SQL queries using TIMESTAMP AS OF or VERSION AS OF clauses\n-- time travel to October 26, 1986 at 01:21:00 SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00'; -- time travel to snapshot with id 10963874102873L SELECT * FROM prod.db.table VERSION AS OF 10963874102873; In addition, FOR SYSTEM_TIME AS OF and FOR SYSTEM_VERSION AS OF clauses are also supported:\nSELECT * FROM prod.db.table FOR SYSTEM_TIME AS OF '1986-10-26 01:21:00'; SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 10963874102873; Timestamps may also be supplied as a Unix timestamp, in seconds:\n-- timestamp in seconds SELECT * FROM prod.db.table TIMESTAMP AS OF 499162860; SELECT * FROM prod.db.table FOR SYSTEM_TIME AS OF 499162860; DataFrame To select a specific table snapshot or the snapshot at some time in the DataFrame API, Iceberg supports two Spark read options:\nsnapshot-id selects a specific table snapshot as-of-timestamp selects the current snapshot at a timestamp, in milliseconds // time travel to October 26, 1986 at 01:21:00 spark.read .option(\"as-of-timestamp\", \"499162860000\") .format(\"iceberg\") .load(\"path/to/table\") // time travel to snapshot with ID 10963874102873L spark.read .option(\"snapshot-id\", 10963874102873L) .format(\"iceberg\") .load(\"path/to/table\") Spark 3.0 and earlier versions do not support using option with table in DataFrameReader commands. All options will be silently ignored. Do not use table when attempting to time-travel or use other options. See SPARK-32592. Incremental read To read appended data incrementally, use:\nstart-snapshot-id Start snapshot ID used in incremental scans (exclusive). end-snapshot-id End snapshot ID used in incremental scans (inclusive). This is optional. Omitting it will default to the current snapshot. // get the data added after start-snapshot-id (10963874102873L) until end-snapshot-id (63874143573109L) spark.read() .format(\"iceberg\") .option(\"start-snapshot-id\", \"10963874102873\") .option(\"end-snapshot-id\", \"63874143573109\") .load(\"path/to/table\") Currently gets only the data from append operation. Cannot support replace, overwrite, delete operations. Incremental read works with both V1 and V2 format-version. Incremental read is not supported by Spark\u2019s SQL syntax. Spark 2.4 Spark 2.4 requires using the DataFrame reader with iceberg as a format, because 2.4 does not support direct SQL queries:\n// named metastore table spark.read.format(\"iceberg\").load(\"catalog.db.table\") // Hadoop path table spark.read.format(\"iceberg\").load(\"hdfs://nn:8020/path/to/table\") Spark 2.4 with SQL To run SQL SELECT statements on Iceberg tables in 2.4, register the DataFrame as a temporary table:\nval df = spark.read.format(\"iceberg\").load(\"db.table\") df.createOrReplaceTempView(\"table\") spark.sql(\"\"\"select count(1) from table\"\"\").show() Inspecting tables To inspect a table\u2019s history, snapshots, and other metadata, Iceberg supports metadata tables.\nMetadata tables are identified by adding the metadata table name after the original table name. For example, history for db.table is read using db.table.history.\nFor Spark 2.4, use the DataFrameReader API to inspect tables.\nFor Spark 3, prior to 3.2, the Spark session catalog does not support table names with multipart identifiers such as catalog.database.table.metadata. 
As a workaround, configure an org.apache.iceberg.spark.SparkCatalog, or use the Spark DataFrameReader API.\nHistory To show table history:\nSELECT * FROM prod.db.table.history made_current_at snapshot_id parent_id is_current_ancestor 2019-02-08 03:29:51.215 5781947118336215154 NULL true 2019-02-08 03:47:55.948 5179299526185056830 5781947118336215154 true 2019-02-09 16:24:30.13 296410040247533544 5179299526185056830 false 2019-02-09 16:32:47.336 2999875608062437330 5179299526185056830 true 2019-02-09 19:42:03.919 8924558786060583479 2999875608062437330 true 2019-02-09 19:49:16.343 6536733823181975045 8924558786060583479 true This shows a commit that was rolled back. The example has two snapshots with the same parent, and one is not an ancestor of the current table state. Snapshots To show the valid snapshots for a table:\nSELECT * FROM prod.db.table.snapshots committed_at snapshot_id parent_id operation manifest_list summary 2019-02-08 03:29:51.215 57897183625154 null append s3://\u2026/table/metadata/snap-57897183625154-1.avro { added-records -> 2478404, total-records -> 2478404, added-data-files -> 438, total-data-files -> 438, spark.app.id -> application_1520379288616_155055 } You can also join snapshots to table history. For example, this query will show table history, with the application ID that wrote each snapshot:\nselect h.made_current_at, s.operation, h.snapshot_id, h.is_current_ancestor, s.summary['spark.app.id'] from prod.db.table.history h join prod.db.table.snapshots s on h.snapshot_id = s.snapshot_id order by made_current_at made_current_at operation snapshot_id is_current_ancestor summary[spark.app.id] 2019-02-08 03:29:51.215 append 57897183625154 true application_1520379288616_155055 2019-02-09 16:24:30.13 delete 29641004024753 false application_1520379288616_151109 2019-02-09 16:32:47.336 append 57897183625154 true application_1520379288616_155055 2019-02-08 03:47:55.948 overwrite 51792995261850 true application_1520379288616_152431 Files To show a table\u2019s current data files:\nSELECT * FROM prod.db.table.files content file_path file_format spec_id partition record_count file_size_in_bytes column_sizes value_counts null_value_counts nan_value_counts lower_bounds upper_bounds key_metadata split_offsets equality_ids sort_order_id 0 s3:/\u2026/table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet PARQUET 0 {1999-01-01, 01} 1 597 [1 -> 90, 2 -> 62] [1 -> 1, 2 -> 1] [1 -> 0, 2 -> 0] [] [1 -> , 2 -> c] [1 -> , 2 -> c] null [4] null null 0 s3:/\u2026/table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet PARQUET 0 {1999-01-01, 02} 1 597 [1 -> 90, 2 -> 62] [1 -> 1, 2 -> 1] [1 -> 0, 2 -> 0] [] [1 -> , 2 -> b] [1 -> , 2 -> b] null [4] null null 0 s3:/\u2026/table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet PARQUET 0 {1999-01-01, 03} 1 597 [1 -> 90, 2 -> 62] [1 -> 1, 2 -> 1] [1 -> 0, 2 -> 0] [] [1 -> , 2 -> a] [1 -> , 2 -> a] null [4] null null Manifests To show a table\u2019s current file manifests:\nSELECT * FROM prod.db.table.manifests path length partition_spec_id added_snapshot_id added_data_files_count existing_data_files_count deleted_data_files_count partition_summaries s3://\u2026/table/metadata/45b5290b-ee61-4788-b324-b1e2735c0e10-m0.avro 4479 0 6668963634911763636 8 0 0 [[false,null,2019-05-13,2019-05-15]] Note:\nFields within partition_summaries column of the manifests table correspond to field_summary structs within manifest list, with the following order: contains_null contains_nan lower_bound upper_bound contains_nan could return 
null, which indicates that this information is not available from the file\u2019s metadata. This usually occurs when reading from V1 table, where contains_nan is not populated. Partitions To show a table\u2019s current partitions:\nSELECT * FROM prod.db.table.partitions partition record_count file_count {20211001, 11} 1 1 {20211002, 11} 1 1 {20211001, 10} 1 1 {20211002, 10} 1 1 All Metadata Tables These tables are unions of the metadata tables specific to the current snapshot, and return metadata across all snapshots.\nThe \u201call\u201d metadata tables may produce more than one row per data file or manifest file because metadata files may be part of more than one table snapshot. All Data Files To show all of the table\u2019s data files and each file\u2019s metadata:\nSELECT * FROM prod.db.table.all_data_files content file_path file_format partition record_count file_size_in_bytes column_sizes value_counts null_value_counts nan_value_counts lower_bounds upper_bounds key_metadata split_offsets equality_ids sort_order_id 0 s3://\u2026/dt=20210102/00000-0-756e2512-49ae-45bb-aae3-c0ca475e7879-00001.parquet PARQUET {20210102} 14 2444 {1 -> 94, 2 -> 17} {1 -> 14, 2 -> 14} {1 -> 0, 2 -> 0} {} {1 -> 1, 2 -> 20210102} {1 -> 2, 2 -> 20210102} null [4] null 0 0 s3://\u2026/dt=20210103/00000-0-26222098-032f-472b-8ea5-651a55b21210-00001.parquet PARQUET {20210103} 14 2444 {1 -> 94, 2 -> 17} {1 -> 14, 2 -> 14} {1 -> 0, 2 -> 0} {} {1 -> 1, 2 -> 20210103} {1 -> 3, 2 -> 20210103} null [4] null 0 0 s3://\u2026/dt=20210104/00000-0-a3bb1927-88eb-4f1c-bc6e-19076b0d952e-00001.parquet PARQUET {20210104} 14 2444 {1 -> 94, 2 -> 17} {1 -> 14, 2 -> 14} {1 -> 0, 2 -> 0} {} {1 -> 1, 2 -> 20210104} {1 -> 3, 2 -> 20210104} null [4] null 0 All Manifests To show all of the table\u2019s manifest files:\nSELECT * FROM prod.db.table.all_manifests path length partition_spec_id added_snapshot_id added_data_files_count existing_data_files_count deleted_data_files_count partition_summaries s3://\u2026/metadata/a85f78c5-3222-4b37-b7e4-faf944425d48-m0.avro 6376 0 6272782676904868561 2 0 0 [{false, false, 20210101, 20210101}] Note:\nFields within partition_summaries column of the manifests table correspond to field_summary structs within manifest list, with the following order: contains_null contains_nan lower_bound upper_bound contains_nan could return null, which indicates that this information is not available from the file\u2019s metadata. This usually occurs when reading from V1 table, where contains_nan is not populated. Inspecting with DataFrames Metadata tables can be loaded in Spark 2.4 or Spark 3 using the DataFrameReader API:\n// named metastore table spark.read.format(\"iceberg\").load(\"db.table.files\").show(truncate = false) // Hadoop path table spark.read.format(\"iceberg\").load(\"hdfs://nn:8020/path/to/table#files\").show(truncate = false) ", "description": "", "title": "Queries", "uri": "/docs/latest/spark-queries/"}, {"categories": null, "content": " Reliability Iceberg was designed to solve correctness problems that affect Hive tables running in S3.\nHive tables track data files using both a central metastore for partitions and a file system for individual files. This makes atomic changes to a table\u2019s contents impossible, and eventually consistent stores like S3 may return incorrect results due to the use of listing files to reconstruct the state of a table. 
It also requires job planning to make many slow listing calls: O(n) with the number of partitions.\nIceberg tracks the complete list of data files in each snapshot using a persistent tree structure. Every write or delete produces a new snapshot that reuses as much of the previous snapshot\u2019s metadata tree as possible to avoid high write volumes.\nValid snapshots in an Iceberg table are stored in the table metadata file, along with a reference to the current snapshot. Commits replace the path of the current table metadata file using an atomic operation. This ensures that all updates to table data and metadata are atomic, and is the basis for serializable isolation.\nThis results in improved reliability guarantees:\nSerializable isolation: All table changes occur in a linear history of atomic table updates Reliable reads: Readers always use a consistent snapshot of the table without holding a lock Version history and rollback: Table snapshots are kept as history and tables can roll back if a job produces bad data Safe file-level operations. By supporting atomic changes, Iceberg enables new use cases, like safely compacting small files and safely appending late data to tables This design also has performance benefits:\nO(1) RPCs to plan: Instead of listing O(n) directories in a table to plan a job, reading a snapshot requires O(1) RPC calls Distributed planning: File pruning and predicate push-down is distributed to jobs, removing the metastore as a bottleneck Finer granularity partitioning: Distributed planning and O(1) RPC calls remove the current barriers to finer-grained partitioning Concurrent write operations Iceberg supports multiple concurrent writes using optimistic concurrency.\nEach writer assumes that no other writers are operating and writes out new table metadata for an operation. Then, the writer attempts to commit by atomically swapping the new table metadata file for the existing metadata file.\nIf the atomic swap fails because another writer has committed, the failed writer retries by writing a new metadata tree based on the the new current table state.\nCost of retries Writers avoid expensive retry operations by structuring changes so that work can be reused across retries.\nFor example, appends usually create a new manifest file for the appended data files, which can be added to the table without rewriting the manifest on every attempt.\nRetry validation Commits are structured as assumptions and actions. After a conflict, a writer checks that the assumptions are met by the current table state. If the assumptions are met, then it is safe to re-apply the actions and commit.\nFor example, a compaction might rewrite file_a.avro and file_b.avro as merged.parquet. This is safe to commit as long as the table still contains both file_a.avro and file_b.avro. If either file was deleted by a conflicting commit, then the operation must fail. Otherwise, it is safe to remove the source files and add the merged file.\nCompatibility By avoiding file listing and rename operations, Iceberg tables are compatible with any object store. 
No consistent listing is required.\n", "description": "", "title": "Reliability", "uri": "/docs/latest/reliability/"}, {"categories": null, "content": " Schemas Iceberg tables support the following types:\nType Description Notes boolean True or false int 32-bit signed integers Can promote to long long 64-bit signed integers float 32-bit IEEE 754 floating point Can promote to double double 64-bit IEEE 754 floating point decimal(P,S) Fixed-point decimal; precision P, scale S Scale is fixed and precision must be 38 or less date Calendar date without timezone or time time Time of day without date, timezone Stored as microseconds timestamp Timestamp without timezone Stored as microseconds timestamptz Timestamp with timezone Stored as microseconds string Arbitrary-length character sequences Encoded with UTF-8 fixed(L) Fixed-length byte array of length L binary Arbitrary-length byte array struct<...> A record with named fields of any data type list<E> A list with elements of any data type map<K, V> A map with keys and values of any data type Iceberg tracks each field in a table schema using an ID that is never reused in a table. See correctness guarantees for more information.\n", "description": "", "title": "Schemas", "uri": "/docs/latest/schemas/"}, {"categories": null, "content": " Spark Structured Streaming Iceberg uses Apache Spark\u2019s DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions.\nAs of Spark 3.0, DataFrame reads and writes are supported.\nFeature support Spark 3.0 Spark 2.4 Notes DataFrame write \u2714 \u2714 Streaming Reads Iceberg supports processing incremental data in spark structured streaming jobs which starts from a historical timestamp:\nval df = spark.readStream .format(\"iceberg\") .option(\"stream-from-timestamp\", Long.toString(streamStartTimestamp)) .load(\"database.table_name\") Iceberg only supports reading data from append snapshots. Overwrite snapshots cannot be processed and will cause an exception by default. Overwrites may be ignored by setting streaming-skip-overwrite-snapshots=true. Similarly, delete snapshots will cause an exception by default, and deletes may be ignored by setting streaming-skip-delete-snapshots=true. Streaming Writes To write values from streaming query to Iceberg table, use DataStreamWriter:\nval tableIdentifier: String = ... data.writeStream .format(\"iceberg\") .outputMode(\"append\") .trigger(Trigger.ProcessingTime(1, TimeUnit.MINUTES)) .option(\"path\", tableIdentifier) .option(\"checkpointLocation\", checkpointPath) .start() The tableIdentifier can be:\nThe fully-qualified path to a HDFS table, like hdfs://nn:8020/path/to/table A table name if the table is tracked by a catalog, like database.table_name Iceberg doesn\u2019t support \u201ccontinuous processing\u201d, as it doesn\u2019t provide the interface to \u201ccommit\u201d the output.\nIceberg supports append and complete output modes:\nappend: appends the rows of every micro-batch to the table complete: replaces the table contents every micro-batch The table should be created in prior to start the streaming query. Refer SQL create table on Spark page to see how to create the Iceberg table.\nWriting against partitioned table Iceberg requires the data to be sorted according to the partition spec per task (Spark partition) in prior to write against partitioned table. 
For batch queries you\u2019re encouraged to do explicit sort to fulfill the requirement (see here), but the approach would bring additional latency as repartition and sort are considered as heavy operations for streaming workload. To avoid additional latency, you can enable fanout writer to eliminate the requirement.\nval tableIdentifier: String = ... data.writeStream .format(\"iceberg\") .outputMode(\"append\") .trigger(Trigger.ProcessingTime(1, TimeUnit.MINUTES)) .option(\"path\", tableIdentifier) .option(\"fanout-enabled\", \"true\") .option(\"checkpointLocation\", checkpointPath) .start() Fanout writer opens the files per partition value and doesn\u2019t close these files till write task is finished. This functionality is discouraged for batch query, as explicit sort against output rows isn\u2019t expensive for batch workload.\nMaintenance for streaming tables Streaming queries can create new table versions quickly, which creates lots of table metadata to track those versions. Maintaining metadata by tuning the rate of commits, expiring old snapshots, and automatically cleaning up metadata files is highly recommended.\nTune the rate of commits Having high rate of commits would produce lots of data files, manifests, and snapshots which leads the table hard to maintain. We encourage having trigger interval 1 minute at minimum, and increase the interval if needed.\nThe triggers section in Structured Streaming Programming Guide documents how to configure the interval.\nExpire old snapshots Each micro-batch written to a table produces a new snapshot, which are tracked in table metadata until they are expired to remove the metadata and any data files that are no longer needed. Snapshots accumulate quickly with frequent commits, so it is highly recommended that tables written by streaming queries are regularly maintained.\nCompacting data files The amount of data written in a micro batch is typically small, which can cause the table metadata to track lots of small files. Compacting small files into larger files reduces the metadata needed by the table, and increases query efficiency.\nRewrite manifests To optimize write latency on streaming workload, Iceberg may write the new snapshot with a \u201cfast\u201d append that does not automatically compact manifests. This could lead lots of small manifest files. Manifests can be rewritten to optimize queries and to compact.\n", "description": "", "title": "Structured Streaming", "uri": "/docs/latest/spark-structured-streaming/"}, {"categories": null, "content": " Spark Writes To use Iceberg in Spark, first configure Spark catalogs.\nSome plans are only available when using Iceberg SQL extensions in Spark 3.x.\nIceberg uses Apache Spark\u2019s DataSourceV2 API for data source and catalog implementations. 
Spark DSv2 is an evolving API with different levels of support in Spark versions:\nFeature support Spark 3.0 Spark 2.4 Notes SQL insert into \u2714\ufe0f SQL merge into \u2714\ufe0f \u26a0 Requires Iceberg Spark extensions SQL insert overwrite \u2714\ufe0f SQL delete from \u2714\ufe0f \u26a0 Row-level delete requires Spark extensions SQL update \u2714\ufe0f \u26a0 Requires Iceberg Spark extensions DataFrame append \u2714\ufe0f \u2714\ufe0f DataFrame overwrite \u2714\ufe0f \u2714\ufe0f \u26a0 Behavior changed in Spark 3.0 DataFrame CTAS and RTAS \u2714\ufe0f Writing with SQL Spark 3 supports SQL INSERT INTO, MERGE INTO, and INSERT OVERWRITE, as well as the new DataFrameWriterV2 API.\nINSERT INTO To append new data to a table, use INSERT INTO.\nINSERT INTO prod.db.table VALUES (1, 'a'), (2, 'b') INSERT INTO prod.db.table SELECT ... MERGE INTO Spark 3 added support for MERGE INTO queries that can express row-level updates.\nIceberg supports MERGE INTO by rewriting data files that contain rows that need to be updated in an overwrite commit.\nMERGE INTO is recommended instead of INSERT OVERWRITE because Iceberg can replace only the affected data files, and because the data overwritten by a dynamic overwrite may change if the table\u2019s partitioning changes.\nMERGE INTO syntax MERGE INTO updates a table, called the target table, using a set of updates from another query, called the source. The update for a row in the target table is found using the ON clause that is like a join condition.\nMERGE INTO prod.db.target t -- a target table USING (SELECT ...) s -- the source updates ON t.id = s.id -- condition to find updates for target rows WHEN ... -- updates Updates to rows in the target table are listed using WHEN MATCHED ... THEN .... Multiple MATCHED clauses can be added with conditions that determine when each match should be applied. The first matching expression is used.\nWHEN MATCHED AND s.op = 'delete' THEN DELETE WHEN MATCHED AND t.count IS NULL AND s.op = 'increment' THEN UPDATE SET t.count = 0 WHEN MATCHED AND s.op = 'increment' THEN UPDATE SET t.count = t.count + 1 Source rows (updates) that do not match can be inserted:\nWHEN NOT MATCHED THEN INSERT * Inserts also support additional conditions:\nWHEN NOT MATCHED AND s.event_time > still_valid_threshold THEN INSERT (id, count) VALUES (s.id, 1) Only one record in the source data can update any given row of the target table, or else an error will be thrown.\nINSERT OVERWRITE INSERT OVERWRITE can replace data in the table with the result of a query. Overwrites are atomic operations for Iceberg tables.\nThe partitions that will be replaced by INSERT OVERWRITE depends on Spark\u2019s partition overwrite mode and the partitioning of a table. MERGE INTO can rewrite only affected data files and has more easily understood behavior, so it is recommended instead of INSERT OVERWRITE.\nSpark 3.0.0 has a correctness bug that affects dynamic INSERT OVERWRITE with hidden partitioning, [SPARK-32168][spark-32168]. For tables with hidden partitions, make sure you use Spark 3.0.1. Overwrite behavior Spark\u2019s default overwrite mode is static, but dynamic overwrite mode is recommended when writing to Iceberg tables. 
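For example, dynamic mode can be switched on for the current session from PySpark before running an INSERT OVERWRITE; this is a minimal sketch assuming an existing SparkSession named spark, and it simply sets the property described below:
# enable dynamic partition overwrite for this session (see the property explained below)
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")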
Static overwrite mode determines which partitions to overwrite in a table by converting the PARTITION clause to a filter, but the PARTITION clause can only reference table columns.\nDynamic overwrite mode is configured by setting spark.sql.sources.partitionOverwriteMode=dynamic.\nTo demonstrate the behavior of dynamic and static overwrites, consider a logs table defined by the following DDL:\nCREATE TABLE prod.my_app.logs ( uuid string NOT NULL, level string NOT NULL, ts timestamp NOT NULL, message string) USING iceberg PARTITIONED BY (level, hours(ts)) Dynamic overwrite When Spark\u2019s overwrite mode is dynamic, partitions that have rows produced by the SELECT query will be replaced.\nFor example, this query removes duplicate log events from the example logs table.\nINSERT OVERWRITE prod.my_app.logs SELECT uuid, first(level), first(ts), first(message) FROM prod.my_app.logs WHERE cast(ts as date) = '2020-07-01' GROUP BY uuid In dynamic mode, this will replace any partition with rows in the SELECT result. Because the date of all rows is restricted to 1 July, only hours of that day will be replaced.\nStatic overwrite When Spark\u2019s overwrite mode is static, the PARTITION clause is converted to a filter that is used to delete from the table. If the PARTITION clause is omitted, all partitions will be replaced.\nBecause there is no PARTITION clause in the query above, it will drop all existing rows in the table when run in static mode, but will only write the logs from 1 July.\nTo overwrite just the partitions that were loaded, add a PARTITION clause that aligns with the SELECT query filter:\nINSERT OVERWRITE prod.my_app.logs PARTITION (level = 'INFO') SELECT uuid, first(level), first(ts), first(message) FROM prod.my_app.logs WHERE level = 'INFO' GROUP BY uuid Note that this mode cannot replace hourly partitions like the dynamic example query because the PARTITION clause can only reference table columns, not hidden partitions.\nDELETE FROM Spark 3 added support for DELETE FROM queries to remove data from tables.\nDelete queries accept a filter to match rows to delete.\nDELETE FROM prod.db.table WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00' DELETE FROM prod.db.all_events WHERE session_time < (SELECT min(session_time) FROM prod.db.good_events) DELETE FROM prod.db.orders AS t1 WHERE EXISTS (SELECT oid FROM prod.db.returned_orders WHERE t1.oid = oid) If the delete filter matches entire partitions of the table, Iceberg will perform a metadata-only delete. If the filter matches individual rows of a table, then Iceberg will rewrite only the affected data files.\nUPDATE Spark 3.1 added support for UPDATE queries that update matching rows in tables.\nUpdate queries accept a filter to match rows to update.\nUPDATE prod.db.table SET c1 = 'update_c1', c2 = 'update_c2' WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00' UPDATE prod.db.all_events SET session_time = 0, ignored = true WHERE session_time < (SELECT min(session_time) FROM prod.db.good_events) UPDATE prod.db.orders AS t1 SET order_status = 'returned' WHERE EXISTS (SELECT oid FROM prod.db.returned_orders WHERE t1.oid = oid) For more complex row-level updates based on incoming data, see the section on MERGE INTO.\nWriting with DataFrames Spark 3 introduced the new DataFrameWriterV2 API for writing to tables using data frames. 
The v2 API is recommended for several reasons:\nCTAS, RTAS, and overwrite by filter are supported All operations consistently write columns to a table by name Hidden partition expressions are supported in partitionedBy Overwrite behavior is explicit, either dynamic or by a user-supplied filter The behavior of each operation corresponds to SQL statements df.writeTo(t).create() is equivalent to CREATE TABLE AS SELECT df.writeTo(t).replace() is equivalent to REPLACE TABLE AS SELECT df.writeTo(t).append() is equivalent to INSERT INTO df.writeTo(t).overwritePartitions() is equivalent to dynamic INSERT OVERWRITE The v1 DataFrame write API is still supported, but is not recommended.\nWhen writing with the v1 DataFrame API in Spark 3, use saveAsTable or insertInto to load tables with a catalog. Using format(\"iceberg\") loads an isolated table reference that will not automatically refresh tables used by queries. Appending data To append a dataframe to an Iceberg table, use append:\nval data: DataFrame = ... data.writeTo(\"prod.db.table\").append() Spark 2.4 In Spark 2.4, use the v1 API with append mode and iceberg format:\ndata.write .format(\"iceberg\") .mode(\"append\") .save(\"db.table\") Overwriting data To overwrite partitions dynamically, use overwritePartitions():\nval data: DataFrame = ... data.writeTo(\"prod.db.table\").overwritePartitions() To explicitly overwrite partitions, use overwrite to supply a filter:\ndata.writeTo(\"prod.db.table\").overwrite($\"level\" === \"INFO\") Spark 2.4 In Spark 2.4, overwrite values in an Iceberg table with overwrite mode and iceberg format:\ndata.write .format(\"iceberg\") .mode(\"overwrite\") .save(\"db.table\") The behavior of overwrite mode changed between Spark 2.4 and Spark 3. The behavior of DataFrameWriter overwrite mode was undefined in Spark 2.4, but is required to overwrite the entire table in Spark 3. Because of this new requirement, the Iceberg source\u2019s behavior changed in Spark 3. In Spark 2.4, the behavior was to dynamically overwrite partitions. To use the Spark 2.4 behavior, add option overwrite-mode=dynamic.\nCreating tables To run a CTAS or RTAS, use create, replace, or createOrReplace operations:\nval data: DataFrame = ... data.writeTo(\"prod.db.table\").create() If you have replaced the default Spark catalog (spark_catalog) with Iceberg\u2019s SparkSessionCatalog, do:\nval data: DataFrame = ... data.writeTo(\"db.table\").using(\"iceberg\").create() Create and replace operations support table configuration methods, like partitionedBy and tableProperty:\ndata.writeTo(\"prod.db.table\") .tableProperty(\"write.format.default\", \"orc\") .partitionedBy($\"level\", days($\"ts\")) .createOrReplace() Writing to partitioned tables Iceberg requires the data to be sorted according to the partition spec per task (Spark partition) in prior to write against partitioned table. This applies both Writing with SQL and Writing with DataFrames.\nExplicit sort is necessary because Spark doesn\u2019t allow Iceberg to request a sort before writing as of Spark 3.0. SPARK-23889 is filed to enable Iceberg to require specific distribution & sort order to Spark. Both global sort (orderBy/sort) and local sort (sortWithinPartitions) work for the requirement. 
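For PySpark users, a minimal sketch of the same pattern is shown below. It is an illustration rather than part of the reference: it assumes an existing SparkSession named spark (Spark 3.1+ is needed for DataFrame.writeTo in PySpark), a placeholder source table prod.db.another_table, and a target prod.db.sample partitioned by days(ts), category as in the walkthrough that follows:
# hedged sketch: locally sort each Spark task by the partition columns, then append
df = spark.table("prod.db.another_table")      # placeholder source table
(df.sortWithinPartitions("ts", "category")     # satisfies the days(ts), category spec
   .writeTo("prod.db.sample")                  # target Iceberg table
   .append())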
Let\u2019s go through writing the data against below sample table:\nCREATE TABLE prod.db.sample ( id bigint, data string, category string, ts timestamp) USING iceberg PARTITIONED BY (days(ts), category) To write data to the sample table, your data needs to be sorted by days(ts), category.\nIf you\u2019re inserting data with SQL statement, you can use ORDER BY to achieve it, like below:\nINSERT INTO prod.db.sample SELECT id, data, category, ts FROM another_table ORDER BY ts, category If you\u2019re inserting data with DataFrame, you can use either orderBy/sort to trigger global sort, or sortWithinPartitions to trigger local sort. Local sort for example:\ndata.sortWithinPartitions(\"ts\", \"category\") .writeTo(\"prod.db.sample\") .append() You can simply add the original column to the sort condition for the most partition transformations, except bucket.\nFor bucket partition transformation, you need to register the Iceberg transform function in Spark to specify it during sort.\nLet\u2019s go through another sample table having bucket partition:\nCREATE TABLE prod.db.sample ( id bigint, data string, category string, ts timestamp) USING iceberg PARTITIONED BY (bucket(16, id)) You need to register the function to deal with bucket, like below:\nimport org.apache.iceberg.spark.IcebergSpark import org.apache.spark.sql.types.DataTypes IcebergSpark.registerBucketUDF(spark, \"iceberg_bucket16\", DataTypes.LongType, 16) Explicit registration of the function is necessary because Spark doesn\u2019t allow Iceberg to provide functions. SPARK-27658 is filed to enable Iceberg to provide functions which can be used in query. Here we just registered the bucket function as iceberg_bucket16, which can be used in sort clause.\nIf you\u2019re inserting data with SQL statement, you can use the function like below:\nINSERT INTO prod.db.sample SELECT id, data, category, ts FROM another_table ORDER BY iceberg_bucket16(id) If you\u2019re inserting data with DataFrame, you can use the function like below:\ndata.sortWithinPartitions(expr(\"iceberg_bucket16(id)\")) .writeTo(\"prod.db.sample\") .append() Type compatibility Spark and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations, so you may want to understand the type conversion in Iceberg in prior to design the types of columns in your tables.\nSpark type to Iceberg type This type conversion table describes how Spark types are converted to the Iceberg types. The conversion applies on both creating Iceberg table and writing to Iceberg table via Spark.\nSpark Iceberg Notes boolean boolean short integer byte integer integer integer long long float float double double date date timestamp timestamp with timezone char string varchar string string string binary binary decimal decimal struct struct array list map map The table is based on representing conversion during creating table. In fact, broader supports are applied on write. Here\u2019re some points on write:\nIceberg numeric types (integer, long, float, double, decimal) support promotion during writes. e.g. You can write Spark types short, byte, integer, long to Iceberg type long. You can write to Iceberg fixed type using Spark binary type. Note that assertion on the length will be performed. Iceberg type to Spark type This type conversion table describes how Iceberg types are converted to the Spark types. 
The conversion applies on reading from Iceberg table via Spark.\nIceberg Spark Note boolean boolean integer integer long long float float double double date date time Not supported timestamp with timezone timestamp timestamp without timezone Not supported string string uuid string fixed binary binary binary decimal decimal struct struct list array map map ", "description": "", "title": "Writes", "uri": "/docs/latest/spark-writes/"}]