[{"categories":null,"content":" Hive and Iceberg Quickstart This guide will get you up and running with an Iceberg and Hive environment, including sample code to highlight some powerful features. You can learn more about Iceberg’s Hive runtime by checking out the Hive section.\nDocker Images Creating a Table Writing Data to a Table Reading Data from a Table Next Steps Docker Images The fastest way to get started is to use Apache Hive images which provides a SQL-like interface to create and query Iceberg tables from your laptop. You need to install the Docker Desktop.\nTake a look at the Tags tab in Apache Hive docker images to see the available Hive versions.\nSet the version variable.\nexport HIVE_VERSION=4.0.0-beta-1 Start the container, using the option --platform linux/amd64 for a Mac with an M-Series chip:\ndocker run -d --platform linux/amd64 -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION} The docker run command above configures Hive to use the embedded derby database for Hive Metastore. Hive Metastore functions as the Iceberg catalog to locate Iceberg files, which can be anywhere.\nGive HiveServer (HS2) a little time to come up in the docker container, and then start the Hive Beeline client using the following command to connect with the HS2 containers you already started:\ndocker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' The hive prompt appears:\n0: jdbc:hive2://localhost:10000\u003e You can now run SQL queries to create Iceberg tables and query the tables.\nshow databases; Creating a Table To create your first Iceberg table in Hive, run a CREATE TABLE command. Let’s create a table using nyc.taxis where nyc is the database name and taxis is the table name.\nCREATE DATABASE nyc; CREATE TABLE nyc.taxis ( trip_id bigint, trip_distance float, fare_amount double, store_and_fwd_flag string ) PARTITIONED BY (vendor_id bigint) STORED BY ICEBERG; Iceberg catalogs support the full range of SQL DDL commands, including:\nCREATE TABLE CREATE TABLE AS SELECT CREATE TABLE LIKE TABLE ALTER TABLE DROP TABLE Writing Data to a Table After your table is created, you can insert records.\nINSERT INTO nyc.taxis VALUES (1000371, 1.8, 15.32, 'N', 1), (1000372, 2.5, 22.15, 'N', 2), (1000373, 0.9, 9.01, 'N', 2), (1000374, 8.4, 42.13, 'Y', 1); Reading Data from a Table To read a table, simply use the Iceberg table’s name.\nSELECT * FROM nyc.taxis; Next steps Adding Iceberg to Hive If you already have a Hive 4.0.0-alpha-1, or later, environment, it comes with the Iceberg 0.13.1 included. No additional downloads or jars are needed. If you have a Hive 2.3.x or Hive 3.1.x environment see Enabling Iceberg support in Hive.\nLearn More To learn more about setting up a database other than Derby, see Apache Hive Quick Start. You can also set up a standalone metastore, HS2 and Postgres. Now that you’re up and running with Iceberg and Hive, check out the Iceberg-Hive docs to learn more!\n","description":"","title":"Hive and Iceberg Quickstart","uri":"/hive-quickstart/"},{"categories":null,"content":" Spark and Iceberg Quickstart This guide will get you up and running with an Iceberg and Spark environment, including sample code to highlight some powerful features. 
You can learn more about Iceberg’s Spark runtime by checking out the Spark section.\nDocker-Compose Creating a table Writing Data to a Table Reading Data from a Table Adding A Catalog Next Steps Docker-Compose The fastest way to get started is to use a docker-compose file that uses the tabulario/spark-iceberg image which contains a local Spark cluster with a configured Iceberg catalog. To use this, you’ll need to install the Docker CLI as well as the Docker Compose CLI.\nOnce you have those, save the yaml below into a file named docker-compose.yml:\nversion: \"3\" services: spark-iceberg: image: tabulario/spark-iceberg container_name: spark-iceberg build: spark/ networks: iceberg_net: depends_on: - rest - minio volumes: - ./warehouse:/home/iceberg/warehouse - ./notebooks:/home/iceberg/notebooks/notebooks environment: - AWS_ACCESS_KEY_ID=admin - AWS_SECRET_ACCESS_KEY=password - AWS_REGION=us-east-1 ports: - 8888:8888 - 8080:8080 - 10000:10000 - 10001:10001 rest: image: tabulario/iceberg-rest container_name: iceberg-rest networks: iceberg_net: ports: - 8181:8181 environment: - AWS_ACCESS_KEY_ID=admin - AWS_SECRET_ACCESS_KEY=password - AWS_REGION=us-east-1 - CATALOG_WAREHOUSE=s3://warehouse/ - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO - CATALOG_S3_ENDPOINT=http://minio:9000 minio: image: minio/minio container_name: minio environment: - MINIO_ROOT_USER=admin - MINIO_ROOT_PASSWORD=password - MINIO_DOMAIN=minio networks: iceberg_net: aliases: - warehouse.minio ports: - 9001:9001 - 9000:9000 command: [\"server\", \"/data\", \"--console-address\", \":9001\"] mc: depends_on: - minio image: minio/mc container_name: mc networks: iceberg_net: environment: - AWS_ACCESS_KEY_ID=admin - AWS_SECRET_ACCESS_KEY=password - AWS_REGION=us-east-1 entrypoint: \u003e /bin/sh -c \" until (/usr/bin/mc config host add minio http://minio:9000 admin password) do echo '...waiting...' \u0026\u0026 sleep 1; done; /usr/bin/mc rm -r --force minio/warehouse; /usr/bin/mc mb minio/warehouse; /usr/bin/mc policy set public minio/warehouse; tail -f /dev/null \" networks: iceberg_net: Next, start up the docker containers with this command:\ndocker-compose up You can then run any of the following commands to start a Spark session.\nSparkSQL Spark-Shell PySpark docker exec -it spark-iceberg spark-sql docker exec -it spark-iceberg spark-shell docker exec -it spark-iceberg pyspark You can also launch a notebook server by running docker exec -it spark-iceberg notebook. The notebook server will be available at http://localhost:8888 Creating a table To create your first Iceberg table in Spark, run a CREATE TABLE command. 
Let’s create a table using demo.nyc.taxis where demo is the catalog name, nyc is the database name, and taxis is the table name.\nSparkSQL Spark-Shell PySpark CREATE TABLE demo.nyc.taxis ( vendor_id bigint, trip_id bigint, trip_distance float, fare_amount double, store_and_fwd_flag string ) PARTITIONED BY (vendor_id); import org.apache.spark.sql.types._ import org.apache.spark.sql.Row val schema = StructType( Array( StructField(\"vendor_id\", LongType, true), StructField(\"trip_id\", LongType, true), StructField(\"trip_distance\", FloatType, true), StructField(\"fare_amount\", DoubleType, true), StructField(\"store_and_fwd_flag\", StringType, true) )) val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema) df.writeTo(\"demo.nyc.taxis\").create() from pyspark.sql.types import DoubleType, FloatType, LongType, StructType, StructField, StringType schema = StructType([ StructField(\"vendor_id\", LongType(), True), StructField(\"trip_id\", LongType(), True), StructField(\"trip_distance\", FloatType(), True), StructField(\"fare_amount\", DoubleType(), True), StructField(\"store_and_fwd_flag\", StringType(), True) ]) df = spark.createDataFrame([], schema) df.writeTo(\"demo.nyc.taxis\").create() Iceberg catalogs support the full range of SQL DDL commands, including:\nCREATE TABLE ... PARTITIONED BY CREATE TABLE ... AS SELECT ALTER TABLE DROP TABLE Writing Data to a Table Once your table is created, you can insert records.\nSparkSQL Spark-Shell PySpark INSERT INTO demo.nyc.taxis VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y'); import org.apache.spark.sql.Row val schema = spark.table(\"demo.nyc.taxis\").schema val data = Seq( Row(1: Long, 1000371: Long, 1.8f: Float, 15.32: Double, \"N\": String), Row(2: Long, 1000372: Long, 2.5f: Float, 22.15: Double, \"N\": String), Row(2: Long, 1000373: Long, 0.9f: Float, 9.01: Double, \"N\": String), Row(1: Long, 1000374: Long, 8.4f: Float, 42.13: Double, \"Y\": String) ) val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema) df.writeTo(\"demo.nyc.taxis\").append() schema = spark.table(\"demo.nyc.taxis\").schema data = [ (1, 1000371, 1.8, 15.32, \"N\"), (2, 1000372, 2.5, 22.15, \"N\"), (2, 1000373, 0.9, 9.01, \"N\"), (1, 1000374, 8.4, 42.13, \"Y\") ] df = spark.createDataFrame(data, schema) df.writeTo(\"demo.nyc.taxis\").append() Reading Data from a Table To read a table, simply use the Iceberg table’s name.\nSparkSQL Spark-Shell PySpark SELECT * FROM demo.nyc.taxis; spark.table(\"demo.nyc.taxis\").show() spark.table(\"demo.nyc.taxis\").show() Adding A Catalog Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue. Catalogs are configured using properties under spark.sql.catalog.(catalog_name). The configuration shown below uses a Hadoop-type, path-based catalog, but you can follow these instructions to configure JDBC or other catalog types. 
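For illustration only (this sketch is not part of the original guide), the same settings shown in the CLI and spark-defaults.conf examples below can also be applied programmatically when building a PySpark session. The app name, the warehouse path, and the omission of the spark_catalog overrides are assumptions made to keep the sketch minimal:

import os
from pyspark.sql import SparkSession

# Assumption: tables go under ./warehouse, matching the $PWD/warehouse path used below.
warehouse_path = os.path.join(os.getcwd(), "warehouse")

spark = (
    SparkSession.builder.appName("iceberg-quickstart")
    # Iceberg Spark runtime; pick the artifact matching your Spark/Scala version.
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.4.3")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # A path-based (Hadoop) catalog named "local", storing tables under warehouse_path.
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", warehouse_path)
    .config("spark.sql.defaultCatalog", "local")
    .getOrCreate()
)

# Sanity check that the Iceberg catalog is active.
spark.sql("SHOW CURRENT NAMESPACE").show()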
To learn more, check out the Catalog page in the Spark section.\nThis configuration creates a path-based catalog named local for tables under $PWD/warehouse and adds support for Iceberg tables to Spark’s built-in catalog.\nCLI spark-defaults.conf spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.4.3 \\ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\ --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \\ --conf spark.sql.catalog.spark_catalog.type=hive \\ --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.local.type=hadoop \\ --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \\ --conf spark.sql.defaultCatalog=local spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.4.3 spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog spark.sql.catalog.spark_catalog.type hive spark.sql.catalog.local org.apache.iceberg.spark.SparkCatalog spark.sql.catalog.local.type hadoop spark.sql.catalog.local.warehouse $PWD/warehouse spark.sql.defaultCatalog local If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing USE local; Next steps Adding Iceberg to Spark If you already have a Spark environment, you can add Iceberg using the --packages option.\nSparkSQL Spark-Shell PySpark spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.4.3 spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.4.3 pyspark --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.4.3 If you want to include Iceberg in your Spark installation, add the Iceberg Spark runtime to Spark’s jars folder. You can download the runtime by visiting the Releases page. Learn More Now that you’re up and running with Iceberg and Spark, check out the Iceberg-Spark docs to learn more!\n","description":"","title":"Spark and Iceberg Quickstart","uri":"/spark-quickstart/"},{"categories":null,"content":" Downloads The latest version of Iceberg is 1.4.3.\n1.4.3 source tar.gz – signature – sha512 1.4.3 Spark 3.5_2.12 runtime Jar – 3.5_2.13 1.4.3 Spark 3.4_2.12 runtime Jar – 3.4_2.13 1.4.3 Spark 3.3_2.12 runtime Jar – 3.3_2.13 1.4.3 Spark 3.2_2.12 runtime Jar – 3.2_2.13 1.4.3 Flink 1.17 runtime Jar 1.4.3 Flink 1.16 runtime Jar 1.4.3 Flink 1.15 runtime Jar 1.4.3 Hive runtime Jar 1.4.3 aws-bundle Jar 1.4.3 gcp-bundle Jar 1.4.3 azure-bundle Jar To use Iceberg in Spark or Flink, download the runtime JAR for your engine version and add it to the jars folder of your installation.\nTo use Iceberg in Hive 2 or Hive 3, download the Hive runtime JAR and add it to Hive using ADD JAR.\nGradle To add a dependency on Iceberg in Gradle, add the following to build.gradle:\ndependencies { implementation 'org.apache.iceberg:iceberg-core:1.4.3' } You may also want to include iceberg-parquet for Parquet file support.\nMaven To add a dependency on Iceberg in Maven, add the following to your pom.xml:\n\u003cdependencies\u003e ... \u003cdependency\u003e \u003cgroupId\u003eorg.apache.iceberg\u003c/groupId\u003e \u003cartifactId\u003eiceberg-core\u003c/artifactId\u003e \u003cversion\u003e1.4.3\u003c/version\u003e \u003c/dependency\u003e ... \u003c/dependencies\u003e 1.4.3 Release Apache Iceberg 1.4.3 was released on December 27, 2023. 
The main issue it solves is missing files from a transaction retry with conflicting manifests. It is recommended to upgrade if you use transactions.\nCore: Scan only live entries in partitions table (#8969) by @Fokko in #9197 Core: Fix missing files from transaction retries with conflicting manifest merges by @nastra in #9337 JDBC Catalog: Fix namespaceExists check with special characters by @ismailsimsek in #9291 Core: Expired Snapshot files in a transaction should be deleted by @bartash in #9223 Core: Fix missing delete files from transaction by @nastra in #9356 Past releases 1.4.2 Release Apache Iceberg 1.4.2 was released on November 2, 2023. The 1.4.2 patch release addresses fixing a remaining case where split offsets should be ignored when they are deemed invalid.\nCore Core: Ignore split offsets array when split offset is past file length (#8925) 1.4.1 Release Apache Iceberg 1.4.1 was released on October 23, 2023. The 1.4.1 release addresses various issues identified in the 1.4.0 release.\nCore Core: Do not use a lazy split offset list in manifests (#8834) Core: Ignore split offsets when the last split offset is past the file length (#8860) AWS Avoid static global credentials provider which doesn’t play well with lifecycle management (#8677) Flink Reverting the default custom partitioner for bucket column (#8848) 1.4.0 release Apache Iceberg 1.4.0 was released on October 4, 2023. The 1.4.0 release adds a variety of new features and bug fixes.\nAPI Implement bound expression sanitization (#8149) Remove overflow checks in DefaultCounter causing performance issues (#8297) Support incremental scanning with branch (#5984) Add a validation API to DeleteFiles which validates files exist (#8525) Core Use V2 format by default in new tables (#8381) Use zstd compression for Parquet by default in new tables (#8593) Add strict metadata cleanup mode and enable it by default (#8397) (#8599) Avoid generating huge manifests during commits (#6335) Add a writer for unordered position deletes (#7692) Optimize DeleteFileIndex (#8157) Optimize lookup in DeleteFileIndex without useful bounds (#8278) Optimize split offsets handling (#8336) Optimize computing user-facing state in data tasks (#8346) Don’t persist useless file and position bounds for deletes (#8360) Don’t persist counts for paths and positions in position delete files (#8590) Support setting system-level properties via environmental variables (#5659) Add JSON parser for ContentFile and FileScanTask (#6934) Add REST spec and request for commits to multiple tables (#7741) Add REST API for committing changes against multiple tables (#7569) Default to exponential retry strategy in REST client (#8366) Support registering tables with REST session catalog (#6512) Add last updated timestamp and snapshot ID to partitions metadata table (#7581) Add total data size to partitions metadata table (#7920) Extend ResolvingFileIO to support bulk operations (#7976) Key metadata in Avro format (#6450) Add AES GCM encryption stream (#3231) Fix a connection leak in streaming delete filters (#8132) Fix lazy snapshot loading history (#8470) Fix unicode handling in HTTPClient (#8046) Fix paths for unpartitioned specs in writers (#7685) Fix OOM caused by Avro decoder caching (#7791) Spark Added support for Spark 3.5 Code for DELETE, UPDATE, and MERGE commands has moved to Spark, and all related extensions have been dropped from Iceberg. Support for WHEN NOT MATCHED BY SOURCE clause in MERGE. Column pruning in merge-on-read operations. 
Ability to request a bigger advisory partition size for the final write to produce well-sized output files without harming the job parallelism. Dropped support for Spark 3.1 Deprecated support for Spark 3.2 Support vectorized reads for merge-on-read operations in Spark 3.4 and 3.5 (#8466) Increase default advisory partition size for writes in Spark 3.5 (#8660) Support distributed planning in Spark 3.4 and 3.5 (#8123) Support pushing down system functions by V2 filters in Spark 3.4 and 3.5 (#7886) Support fanout position delta writers in Spark 3.4 and 3.5 (#7703) Use fanout writers for unsorted tables by default in Spark 3.5 (#8621) Support multiple shuffle partitions per file in compaction in Spark 3.4 and 3.5 (#7897) Output net changes across snapshots for carryover rows in CDC (#7326) Display read metrics on Spark SQL UI (#7447) (#8445) Adjust split size to benefit from cluster parallelism in Spark 3.4 and 3.5 (#7714) Add fast_forward procedure (#8081) Support filters when rewriting position deletes (#7582) Support setting current snapshot with ref (#8163) Make backup table name configurable during migration (#8227) Add write and SQL options to override compression config (#8313) Correct partition transform functions to match the spec (#8192) Enable extra commit properties with metadata delete (#7649) Flink Add possibility of ordering the splits based on the file sequence number (#7661) Fix serialization in TableSink with anonymous object (#7866) Switch to FileScanTaskParser for JSON serialization of IcebergSourceSplit (#7978) Custom partitioner for bucket partitions (#7161) Implement data statistics coordinator to aggregate data statistics from operator subtasks (#7360) Support alter table column (#7628) Parquet Add encryption config to read and write builders (#2639) Skip writing bloom filters for deletes (#7617) Cache codecs by name and level (#8182) Fix decimal data reading from ParquetAvroValueReaders (#8246) Handle filters with transforms by assuming data must be scanned (#8243) ORC Handle filters with transforms by assuming the filter matches (#8244) Vendor Integrations GCP: Fix single byte read in GCSInputStream (#8071) GCP: Add properties for OAtuh2 and update library (#8073) GCP: Add prefix and bulk operations to GCSFileIO (#8168) GCP: Add bundle jar for GCP-related dependencies (#8231) GCP: Add range reads to GCSInputStream (#8301) AWS: Add bundle jar for AWS-related dependencies (#8261) AWS: support config storage class for S3FileIO (#8154) AWS: Add FileIO tracker/closer to Glue catalog (#8315) AWS: Update S3 signer spec to allow an optional string body in S3SignRequest (#8361) Azure: Add FileIO that supports ADLSv2 storage (#8303) Azure: Make ADLSFileIO implement DelegateFileIO (#8563) Nessie: Provide better commit message on table registration (#8385) Dependencies Bump Nessie to 0.71.0 Bump ORC to 1.9.1 Bump Arrow to 12.0.1 Bump AWS Java SDK to 2.20.131 1.3.1 release Apache Iceberg 1.3.1 was released on July 25, 2023. 
The 1.3.1 release addresses various issues identified in the 1.3.0 release.\nCore Table Metadata parser now accepts null for fields: current-snapshot-id, properties, and snapshots (#8064) Hive Fix HiveCatalog deleting metadata on failures in checking lock status (#7931) Spark Fix RewritePositionDeleteFiles failure for certain partition types (#8059) Fix RewriteDataFiles concurrency edge-case on commit timeouts (#7933) Fix partition-level DELETE operations for WAP branches (#7900) Flink FlinkCatalog creation no longer creates the default database (#8039) 1.3.0 release Apache Iceberg 1.3.0 was released on May 30th, 2023. The 1.3.0 release adds a variety of new features and bug fixes.\nCore Expose file and data sequence numbers in ContentFile (#7555) Improve bit density in object storage layout (#7128) Store split offsets for delete files (#7011) Readable metrics in entries metadata table (#7539) Delete file stats in partitions metadata table (#6661) Optimized vectorized reads for Parquet Decimal (#3249) Vectorized reads for Parquet INT96 timestamps in imported data (#6962) Support selected vector with ORC row and batch readers (#7197) Clean up expired metastore clients (#7310) Support for deleting old partition spec columns in V1 tables (#7398) Spark Initial support for Spark 3.4 Removed integration for Spark 2.4 Support for storage-partitioned joins with mismatching keys in Spark 3.4 (MERGE commands) (#7424) Support for TimestampNTZ in Spark 3.4 (#7553) Ability to handle skew during writes in Spark 3.4 (#7520) Ability to coalesce small tasks during writes in Spark 3.4 (#7532) Distribution and ordering enhancements in Spark 3.4 (#7637) Action for rewriting position deletes (#7389) Procedure for rewriting position deletes (#7572) Avoid local sort for MERGE cardinality check (#7558) Support for rate limits in Structured Streaming (#4479) Read and write support for UUIDs (#7399) Concurrent compaction is enabled by default (#6907) Support for metadata columns in changelog tables (#7152) Add file group failure info for data compaction (#7361) Flink Initial support for Flink 1.17 Removed integration for Flink 1.14 Data statistics operator to collect traffic distribution for guiding smart shuffling (#6382) Data statistics operator sends local data statistics to coordinator and receives aggregated data statistics from coordinator for smart shuffling (#7269) Exposed write parallelism in SQL hints (#7039) Row-level filtering (#7109) Use starting sequence number by default when rewriting data files (#7218) Config for max allowed consecutive planning failures in IcebergSource before failing the job (#7571) Vendor Integrations AWS: Use Apache HTTP client as default AWS HTTP client (#7119) AWS: Prevent token refresh scheduling on every sign request (#7270) AWS: Disable local credentials if remote signing is enabled (#7230) Dependencies Bump Arrow to 12.0.0 Bump ORC to 1.8.3 Bump Parquet to 1.13.1 Bump Nessie to 0.59.0 1.2.1 release Apache Iceberg 1.2.1 was released on April 11th, 2023. The 1.2.1 release is a patch release to address various issues identified in the prior release. 
Here is an overview:\nCORE REST: fix previous locations for refs-only load #7284 Parse snapshot-id as long in remove-statistics update #7235 Spark Broadcast table instead of file IO in rewrite manifests #7263 Revert “Spark: Add “Iceberg” prefix to SparkTable name string for SparkUI #7273 AWS Make AuthSession cache static #7289 Abort S3 input stream on close if not EOS #7262 Disable local credentials if remote signing is enabled #7230 Prevent token refresh scheduling on every sign request #7270 S3 Credentials provider support in DefaultAwsClientFactory #7066 1.2.0 release Apache Iceberg 1.2.0 was released on March 20th, 2023. The 1.2.0 release adds a variety of new features and bug fixes. Here is an overview:\nCore Added AES GCM encrpytion stream spec (#5432) Added support for Delta Lake to Iceberg table conversion (#6449, #6880) Added support for position_deletes metadata table (#6365, #6716) Added support for scan and commit metrics reporter that is pluggable through catalog (#6404, #6246, #6410) Added support for branch commit for all operations (#4926, #5010) Added FileIO support for ORC readers and writers (#6293) Updated all actions to leverage bulk delete whenever possible (#6682) Updated snapshot ID definition in Puffin spec to support statistics file reuse (#6272) Added human-readable metrics information in files metadata table (#5376) Fixed incorrect Parquet row group skipping when min and max values are NaN (#6517) Fixed a bug that location provider could generate paths with double slash (//) which is not compatible in a Hadoop file system (#6777) Fixed metadata table time travel failure for tables that performed schema evolution (#6980) Spark Added time range query support for changelog table (#6350) Added changelog view procedure for v1 table (#6012) Added support for storage partition joins to improve read and write performance (#6371) Updated default Arrow environment settings to improve read performance (#6550) Added aggregate pushdown support for min, max and count to improve read performance (#6622) Updated default distribution mode settings to improve write performance (#6828, #6838) Updated DELETE to perform metadata-only update whenever possible to improve write performance (#6899) Improved predicate pushdown support for write operations (#6636) Added support for reading a branch or tag through table identifier and VERSION AS OF (a.k.a. 
FOR SYSTEM_VERSION AS OF) SQL syntax (#6717, #6575) Added support for writing to a branch through identifier or through write-audit-publish (WAP) workflow settings (#6965, #7050) Added DDL SQL extensions to create, replace and drop a branch or tag (#6638, #6637, #6752, #6807) Added UDFs for years, months, days and hours transforms (#6207, #6261, #6300, #6339) Added partition related stats for add_files procedure result (#6797) Fixed a bug that rewrite_manifests procedure produced a new manifest even when there was no rewrite performed (#6659) Fixed a bug that statistics files were not cleaned up in expire_snapshots procedure (#6090) Flink Added support for metadata tables (#6222) Added support for read options in Flink source (#5967) Added support for reading and writing Avro GenericRecord (#6557, #6584) Added support for reading a branch or tag and write to a branch (#6660, #5029) Added throttling support for streaming read (#6299) Added support for multiple sinks for the same table in the same job (#6528) Fixed a bug that metrics config was not applied to equality and position deletes (#6271, #6313) Vendor Integrations Added Snowflake catalog integration (#6428) Added AWS sigV4 authentication support for REST catalog (#6951) Added support for AWS S3 remote signing (#6169, #6835, #7080) Updated AWS Glue catalog to skip table version archive by default (#6919) Updated AWS Glue catalog to not require a warehouse location (#6586) Fixed a bug that a bucket-only AWS S3 location such as s3://my-bucket could not be parsed (#6352) Fixed a bug that unnecessary HTTP client dependencies had to be included to use any AWS integration (#6746) Fixed a bug that AWS Glue catalog did not respect custom catalog ID when determining default warehouse location (#6223) Fixes a bug that AWS DynamoDB catalog namespace listing result was incomplete (#6823) Dependencies Upgraded ORC to 1.8.1 (#6349) Upgraded Jackson to 2.14.1 (#6168) Upgraded AWS SDK V2 to 2.20.18 (#7003) Upgraded Nessie to 0.50.0 (#6875) For more details, please visit Github.\n1.1.0 release Apache Iceberg 1.1.0 was released on November 28th, 2022. The 1.1.0 release deprecates various pre-1.0.0 methods, and adds a variety of new features. Here is an overview:\nCore Puffin statistics have been added to the Table API Support for Table scan reporting, which enables collection of statistics of the table scans. Add file sequence number to ManifestEntry Support register table for all the catalogs (previously it was only for Hive) Support performing merge appends and delete files on branches Improved Expire Snapshots FileCleanupStrategy SnapshotProducer supports branch writes Spark Support for aggregate expressions SparkChangelogTable for querying changelogs Dropped support for Apache Spark 3.0 Flink FLIP-27 reader is supported in SQL Added support for Flink 1.16, dropped support for Flink 1.13 Dependencies AWS SDK: 2.17.257 Nessie: 0.44 Apache ORC: 1.8.0 (Also, supports setting bloom filters on row groups) For more details, please visit Github.\n1.0.0 release The 1.0.0 release officially guarantees the stability of the Iceberg API.\nIceberg’s API has been largely stable since very early releases and has been integrated with many processing engines, but was still released under a 0.y.z version number indicating that breaking changes may happen. From 1.0.0 forward, the project will follow semver in the public API module, iceberg-api.\nThis release removes deprecated APIs that are no longer part of the API. 
To make transitioning to the new release easier, it is based on the 0.14.1 release with only important bug fixes:\nIncrease metrics limit to 100 columns (#5933) Bump Spark patch versions for CVE-2022-33891 (#5292) Exclude Scala from Spark runtime Jars (#5884) 0.14.1 release This release includes all bug fixes from the 0.14.x patch releases.\nNotable bug fixes API API: Fix ID assignment in schema merging (#5395) Core Core: Fix snapshot log with intermediate transaction snapshots (#5568) Core: Fix exception handling in BaseTaskWriter (#5683) Core: Support deleting tables without metadata files (#5510) Core: Add CommitStateUnknownException handling to REST (#5694) Spark Spark: Fix stats in rewrite metadata action (#5691) File Formats Parquet: Close zstd input stream early to avoid memory pressure (#5681) Vendor Integrations Core, AWS: Fix Kryo serialization failure for FileIO (#5437) AWS: S3OutputStream - failure to close should persist on subsequent close calls (#5311) 0.14.0 release Apache Iceberg 0.14.0 was released on 16 July 2022.\nHighlights Added several performance improvements for scan planning and Spark queries Added a common REST catalog client that uses change-based commits to resolve commit conflicts on the service side Added support for Spark 3.3, including AS OF syntax for SQL time travel queries Added support for Scala 2.13 with Spark 3.2 or later Added merge-on-read support for MERGE and UPDATE queries in Spark 3.2 or later Added support to rewrite partitions using zorder Added support for Flink 1.15 and dropped support for Flink 1.12 Added a spec and implementation for Puffin, a format for large stats and index blobs, like Theta sketches or bloom filters Added new interfaces for consuming data incrementally (both append and changelog scans) Added support for bulk operations and ranged reads to FileIO interfaces Added more metadata tables to show delete files in the metadata tree High-level features API Added IcebergBuild to expose Iceberg version and build information Added binary compatibility checking to the build (#4638, #4798) Added a new IncrementalAppendScan interface and planner implementation (#4580) Added a new IncrementalChangelogScan interface (#4870) Refactored the ScanTask hierarchy to create new task types for changelog scans (#5077) Added expression sanitizer (#4672) Added utility to check expression equivalence (#4947) Added support for serializing FileIO instances using initialization properties (#5178) Updated Snapshot methods to accept a FileIO to read metadata files, deprecated old methods (#4873) Added optional interfaces to FileIO, for batch deletes (#4052), prefix operations (#5096), and ranged reads (#4608) Core Added a common client for REST-based catalog services that uses a change-based protocol (#4320, #4319) Added Puffin, a file format for statistics and index payloads or sketches (#4944, #4537) Added snapshot references to track tags and branches (#4019) ManageSnapshots now supports multiple operations using transactions, and added branch and tag operations (#4128, #4071) ReplacePartitions and OverwriteFiles now support serializable isolation (#2925, #4052) Added new metadata tables: data_files (#4336), delete_files (#4243), all_delete_files, and all_files (#4694) Added deleted files to the files metadata table (#4336) and delete file counts to the manifests table (#4764) Added support for predicate pushdown for the all_data_files metadata table (#4382) and the all_manifests table (#4736) Added support for catalogs to default table properties on 
creation (#4011) Updated sort order construction to ensure all partition fields are added to avoid partition closed failures (#5131) Spark Spark 3.3 is now supported (#5056) Added SQL time travel using AS OF syntax in Spark 3.3 (#5156) Scala 2.13 is now supported for Spark 3.2 and 3.3 (#4009) Added support for the mergeSchema option for DataFrame writes (#4154) MERGE and UPDATE queries now support the lazy / merge-on-read strategy (#3984, #4047) Added zorder rewrite strategy to the rewrite_data_files stored procedure and action (#3983, #4902) Added a register_table stored procedure to create tables from metadata JSON files (#4810) Added a publish_changes stored procedure to publish staged commits by ID (#4715) Added CommitMetadata helper class to set snapshot summary properties from SQL (#4956) Added support to supply a file listing to remove orphan data files procedure and action (#4503) Added FileIO metrics to the Spark UI (#4030, #4050) DROP TABLE now supports the PURGE flag (#3056) Added support for custom isolation level for dynamic partition overwrites (#2925) and filter overwrites (#4293) Schema identifier fields are now shown in table properties (#4475) Abort cleanup now supports parallel execution (#4704) Flink Flink 1.15 is now supported (#4553) Flink 1.12 support was removed (#4551) Added a FLIP-27 source and builder to 1.14 and 1.15 (#5109) Added an option to set the monitor interval (#4887) and an option to limit the number of snapshots in a streaming read planning operation (#4943) Added support for write options, like write-format to Flink sink builder (#3998) Added support for task locality when reading from HDFS (#3817) Use Hadoop configuration files from hadoop-conf-dir property (#4622) Vendor integrations Added Dell ECS integration (#3376, #4221) JDBC catalog now supports namespace properties (#3275) AWS Glue catalog supports native Glue locking (#4166) AWS S3FileIO supports using S3 access points (#4334), bulk operations (#4052, #5096), ranged reads (#4608), and tagging at write time or in place of deletes (#4259, #4342) AWS GlueCatalog supports passing LakeFormation credentials (#4280) AWS DynamoDB catalog and lock supports overriding the DynamoDB endpoint (#4726) Nessie now supports namespaces and namespace properties (#4385, #4610) Nessie now passes most common catalog tests (#4392) Parquet Added support for row group skipping using Parquet bloom filters (#4938) Added table configuration options for writing Parquet bloom filters (#5035) ORC Support file rolling at a target file size (#4419) Support table compression settings, write.orc.compression-codec and write.orc.compression-strategy (#4273) Performance improvements Core Fixed manifest file handling in scan planning to open manifests in the planning threadpool (#5206) Avoided an extra S3 HEAD request by passing file length when opening manifest files (#5207) Refactored Arrow vectorized readers to avoid extra dictionary copies (#5137) Improved Arrow decimal handling to improve decimal performance (#5168, #5198) Added support for Avro files with Zstd compression (#4083) Column metrics are now disabled by default after the first 32 columns (#3959, #5215) Updated delete filters to copy row wrappers to avoid expensive type analysis (#5249) Snapshot expiration supports parallel execution (#4148) Manifest updates can use a custom thread pool (#4146) Spark Parquet vectorized reads are enabled by default (#4196) Scan statistics now adjust row counts for split data files (#4446) Implemented SupportsReportStatistics in 
ScanBuilder to work around SPARK-38962 (#5136) Updated Spark tables to avoid expensive (and inaccurate) size estimation (#5225) Flink Operators will now use a worker pool per job (#4177) Fixed ClassCastException thrown when reading arrays from Parquet (#4432) Hive Added vectorized Parquet reads for Hive 3 (#3980) Improved generic reader performance using copy instead of create (#4218) Notable bug fixes This release includes all bug fixes from the 0.13.x patch releases.\nCore Fixed an exception thrown when metadata-only deletes encounter delete files that are partially matched (#4304) Fixed transaction retries for changes without validations, like schema updates, that could ignore an update (#4464) Fixed failures when reading metadata tables with evolved partition specs (#4520, #4560) Fixed delete files dropped when a manifest is rewritten following a format version upgrade (#4514) Fixed missing metadata files resulting from an OOM during commit cleanup (#4673) Updated logging to use sanitized expressions to avoid leaking values (#4672) Spark Fixed Spark to skip calling abort when CommitStateUnknownException is thrown (#4687) Fixed MERGE commands with mixed case identifiers (#4848) Flink Fixed table property update failures when tables have a primary key (#4561) Integrations JDBC catalog behavior has been updated to pass common catalog tests (#4220, #4231) Dependency changes Updated Apache Avro to 1.10.2 (previously 1.10.1) Updated Apache Parquet to 1.12.3 (previously 1.12.2) Updated Apache ORC to 1.7.5 (previously 1.7.2) Updated Apache Arrow to 7.0.0 (previously 6.0.0) Updated AWS SDK to 2.17.131 (previously 2.15.7) Updated Nessie to 0.30.0 (previously 0.18.0) Updated Caffeine to 2.9.3 (previously 2.8.4) 0.13.2 Apache Iceberg 0.13.2 was released on June 15th, 2022.\nGit tag: 0.13.2 0.13.2 source tar.gz – signature – sha512 0.13.2 Spark 3.2 runtime Jar 0.13.2 Spark 3.1 runtime Jar 0.13.2 Spark 3.0 runtime Jar 0.13.2 Spark 2.4 runtime Jar 0.13.2 Flink 1.14 runtime Jar 0.13.2 Flink 1.13 runtime Jar 0.13.2 Flink 1.12 runtime Jar 0.13.2 Hive runtime Jar Important bug fixes and changes:\nCore #4673 fixes table corruption from OOM during commit cleanup #4514 row delta delete files were dropped in sequential commits after table format updated to v2 #4464 fixes an issue were conflicting transactions have been ignored during a commit #4520 fixes an issue with wrong table predicate filtering with evolved partition specs Spark #4663 fixes NPEs in Spark value converter #4687 fixes an issue with incorrect aborts when non-runtime exceptions were thrown in Spark Flink Note that there’s a correctness issue when using upsert mode in Flink 1.12. Given that Flink 1.12 is deprecated, it was decided to not fix this bug but rather log a warning (see also #4754). Nessie #4509 fixes a NPE that occurred when accessing refreshed tables in NessieCatalog A more exhaustive list of changes is available under the 0.13.2 release milestone.\n0.13.1 Apache Iceberg 0.13.1 was released on February 14th, 2022.\nGit tag: 0.13.1 0.13.1 source tar.gz – signature – sha512 0.13.1 Spark 3.2 runtime Jar 0.13.1 Spark 3.1 runtime Jar 0.13.1 Spark 3.0 runtime Jar 0.13.1 Spark 2.4 runtime Jar 0.13.1 Flink 1.14 runtime Jar 0.13.1 Flink 1.13 runtime Jar 0.13.1 Flink 1.12 runtime Jar 0.13.1 Hive runtime Jar Important bug fixes:\nSpark\n#4023 fixes predicate pushdown in row-level operations for merge conditions in Spark 3.2. 
Prior to the fix, filters would not be extracted and targeted merge conditions were not pushed down leading to degraded performance for these targeted merge operations. #4024 fixes table creation in the root namespace of a Hadoop Catalog. Flink\n#3986 fixes manifest location collisions when there are multiple committers in the same Flink job. 0.13.0 Apache Iceberg 0.13.0 was released on February 4th, 2022.\nGit tag: 0.13.0 0.13.0 source tar.gz – signature – sha512 0.13.0 Spark 3.2 runtime Jar 0.13.0 Spark 3.1 runtime Jar 0.13.0 Spark 3.0 runtime Jar 0.13.0 Spark 2.4 runtime Jar 0.13.0 Flink 1.14 runtime Jar 0.13.0 Flink 1.13 runtime Jar 0.13.0 Flink 1.12 runtime Jar 0.13.0 Hive runtime Jar High-level features:\nCore Catalog caching now supports cache expiration through catalog property cache.expiration-interval-ms [#3543] Catalog now supports registration of Iceberg table from a given metadata file location [#3851] Hadoop catalog can be used with S3 and other file systems safely by using a lock manager [#3663] Vendor Integrations Google Cloud Storage (GCS) FileIO is supported with optimized read and write using GCS streaming transfer [#3711] Aliyun Object Storage Service (OSS) FileIO is supported [#3553] Any S3-compatible storage (e.g. MinIO) can now be accessed through AWS S3FileIO with custom endpoint and credential configurations [#3656] [#3658] AWS S3FileIO now supports server-side checksum validation [#3813] AWS GlueCatalog now displays more table information including table location, description [#3467] and columns [#3888] Using multiple FileIOs based on file path scheme is supported by configuring a ResolvingFileIO [#3593] Spark Spark 3.2 is supported [#3335] with merge-on-read DELETE [#3970] RewriteDataFiles action now supports sort-based table optimization [#2829] and merge-on-read delete compaction [#3454]. The corresponding Spark call procedure rewrite_data_files is also supported [#3375] Time travel queries now use snapshot schema instead of the table’s latest schema [#3722] Spark vectorized reads now support row-level deletes [#3557] [#3287] add_files procedure now skips duplicated files by default (can be turned off with the check_duplicate_files flag) [#2895], skips folder without file [#2895] and partitions with null values [#2895] instead of throwing exception, and supports partition pruning for faster table import [#3745] Flink Flink 1.13 and 1.14 are supported [#3116] [#3434] Flink connector support is supported [#2666] Upsert write option is supported [#2863] Hive Table listing in Hive catalog can now skip non-Iceberg tables by disabling flag list-all-tables [#3908] Hive tables imported to Iceberg can now be read by IcebergInputFormat [#3312] File Formats ORC now supports writing delete file [#3248] [#3250] [#3366] Important bug fixes:\nCore Iceberg new data file root path is configured through write.data.path going forward. 
write.folder-storage.path and write.object-storage.path are deprecated [#3094] Catalog commit status is UNKNOWN instead of FAILURE when new metadata location cannot be found in snapshot history [#3717] Dropping table now also deletes old metadata files instead of leaving them strained [#3622] history and snapshots metadata tables can now query tables with no current snapshot instead of returning empty [#3812] Vendor Integrations Using cloud service integrations such as AWS GlueCatalog and S3FileIO no longer fail when missing Hadoop dependencies in the execution environment [#3590] AWS clients are now auto-closed when related FileIO or Catalog is closed. There is no need to close the AWS clients separately [#2878] Spark For Spark \u003e= 3.1, REFRESH TABLE can now be used with Spark session catalog instead of throwing exception [#3072] Insert overwrite mode now skips partition with 0 record instead of failing the write operation [#2895] Spark snapshot expiration action now supports custom FileIO instead of just HadoopFileIO [#3089] REPLACE TABLE AS SELECT can now work with tables with columns that have changed partition transform. Each old partition field of the same column is converted to a void transform with a different name [#3421] Spark SQL filters containing binary or fixed literals can now be pushed down instead of throwing exception [#3728] Flink A ValidationException will be thrown if a user configures both catalog-type and catalog-impl. Previously it chose to use catalog-type. The new behavior brings Flink consistent with Spark and Hive [#3308] Changelog tables can now be queried without RowData serialization issues [#3240] java.sql.Time data type can now be written without data overflow problem [#3740] Avro position delete files can now be read without encountering NullPointerException [#3540] Hive Hive catalog can now be initialized with a null Hadoop configuration instead of throwing exception [#3252] Table creation can now succeed instead of throwing exception when some columns do not have comments [#3531] File Formats Parquet file writing issue is fixed for string data with over 16 unparseable chars (e.g. high/low surrogates) [#3760] ORC vectorized read is now configured using read.orc.vectorization.batch-size instead of read.parquet.vectorization.batch-size [#3133] Other notable changes:\nThe community has finalized the long-term strategy of Spark, Flink and Hive support. See Multi-Engine Support page for more details. 0.12.1 Apache Iceberg 0.12.1 was released on November 8th, 2021.\nGit tag: 0.12.1 0.12.1 source tar.gz – signature – sha512 0.12.1 Spark 3.x runtime Jar 0.12.1 Spark 2.4 runtime Jar 0.12.1 Flink runtime Jar 0.12.1 Hive runtime Jar Important bug fixes and changes:\n#3264 fixes validation failures that occurred after snapshot expiration when writing Flink CDC streams to Iceberg tables. #3264 fixes reading projected map columns from Parquet files written before Parquet 1.11.1. #3195 allows validating that commits that produce row-level deltas don’t conflict with concurrently added files. Ensures users can maintain serializable isolation for update and delete operations, including merge operations. #3199 allows validating that commits that overwrite files don’t conflict with concurrently added files. Ensures users can maintain serializable isolation for overwrite operations. #3135 fixes equality-deletes using DATE, TIMESTAMP, and TIME types. 
#3078 prevents the JDBC catalog from overwriting the jdbc.user property if any property called user exists in the environment. #3035 fixes drop namespace calls with the DyanmoDB catalog. #3273 fixes importing Avro files via add_files by correctly setting the number of records. #3332 fixes importing ORC files with float or double columns in add_files. A more exhaustive list of changes is available under the 0.12.1 release milestone.\n0.12.0 Apache Iceberg 0.12.0 was released on August 15, 2021. It consists of 395 commits authored by 74 contributors over a 139 day period.\nGit tag: 0.12.0 0.12.0 source tar.gz – signature – sha512 0.12.0 Spark 3.x runtime Jar 0.12.0 Spark 2.4 runtime Jar 0.12.0 Flink runtime Jar 0.12.0 Hive runtime Jar High-level features:\nCore Allow Iceberg schemas to specify one or more columns as row identifiers [#2465]. Note that this is a prerequisite for supporting upserts in Flink. Added JDBC [#1870] and DynamoDB [#2688] catalog implementations. Added predicate pushdown for partitions and files metadata tables [#2358, #2926]. Added a new, more flexible compaction action for Spark that can support different strategies such as bin packing and sorting. [#2501, #2609]. Added the ability to upgrade to v2 or create a v2 table using the table property format-version=2 [#2887]. Added support for nulls in StructLike collections [#2929]. Added key_metadata field to manifest lists for encryption [#2675]. Flink Added support for SQL primary keys [#2410]. Hive Added the ability to set the catalog at the table level in the Hive Metastore. This makes it possible to write queries that reference tables from multiple catalogs [#2129]. As a result of [#2129], deprecated the configuration property iceberg.mr.catalog which was previously used to configure the Iceberg catalog in MapReduce and Hive [#2565]. Added table-level JVM lock on commits[#2547]. Added support for Hive’s vectorized ORC reader [#2613]. Spark Added SET and DROP IDENTIFIER FIELDS clauses to ALTER TABLE so people don’t have to look up the DDL [#2560]. Added support for ALTER TABLE REPLACE PARTITION FIELD DDL [#2365]. Added support for micro-batch streaming reads for structured streaming in Spark3 [#2660]. Improved the performance of importing a Hive table by not loading all partitions from Hive and instead pushing the partition filter to the Metastore [#2777]. Added support for UPDATE statements in Spark [#2193, #2206]. Added support for Spark 3.1 [#2512]. Added RemoveReachableFiles action [#2415]. Added add_files stored procedure [#2210]. Refactored Actions API and added a new entry point. Added support for Hadoop configuration overrides [#2922]. Added support for the TIMESTAMP WITHOUT TIMEZONE type in Spark [#2757]. Added validation that files referenced by row-level deletes are not concurrently rewritten [#2308]. Important bug fixes:\nCore Fixed string bucketing with non-BMP characters [#2849]. Fixed Parquet dictionary filtering with fixed-length byte arrays and decimals [#2551]. Fixed a problem with the configuration of HiveCatalog [#2550]. Fixed partition field IDs in table replacement [#2906]. Hive Enabled dropping HMS tables even if the metadata on disk gets corrupted [#2583]. Parquet Fixed Parquet row group filters when types are promoted from int to long or from float to double [#2232] Spark Fixed MERGE INTO in Spark when used with SinglePartition partitioning [#2584]. Fixed nested struct pruning in Spark [#2877]. Fixed NaN handling for float and double metrics [#2464]. 
Fixed Kryo serialization for data and delete files [#2343]. Other notable changes:\nThe Iceberg Community voted to approve version 2 of the Apache Iceberg Format Specification. The differences between version 1 and 2 of the specification are documented here. Bugfixes and stability improvements for NessieCatalog. Improvements and fixes for Iceberg’s Python library. Added a vectorized reader for Apache Arrow [#2286]. The following Iceberg dependencies were upgraded: Hive 2.3.8 [#2110]. Avro 1.10.1 [#1648]. Parquet 1.12.0 [#2441]. 0.11.1 Git tag: 0.11.1 0.11.1 source tar.gz – signature – sha512 0.11.1 Spark 3.0 runtime Jar 0.11.1 Spark 2.4 runtime Jar 0.11.1 Flink runtime Jar 0.11.1 Hive runtime Jar Important bug fixes:\n#2367 prohibits deleting data files when tables are dropped if GC is disabled. #2196 fixes data loss after compaction when large files are split into multiple parts and only some parts are combined with other files. #2232 fixes row group filters with promoted types in Parquet. #2267 avoids listing non-Iceberg tables in Glue. #2254 fixes predicate pushdown for Date in Hive. #2126 fixes writing of Date, Decimal, Time, UUID types in Hive. #2241 fixes vectorized ORC reads with metadata columns in Spark. #2154 refreshes the relation cache in DELETE and MERGE operations in Spark. 0.11.0 Git tag: 0.11.0 0.11.0 source tar.gz – signature – sha512 0.11.0 Spark 3.0 runtime Jar 0.11.0 Spark 2.4 runtime Jar 0.11.0 Flink runtime Jar 0.11.0 Hive runtime Jar High-level features:\nCore API now supports partition spec and sort order evolution Spark 3 now supports the following SQL extensions: MERGE INTO (experimental) DELETE FROM (experimental) ALTER TABLE … ADD/DROP PARTITION ALTER TABLE … WRITE ORDERED BY Invoke stored procedures using CALL Flink now supports streaming reads, CDC writes (experimental), and filter pushdown AWS module is added to support better integration with AWS, with AWS Glue catalog support and dedicated S3 FileIO implementation Nessie module is added to support integration with project Nessie Important bug fixes:\n#1981 fixes bug that date and timestamp transforms were producing incorrect values for dates and times before 1970. Before the fix, negative values were incorrectly transformed by date and timestamp transforms to 1 larger than the correct value. For example, day(1969-12-31 10:00:00) produced 0 instead of -1. The fix is backwards compatible, which means predicate projection can still work with the incorrectly transformed partitions written using older versions. #2091 fixes ClassCastException for type promotion int to long and float to double during Parquet vectorized read. Now Arrow vector is created by looking at Parquet file schema instead of Iceberg schema for int and float fields. #1998 fixes bug in HiveTableOperation that unlock is not called if new metadata cannot be deleted. Now it is guaranteed that unlock is always called for Hive catalog users. #1979 fixes table listing failure in Hadoop catalog when user does not have permission to some tables. Now the tables with no permission are ignored in listing. #1798 fixes scan task failure when encountering duplicate entries of data files. Spark and Flink readers can now ignore duplicated entries in data files for each scan task. #1785 fixes invalidation of metadata tables in CachingCatalog. When a table is dropped, all the metadata tables associated with it are also invalidated in the cache. #1960 fixes bug that ORC writer does not read metrics config and always use the default. 
Now customized metrics config is respected. Other notable changes:\nNaN counts are now supported in metadata Shared catalog properties are added in core library to standardize catalog level configurations Spark and Flink now support dynamically loading customized Catalog and FileIO implementations Spark 2 now supports loading tables from other catalogs, like Spark 3 Spark 3 now supports catalog names in DataFrameReader when using Iceberg as a format Flink now uses the number of Iceberg read splits as its job parallelism to improve performance and save resource. Hive (experimental) now supports INSERT INTO, case insensitive query, projection pushdown, create DDL with schema and auto type conversion ORC now supports reading tinyint, smallint, char, varchar types Avro to Iceberg schema conversion now preserves field docs 0.10.0 Git tag: 0.10.0 0.10.0 source tar.gz – signature – sha512 0.10.0 Spark 3.0 runtime Jar 0.10.0 Spark 2.4 runtime Jar 0.10.0 Flink runtime Jar 0.10.0 Hive runtime Jar High-level features:\nFormat v2 support for building row-level operations (MERGE INTO) in processing engines Note: format v2 is not yet finalized and does not have a forward-compatibility guarantee Flink integration for writing to Iceberg tables and reading from Iceberg tables (reading supports batch mode only) Hive integration for reading from Iceberg tables, with filter pushdown (experimental; configuration may change) Important bug fixes:\n#1706 fixes non-vectorized ORC reads in Spark that incorrectly skipped rows #1536 fixes ORC conversion of notIn and notEqual to match null values #1722 fixes Expressions.notNull returning an isNull predicate; API only, method was not used by processing engines #1736 fixes IllegalArgumentException in vectorized Spark reads with negative decimal values #1666 fixes file lengths returned by the ORC writer, using compressed size rather than uncompressed size #1674 removes catalog expiration in HiveCatalogs #1545 automatically refreshes tables in Spark when not caching table instances Other notable changes:\nThe iceberg-hive module has been renamed to iceberg-hive-metastore to avoid confusion Spark 3 is based on 3.0.1 that includes the fix for SPARK-32168 Hadoop tables will recover from version hint corruption Tables can be configured with a required sort order Data file locations can be customized with a dynamically loaded LocationProvider ORC file imports can apply a name mapping for stats A more exhaustive list of changes is available under the 0.10.0 release milestone.\n0.9.1 Git tag: 0.9.1 0.9.1 source tar.gz – signature – sha512 0.9.1 Spark 3.0 runtime Jar 0.9.1 Spark 2.4 runtime Jar 0.9.0 Git tag: 0.9.0 0.9.0 source tar.gz – signature – sha512 0.9.0 Spark 3.0 runtime Jar 0.9.0 Spark 2.4 runtime Jar 0.8.0 Git tag: apache-iceberg-0.8.0-incubating 0.8.0-incubating source tar.gz – signature – sha512 0.8.0-incubating Spark 2.4 runtime Jar 0.7.0 Git tag: apache-iceberg-0.7.0-incubating 0.7.0-incubating source tar.gz – signature – sha512 0.7.0-incubating Spark 2.4 runtime Jar ","description":"","title":"Releases","uri":"/releases/"},{"categories":null,"content":" AES GCM Stream file format extension Background and Motivation Iceberg supports a number of data file formats. Two of these formats (Parquet and ORC) have built-in encryption capabilities, that allow to protect sensitive information in the data files. 
However, besides the data files, Iceberg tables also have metadata files, which can contain sensitive information too (e.g., min/max values in manifest files, or bloom filter bitsets in puffin files). Metadata file formats (AVRO, JSON, Puffin) don’t have encryption support.\nMoreover, with the exception of Parquet, no Iceberg data or metadata file format supports integrity verification, required for end-to-end tamper proofing of Iceberg tables.\nThis document specifies details of a simple file format extension that adds encryption and tamper-proofing to any existing file format.\nGoals Metadata encryption: enable encryption of manifests, manifest lists, snapshots and stats. Avro data encryption: enable encryption of data files in tables that use the Avro format. Support read splitting: enable seekable decrypting streams that can be used with splittable formats like Avro. Tamper proofing of Iceberg data and metadata files. Overview The output stream, produced by a metadata or data writer, is split into equal-size blocks (plus a last block that can be shorter). Each block is enciphered (encrypted/signed) with a given encryption key, and stored in a file in the AES GCM Stream format. Upon reading, the stored cipher blocks are verified for integrity; then decrypted and passed to metadata or data readers.\nEncryption algorithm AES GCM Stream uses the standard AES GCM cipher, and supports all AES key sizes: 128, 192 and 256 bits.\nAES GCM is an authenticated encryption cipher. Besides data confidentiality (encryption), it supports two levels of integrity verification (authentication): of the data (default), and of the data combined with an optional AAD (“additional authenticated data”). An AAD is free text to be authenticated together with the data. The structure of AES GCM Stream AADs is described below.\nAES GCM requires a unique vector to be provided for each encrypted block. In this document, the unique input to GCM encryption is called a nonce (“number used once”). AES GCM Stream encryption uses the RBG-based (random bit generator) nonce construction as defined in section 8.2.2 of the NIST SP 800-38D document. For each encrypted block, AES GCM Stream generates a unique nonce with a length of 12 bytes (96 bits).\nFormat specification File structure AES GCM Stream files have the following structure\nMagic BlockLength CipherBlock₁ CipherBlock₂ ... CipherBlockₙ where\nMagic is the four bytes 0x41, 0x47, 0x53, 0x31 (“AGS1”, short for: AES GCM Stream, version 1) BlockLength is a four-byte (little endian) integer keeping the length of the equal-size split blocks before encryption. The length is specified in bytes. CipherBlockᵢ is the i-th enciphered block in the file, with the structure defined below. Cipher Block structure Cipher blocks have the following structure\nnonce ciphertext tag where\nnonce is the AES GCM nonce, with a length of 12 bytes. ciphertext is the encrypted block. Its length is identical to the length of the block before encryption (“plaintext”). The length of all plaintext blocks, except the last, is BlockLength bytes. The last block has a non-zero length \u003c= BlockLength. tag is the AES GCM tag, with a length of 16 bytes. AES GCM Stream encrypts all blocks with the GCM cipher, without padding. The AES GCM cipher must be implemented by a cryptographic provider according to the NIST SP 800-38D specification. In AES GCM Stream, an input to the GCM cipher is an AES encryption key, a nonce, a plaintext and an AAD (described below). 
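For illustration only (not part of the specification), a minimal Python sketch of enciphering a single block with the third-party cryptography package; the helper name and the sample AAD prefix are hypothetical, and key management is out of scope here:

import os
import struct
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encipher_block(key: bytes, aad_prefix: bytes, block_index: int, plaintext: bytes) -> bytes:
    # 12-byte nonce from a random bit generator, one per block.
    nonce = os.urandom(12)
    # Block AAD = AAD prefix || 4-byte little-endian block sequence number.
    aad = aad_prefix + struct.pack("<i", block_index)
    # AESGCM.encrypt returns the ciphertext with the 16-byte tag appended.
    ciphertext_and_tag = AESGCM(key).encrypt(nonce, plaintext, aad)
    # CipherBlock layout: nonce | ciphertext | tag.
    return nonce + ciphertext_and_tag

# Example: encipher block 0 under a random 256-bit key and a hypothetical file AAD prefix.
block = encipher_block(AESGCM.generate_key(bit_length=256), b"example-file-id", 0, b"metadata bytes")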
The output is a ciphertext with length equal to that of the plaintext, and a 16-byte authentication tag used to verify the ciphertext and AAD integrity.\nAdditional Authenticated Data The AES GCM cipher protects against byte replacement inside a ciphertext block - but, without an AAD, it can’t prevent replacement of one ciphertext block with another (encrypted with the same key). AES GCM Stream leverages AADs to protect against swapping ciphertext blocks inside a file or between files. AES GCM Stream can also protect against swapping full files - for example, replacement of a metadata file with an old version. AADs are built to reflect the identity of a file and of the blocks inside the file.\nAES GCM Stream constructs a block AAD from two components: an AAD prefix - a string provided by Iceberg for the file (with the file ID), and an AAD suffix - the block sequence number in the file, as an int in a 4-byte little-endian form. The block AAD is a direct concatenation of the prefix and suffix parts.\n","description":"","title":"AES GCM Stream Spec","uri":"/gcm-stream-spec/"},{"categories":null,"content":" Available Benchmarks and how to run them Benchmarks are located under \u003cproject-name\u003e/jmh. It is generally preferable to run only the tests of interest rather than running all available benchmarks. Also note that JMH benchmarks run within the same JVM as the system-under-test, so results might vary between runs.\nRunning Benchmarks on GitHub It is possible to run one or more Benchmarks via the JMH Benchmarks GH action on your own fork of the Iceberg repo. This GH action takes the following inputs:\nThe repository name that those benchmarks should be run against, such as apache/iceberg or \u003cuser\u003e/iceberg The branch name to run benchmarks against, such as master or my-cool-feature-branch A list of comma-separated double-quoted Benchmark names, such as \"IcebergSourceFlatParquetDataReadBenchmark\", \"IcebergSourceFlatParquetDataFilterBenchmark\", \"IcebergSourceNestedListParquetDataWriteBenchmark\" Benchmark results will be uploaded once all benchmarks are done.\nIt is worth noting that the GH runners have limited resources, so the benchmark results should be seen only as an indicator to guide developers in understanding code changes. It is likely that there is variability in results across different runs; therefore, the benchmark results shouldn’t be used to form assumptions around production choices.\nRunning Benchmarks locally Below are the existing benchmarks, shown with the actual commands to run them locally.\nIcebergSourceNestedListParquetDataWriteBenchmark A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt\nSparkParquetReadersNestedDataBenchmark A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and Spark Parquet readers.
To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt\nSparkParquetWritersFlatDataBenchmark A benchmark that evaluates the performance of writing Parquet data with a flat schema using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt\nIcebergSourceFlatORCDataReadBenchmark A benchmark that evaluates the performance of reading ORC data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt\nSparkParquetReadersFlatDataBenchmark A benchmark that evaluates the performance of reading Parquet data with a flat schema using Iceberg and Spark Parquet readers. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt\nVectorizedReadDictionaryEncodedFlatParquetDataBenchmark A benchmark to compare the performance of reading dictionary-encoded Parquet data with a flat schema using the vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt\nIcebergSourceNestedListORCDataWriteBenchmark A benchmark that evaluates the performance of writing nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt\nVectorizedReadFlatParquetDataBenchmark A benchmark to compare the performance of reading Parquet data with a flat schema using the vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-flat-parquet-data-result.txt\nIcebergSourceFlatParquetDataWriteBenchmark A benchmark that evaluates the performance of writing Parquet data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt\nIcebergSourceNestedAvroDataReadBenchmark A benchmark that evaluates the performance of reading nested Avro data using Iceberg and the built-in file source in Spark.
To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt\nIcebergSourceFlatAvroDataReadBenchmark A benchmark that evaluates the performance of reading Avro data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt\nIcebergSourceNestedParquetDataWriteBenchmark A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt\nIcebergSourceNestedParquetDataReadBenchmark A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3: ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt\nIcebergSourceNestedORCDataReadBenchmark A benchmark that evaluates the performance of reading nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt\nIcebergSourceFlatParquetDataReadBenchmark A benchmark that evaluates the performance of reading Parquet data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt\nIcebergSourceFlatParquetDataFilterBenchmark A benchmark that evaluates the file skipping capabilities in the Spark data source for Iceberg. This class uses a dataset with a flat schema, where the records are clustered according to the column used in the filter predicate. The performance is compared to the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:\n./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt\nIcebergSourceNestedParquetDataFilterBenchmark A benchmark that evaluates the file skipping capabilities in the Spark data source for Iceberg. This class uses a dataset with nested data, where the records are clustered according to the column used in the filter predicate. The performance is compared to the built-in file source in Spark.
To run this benchmark for either spark-2 or spark-3: ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt\nSparkParquetWritersNestedDataBenchmark A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3: ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt ","description":"","title":"Benchmarks","uri":"/benchmarks/"},{"categories":null,"content":" Iceberg Blogs Here is a list of company blogs that talk about Iceberg. The blogs are ordered from most recent to oldest.\nApache Hive-4.x with Iceberg Branches \u0026 Tags Date: October 12th, 2023, Company: Cloudera\nAuthors: Ayush Saxena\nApache Hive 4.x With Apache Iceberg Date: October 12th, 2023, Company: Cloudera\nAuthors: Ayush Saxena\nFrom Hive Tables to Iceberg Tables: Hassle-Free Date: July 14th, 2023, Company: Cloudera\nAuthors: Srinivas Rishindra Pothireddi\n12 Times Faster Query Planning With Iceberg Manifest Caching in Impala Date: July 13th, 2023, Company: Cloudera\nAuthors: Riza Suminto\nHow Bilibili Builds OLAP Data Lakehouse with Apache Iceberg Date: June 14th, 2023, Company: Bilibili\nAuthors: Rui Li\nIntroducing the Apache Iceberg Catalog Migration Tool Date: May 12th, 2022, Company: Dremio\nAuthors: Dipankar Mazumdar \u0026 Ajantha Bhat\n3 Ways to Use Python with Apache Iceberg Date: April 12th, 2022, Company: Dremio\nAuthor: Alex Merced\n3 Ways to Convert a Delta Lake Table Into an Apache Iceberg Table Date: April 3rd, 2022, Company: Dremio\nAuthor: Alex Merced\nHow to Convert CSV Files into an Apache Iceberg table with Dremio Date: April 3rd, 2022, Company: Dremio\nAuthor: Alex Merced\nOpen Data Lakehouse powered by Iceberg for all your Data Warehouse needs Date: April 3rd, 2023, Company: Cloudera\nAuthors: Zoltan Borok-Nagy, Ayush Saxena, Tamas Mate, Simhadri Govindappa\nExploring Branch \u0026 Tags in Apache Iceberg using Spark Date: March 29th, 2022, Company: Dremio\nAuthor: Dipankar Mazumdar\nIceberg Tables: Catalog Support Now Available Date: March 29th, 2023, Company: Snowflake\nAuthors: Ron Ortloff, Dennis Huo\nDealing with Data Incidents Using the Rollback Feature in Apache Iceberg Date: February 24th, 2022, Company: Dremio\nAuthor: Dipankar Mazumdar\nPartition and File Pruning for Dremio’s Apache Iceberg-backed Reflections Date: February 8th, 2022, Company: Dremio\nAuthor: Benny Chow\nUnderstanding Iceberg Table Metadata Date: January 30st, 2023, Company: Snowflake\nAuthor: Phani Raj\nCreating and managing Apache Iceberg tables using serverless features and without coding Date: January 27th, 2023, Company: Snowflake\nAuthor: Parag Jain\nGetting started with Apache Iceberg Date: January 27th, 2023, Company: Snowflake\nAuthor: Jedidiah Rajbhushan\nHow Apache Iceberg enables ACID compliance for data lakes Date: January 13th, 2023, Company: Snowflake\nAuthors: Sumeet Tandure\nMulti-Cloud Open Lakehouse with Apache Iceberg in Cloudera Data Platform Date: December 15th, 2022, Company: Cloudera\nAuthors: Bill Zhang, Shaun Ahmadian, Zoltan Borok-Nagy, Vincent Kulandaisamy\nConnecting Tableau to Apache Iceberg Tables with Dremio Date: December 15th, 2022, Company: Dremio\nAuthor: Alex Merced\nGetting Started with Project Nessie, Apache 
Iceberg, and Apache Spark Using Docker Date: December 15th, 2022, Company: Dremio\nAuthor: Alex Merced\nApache Iceberg FAQ Date: December 14th, 2022, Company: Dremio\nAuthor: Alex Merced\nA Notebook for getting started with Project Nessie, Apache Iceberg, and Apache Spark Date: December 5th, 2022, Company: Dremio\nAuthor: Dipankar Mazumdar\nTime Travel with Dremio and Apache Iceberg Date: November 29th, 2022, Company: Dremio\nAuthor: Michael Flower\nCompaction in Apache Iceberg: Fine-Tuning Your Iceberg Table’s Data Files Date: November 9th, 2022, Company: Dremio\nAuthor: Alex Merced\nThe Life of a Read Query for Apache Iceberg Tables Date: October 31st, 2022, Company: Dremio\nAuthor: Alex Merced\nPuffins and Icebergs: Additional Stats for Apache Iceberg Tables Date: October 17th, 2022, Company: Dremio\nAuthor: Dipankar Mazumdar\nApache Iceberg and the Right to be Forgotten Date: September 30th, 2022, Company: Dremio\nAuthor: Alex Merced\nStreaming Data into Apache Iceberg tables using AWS Kinesis and AWS Glue Date: September 26th, 2022, Company: Dremio\nAuthor: Alex Merced\nIceberg Flink Sink: Stream Directly into your Data Warehouse Tables Date: October 12, 2022, Company: Tabular\nAuthor: Sam Redai\nPartitioning for Correctness (and Performance) Date: September 28, 2022, Company: Tabular\nAuthor: Jason Reid\nEnsuring High Performance at Any Scale with Apache Iceberg’s Object Store File Layout Date: September 20, 2022, Company: Dremio\nAuthor: Alex Merced\nIntroduction to Apache Iceberg Using Spark Date: September 15, 2022, Company: Dremio\nAuthor: Alex Merced\nHow Z-Ordering in Apache Iceberg Helps Improve Performance Date: September 13th, 2022, Company: Dremio\nAuthor: Dipankar Mazumdar\nApache Iceberg 101 – Your Guide to Learning Apache Iceberg Concepts and Practices Date: September 12th, 2022, Company: Dremio\nAuthor: Alex Merced\nA Hands-On Look at the Structure of an Apache Iceberg Table Date: August 24, 2022, Company: Dremio\nAuthor: Dipankar Mazumdar\nFuture-Proof Partitioning and Fewer Table Rewrites with Apache Iceberg Date: August 18, 2022, Company: Dremio\nAuthor: Alex Merced\nHow to use Apache Iceberg in CDP’s Open Lakehouse Date: August 8th, 2022, Company: Cloudera\nAuthors: Bill Zhang, Peter Ableda, Shaun Ahmadian, Manish Maheshwari\nNear Real-Time Ingestion For Trino Date: August 4th, 2022, Company: Starburst\nAuthors: Eric Hwang, Monica Miller, Brian Zhan\nHow to implement Apache Iceberg in AWS Athena Date: July 28th, 2022\nAuthor: [Shneior Dicastro]\nSupercharge your Data Lakehouse with Apache Iceberg in Cloudera Data Platform Date: June 30th, 2022, Company: Cloudera\nAuthors: Bill Zhang, Shaun Ahmadian\nMigrating a Hive Table to an Iceberg Table Hands-on Tutorial Date: June 6th, 2022, Company: Dremio\nAuthor: Alex Merced\nFewer Accidental Full Table Scans Brought to You by Apache Iceberg’s Hidden Partitioning Date: May 21st, 2022, Company: Dremio\nAuthor: Alex Merced\nAn Introduction To The Iceberg Java API Part 2 - Table Scans Date: May 11th, 2022, Company: Tabular\nAuthor: Sam Redai\nIceberg’s Guiding Light: The Iceberg Open Table Format Specification Date: April 26th, 2022, Company: Tabular\nAuthor: Sam Redai\nHow to Migrate a Hive Table to an Iceberg Table Date: April 15th, 2022, Company: Dremio\nAuthor: Alex Merced\nUsing Iceberg’s S3FileIO Implementation To Store Your Data In MinIO Date: April 14th, 2022, Company: Tabular\nAuthor: Sam Redai\nMaintaining Iceberg Tables – Compaction, Expiring Snapshots, and More Date: April 7th, 2022, Company: Dremio\nAuthor: Alex 
Merced\nAn Introduction To The Iceberg Java API - Part 1 Date: April 1st, 2022, Company: Tabular\nAuthor: Sam Redai\nIntegrated Audits: Streamlined Data Observability With Apache Iceberg Date: March 2nd, 2022, Company: Tabular\nAuthor: Sam Redai\nIntroducing Apache Iceberg in Cloudera Data Platform Date: February 23rd, 2022, Company: Cloudera\nAuthors: Bill Zhang, Peter Vary, Marton Bod, Wing Yew Poon\nWhat’s new in Iceberg 0.13 Date: February 22nd, 2022, Company: Tabular\nAuthor: Ryan Blue\nApache Iceberg Becomes Industry Open Standard with Ecosystem Adoption Date: February 3rd, 2022, Company: Dremio\nAuthor: Mark Lyons\nDocker, Spark, and Iceberg: The Fastest Way to Try Iceberg! Date: February 2nd, 2022, Company: Tabular\nAuthor: Sam Redai, Kyle Bendickson\nExpanding the Data Cloud with Apache Iceberg Date: January 21st, 2022, Company: Snowflake\nAuthor: James Malone\nIceberg FileIO: Cloud Native Tables Date: December 16th, 2021, Company: Tabular\nAuthor: Daniel Weeks\nUsing Spark in EMR with Apache Iceberg Date: December 10th, 2021, Company: Tabular\nAuthor: Sam Redai\nUsing Flink CDC to synchronize data from MySQL sharding tables and build real-time data lake Date: November 11th, 2021, Company: Ververica, Alibaba Cloud\nAuthor: Yuxia Luo, Jark Wu, Zheng Hu\nMetadata Indexing in Iceberg Date: October 10th, 2021, Company: Tabular\nAuthor: Ryan Blue\nUsing Debezium to Create a Data Lake with Apache Iceberg Date: October 20th, 2021, Company: Memiiso Community\nAuthor: Ismail Simsek\nHow to Analyze CDC Data in Iceberg Data Lake Using Flink Date: June 15th, 2021, Company: Alibaba Cloud Community\nAuthor: Li Jinsong, Hu Zheng, Yang Weihai, Peidan Li\nApache Iceberg: An Architectural Look Under the Covers Date: July 6th, 2021, Company: Dremio\nAuthor: Jason Hughes\nMigrating to Apache Iceberg at Adobe Experience Platform Date: Jun 17th, 2021, Company: Adobe\nAuthor: Romin Parekh, Miao Wang, Shone Sadler\nFlink + Iceberg: How to Construct a Whole-scenario Real-time Data Warehouse Date: Jun 8th, 2021, Company: Tencent\nAuthor Shu (Simon Su) Su\nTrino on Ice III: Iceberg Concurrency Model, Snapshots, and the Iceberg Spec Date: May 25th, 2021, Company: Starburst\nAuthor: Brian Olsen\nTrino on Ice II: In-Place Table Evolution and Cloud Compatibility with Iceberg Date: May 11th, 2021, Company: Starburst\nAuthor: Brian Olsen\nTrino On Ice I: A Gentle Introduction To Iceberg Date: Apr 27th, 2021, Company: Starburst\nAuthor: Brian Olsen\nApache Iceberg: A Different Table Design for Big Data Date: Feb 1st, 2021, Company: thenewstack.io\nAuthor: Susan Hall\nA Short Introduction to Apache Iceberg Date: Jan 26th, 2021, Company: Expedia\nAuthor: Christine Mathiesen\nTaking Query Optimizations to the Next Level with Iceberg Date: Jan 14th, 2021, Company: Adobe\nAuthor: Gautam Kowshik, Xabriel J. 
Collazo Mojica\nFastIngest: Low-latency Gobblin with Apache Iceberg and ORC format Date: Jan 6th, 2021, Company: Linkedin\nAuthor: Zihan Li, Sudarshan Vasudevan, Lei Sun, Shirshanka Das\nHigh Throughput Ingestion with Iceberg Date: Dec 22nd, 2020, Company: Adobe\nAuthor: Andrei Ionescu, Shone Sadler, Anil Malkani\nOptimizing data warehouse storage Date: Dec 21st, 2020, Company: Netflix\nAuthor: Anupom Syam\nIceberg at Adobe Date: Dec 3rd, 2020, Company: Adobe\nAuthor: Shone Sadler, Romin Parekh, Anil Malkani\nBulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores Date: Oct 27th, 2020, Company: Netflix\nAuthor: Tianlong Chen, Ioannis Papapanagiotou\n","description":"","title":"Blogs","uri":"/blogs/"},{"categories":null,"content":" Welcome! Apache Iceberg tracks issues in GitHub and prefers to receive contributions as pull requests.\nCommunity discussions happen primarily on the dev mailing list, on apache-iceberg Slack workspace, and on specific GitHub issues.\nContribute See Contributing for more details on how to contribute to Iceberg.\nIssues Issues are tracked in GitHub:\nView open issues Open a new issue Slack We use the Apache Iceberg workspace on Slack. To be invited, follow this invite link.\nPlease note that this link may occasionally break when Slack does an upgrade. If you encounter problems using it, please let us know by sending an email to dev@iceberg.apache.org.\nIceberg Community Events This calendar contains two calendar feeds:\nIceberg Community Events - Events such as conferences and meetups, aimed to educate and inspire Iceberg users. Iceberg Dev Events - Events such as the triweekly Iceberg sync, aimed to discuss the project roadmap and how to implement features. You can subscribe to either or both of these calendars by clicking the “+ Google Calendar” icon on the bottom right.\nMailing Lists Iceberg has four mailing lists:\nDevelopers: dev@iceberg.apache.org – used for community discussions Subscribe Unsubscribe Archive Commits: commits@iceberg.apache.org – distributes commit notifications Subscribe Unsubscribe Archive Issues: issues@iceberg.apache.org – Github issue tracking Subscribe Unsubscribe Archive Private: private@iceberg.apache.org – private list for the PMC to discuss sensitive issues related to the health of the project Archive Community Guidelines Apache Iceberg Community Guidelines The Apache Iceberg community is built on the principles described in the Apache Way and all who engage with the community are expected to be respectful, open, come with the best interests of the community in mind, and abide by the Apache Foundation Code of Conduct.\nParticipants with Corporate Interests A wide range of corporate entities have interests that overlap in both features and frameworks related to Iceberg and while we encourage engagement and contributions, the community is not a venue for marketing, solicitation, or recruitment.\nAny vendor who wants to participate in the Apache Iceberg community Slack workspace should create a dedicated vendor channel for their organization prefixed by vendor-.\nThis space can be used to discuss features and integration with Iceberg related to the vendor offering. This space should not be used to promote competing vendor products/services or disparage other vendor offerings. 
Discussion should be focused on questions asked by the community and not on expanding, introducing, or redirecting users to alternate offerings.\nMarketing / Solicitation / Recruiting The Apache Iceberg community is a space for everyone to operate free of influence. The development lists, Slack workspace, and GitHub should not be used to market products or services. Solicitation or overt promotion should not be performed in common channels or through direct messages.\nRecruitment of community members should not be conducted through direct messages or community channels, but opportunities related to contributing to or using Iceberg can be posted to the #jobs channel.\nFor questions regarding any of the guidelines above, please contact a PMC member.\n","description":"","title":"Community","uri":"/community/"},{"categories":null,"content":" Contributing On this page, you will find some guidelines on contributing to Apache Iceberg. Please keep in mind that none of these are hard rules and they’re meant as a collection of helpful suggestions to make contributing as seamless an experience as possible.\nIf you are thinking of contributing but first would like to discuss the change you wish to make, we welcome you to head over to the Community page on the official Iceberg documentation site to find a number of ways to connect with the community, including Slack and our mailing lists. Of course, always feel free to just open a new issue in the GitHub repo. You can also check the following for a good first issue.\nThe Iceberg Project is hosted on GitHub at https://github.com/apache/iceberg.\nPull Request Process The Iceberg community prefers to receive contributions as GitHub pull requests.\nView open pull requests\nPRs are automatically labeled based on the content by our github-actions labeling action It’s helpful to include a prefix in the summary that provides context to PR reviewers, such as Build:, Docs:, Spark:, Flink:, Core:, API: If a PR is related to an issue, adding Closes #1234 in the PR description will automatically close the issue and helps keep the project clean If a PR is posted for visibility and isn’t necessarily ready for review or merging, be sure to convert the PR to a draft Building the Project Locally Iceberg is built using Gradle with Java 8 or Java 11.\nTo invoke a build and run tests: ./gradlew build To skip tests: ./gradlew build -x test -x integrationTest To fix code style: ./gradlew spotlessApply To build particular Spark/Flink Versions: ./gradlew build -DsparkVersions=3.2,3.3 -DflinkVersions=1.14 Iceberg table support is organized in library modules:\niceberg-common contains utility classes used in other modules iceberg-api contains the public Iceberg API iceberg-core contains implementations of the Iceberg API and support for Avro data files; this is what processing engines should depend on iceberg-parquet is an optional module for working with tables backed by Parquet files iceberg-arrow is an optional module for reading Parquet into Arrow memory iceberg-orc is an optional module for working with tables backed by ORC files iceberg-hive-metastore is an implementation of Iceberg tables backed by the Hive metastore Thrift client iceberg-data is an optional module for working with tables directly from JVM applications The Iceberg project also has modules for adding Iceberg support to processing engines:\niceberg-spark is an implementation of Spark’s Datasource V2 API for Iceberg with submodules for each Spark version (use runtime jars for a shaded version) iceberg-flink contains
classes for integrating with Apache Flink (use iceberg-flink-runtime for a shaded version) iceberg-mr contains an InputFormat and other classes for integrating with Apache Hive iceberg-pig is an implementation of Pig’s LoadFunc API for Iceberg Setting up IDE and Code Style Configuring Code Formatter for Eclipse/IntelliJ Follow the instructions for Eclipse or IntelliJ to install the google-java-format plugin (note the required manual actions for IntelliJ).\nSemantic Versioning Apache Iceberg leverages semantic versioning to ensure compatibility for developers and users of the iceberg libraries as APIs and implementations evolve. The requirements and guarantees provided depend on the subproject as described below:\nMajor Version Deprecations Required Modules iceberg-api\nThe API subproject is the main interface for developers and users of the Iceberg API and therefore has the strongest guarantees. Evolution of the interfaces in this subproject is enforced by Revapi and requires explicit acknowledgement of API changes. All public interfaces and classes require a deprecation cycle of one major version. Any backward incompatible changes should be annotated as @Deprecated and removed for the next major release. Backward compatible changes are allowed within major versions.\nMinor Version Deprecations Required Modules iceberg-common iceberg-core iceberg-data iceberg-orc iceberg-parquet\nChanges to public interfaces and classes in the subprojects listed above require a deprecation cycle of one minor release. These projects contain common and internal code used by other projects and can evolve within a major release. Minor release deprecation will provide other subprojects and external projects notice and an opportunity to transition to new implementations.\nMinor Version Deprecations Discretionary modules (All modules not referenced above)\nOther modules are less likely to be extended directly and modifications should make a good faith effort to follow a minor version deprecation cycle. If there are significant structural or design changes that result in deprecations being difficult to orchestrate, it is up to the committers to decide if deprecation is necessary.\nDeprecation Notices All interfaces, classes, and methods targeted for deprecation must include the following:\n@Deprecated annotation on the appropriate element @deprecated javadoc comment including: the version for removal, the appropriate alternative for usage Replacement of existing code paths that use the deprecated behavior Example:\n/** * Set the sequence number for this manifest entry. * * @param sequenceNumber a sequence number * @deprecated since 1.0.0, will be removed in 1.1.0; use dataSequenceNumber() instead. */ @Deprecated void sequenceNumber(long sequenceNumber); Iceberg Code Contribution Guidelines Style For Python, please use the tox command tox -e format to apply autoformatting to the project.\nJava code adheres to the Google style, which will be verified via ./gradlew spotlessCheck during builds. In order to automatically fix Java code style issues, please use ./gradlew spotlessApply.\nNOTE: The google-java-format plugin will always use the latest version of google-java-format. However, spotless itself is configured to use google-java-format 1.7 since that version is compatible with JDK 8. When formatting the code in the IDE, there is a slight chance that it will produce slightly different results.
In such a case please run ./gradlew spotlessApply as CI will check the style against google-java-format 1.7.\nCopyright Each file must include the Apache license information as a header.\nLicensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Configuring Copyright for IntelliJ IDEA Every file needs to include the Apache license as a header. This can be automated in IntelliJ by adding a Copyright profile:\nIn the Settings/Preferences dialog go to Editor → Copyright → Copyright Profiles.\nAdd a new profile and name it Apache.\nAdd the following text as the license text:\nLicensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Go to Editor → Copyright and choose the Apache profile as the default profile for this project.\nClick Apply.\nJava style guidelines Method naming Make method names as short as possible, while being clear. Omit needless words. Avoid get in method names, unless an object must be a Java bean. In most cases, replace get with a more specific verb that describes what is happening in the method, like find or fetch. If there isn’t a more specific verb or the method is a getter, omit get because it isn’t helpful to readers and makes method names longer. Where possible, use words and conjugations that form correct sentences in English when read For example, Transform.preservesOrder() reads correctly in an if statement: if (transform.preservesOrder()) { ... } Boolean arguments Avoid boolean arguments to methods that are not private to avoid confusing invocations like sendMessage(false). It is better to create two methods with names and behavior, even if both are implemented by one internal method.\n// prefer exposing suppressFailure in method names public void sendMessageIgnoreFailure() { sendMessageInternal(true); } public void sendMessage() { sendMessageInternal(false); } private void sendMessageInternal(boolean suppressFailure) { ... 
} When passing boolean arguments to existing or external methods, use inline comments to help the reader understand actions without an IDE.\n// BAD: it is not clear what false controls dropTable(identifier, false); // GOOD: these uses of dropTable are clear to the reader dropTable(identifier, true /* purge data */); dropTable(identifier, purge); Config naming Use - to link words in one concept For example, preferred convection access-key-id rather than access.key.id Use . to create a hierarchy of config groups For example, s3 in s3.access-key-id, s3.secret-access-key Testing AssertJ Prefer using AssertJ assertions as those provide a rich and intuitive set of strongly-typed assertions. Checks can be expressed in a fluent way and AssertJ provides rich context when assertions fail. Additionally, AssertJ has powerful testing capabilities on collections and exceptions. Please refer to the usage guide for additional examples.\n// bad: will only say true != false when check fails assertTrue(x instanceof Xyz); // better: will show type of x when check fails assertThat(x).isInstanceOf(Xyz.class); // bad: will only say true != false when check fails assertTrue(catalog.listNamespaces().containsAll(expected)); // better: will show content of expected and of catalog.listNamespaces() if check fails assertThat(catalog.listNamespaces()).containsAll(expected); // ok assertNotNull(metadataFileLocations); assertEquals(metadataFileLocations.size(), 4); // better: will show the content of metadataFileLocations if check fails assertThat(metadataFileLocations).isNotNull().hasSize(4); // or assertThat(metadataFileLocations).isNotNull().hasSameSizeAs(expected).hasSize(4); // bad try { catalog.createNamespace(deniedNamespace); Assert.fail(\"this should fail\"); } catch (Exception e) { assertEquals(AccessDeniedException.class, e.getClass()); assertEquals(\"User 'testUser' has no permission to create namespace\", e.getMessage()); } // better assertThatThrownBy(() -\u003e catalog.createNamespace(deniedNamespace)) .isInstanceOf(AccessDeniedException.class) .hasMessage(\"User 'testUser' has no permission to create namespace\"); Checks on exceptions should always make sure to assert that a particular exception message has occurred.\nAwaitility Avoid using Thread.sleep() in tests as it leads to long test durations and flaky behavior if a condition takes slightly longer than expected.\ndeleteTablesAsync(); Thread.sleep(3000L); assertThat(tables()).isEmpty(); A better alternative is using Awaitility to make sure tables() are eventually empty. The below example will run the check with a default polling interval of 100 millis:\ndeleteTablesAsync(); Awaitility.await(\"Tables were not deleted\") .atMost(5, TimeUnit.SECONDS) .untilAsserted(() -\u003e assertThat(tables()).isEmpty()); Please refer to the usage guide of Awaitility for more usage examples.\nJUnit4 / JUnit5 Iceberg currently uses a mix of JUnit4 (org.junit imports) and JUnit5 (org.junit.jupiter.api imports) tests. To allow an easier migration to JUnit5 in the future, new test classes that are being added to the codebase should be written purely in JUnit5 where possible.\nRunning Benchmarks Some PRs/changesets might require running benchmarks to determine whether they are affecting the baseline performance. 
Currently there is no “push a single button to get a performance comparison” solution available, therefore one has to run JMH performance tests on their local machine and post the results on the PR.\nSee Benchmarks for a summary of available benchmarks and how to run them.\nWebsite and Documentation Updates Currently, there is an iceberg-docs repository which contains the HTML/CSS and other files needed for the Iceberg website. The docs folder in the Iceberg repository contains the markdown content for the documentation site. All markdown changes should still be made to this repository.\nSubmitting Pull Requests Changes to the markdown contents should be submitted directly to this repository.\nChanges to the website appearance (e.g. HTML, CSS changes) should be submitted to the iceberg-docs repository against the main branch.\nChanges to the documentation of old Iceberg versions should be submitted to the iceberg-docs repository against the specific version branch.\nReporting Issues All issues related to the doc website should still be submitted to the Iceberg repository. The GitHub Issues feature of the iceberg-docs repository is disabled.\nRunning Locally Clone the iceberg-docs repository to run the website locally:\ngit clone git@github.com:apache/iceberg-docs.git cd iceberg-docs To start the landing page site locally, run:\ncd landing-page \u0026\u0026 hugo serve To start the documentation site locally, run:\ncd docs \u0026\u0026 hugo serve If you would like to see how the latest website looks based on the documentation in the Iceberg repository, you can copy docs to the iceberg-docs repository by:\nrm -rf docs/content/docs rm -rf landing-page/content/common cp -r \u003cpath to iceberg repo\u003e/docs/versioned docs/content/docs cp -r \u003cpath to iceberg repo\u003e/docs/common landing-page/content/common ","description":"","title":"Contribute","uri":"/contribute/"},{"categories":null,"content":" Introduction This page walks you through the release process of the Iceberg project. Here you can read about the release process in general for an Apache project.\nDecisions about releases are made by three groups:\nRelease Manager: Does the work of creating the release, signing it, counting votes, announcing the release and so on. Requires the assistance of a committer for some steps. The community: Performs the discussion of whether it is the right time to create a release and what that release should contain. The community can also cast non-binding votes on the release. PMC: Gives binding votes on the release. 
This page describes the procedures that the release manager and voting PMC members take during the release process.\nSetup To create a release candidate, you will need:\nApache LDAP credentals for Nexus and SVN A GPG key for signing, published in KEYS If you have not published your GPG key yet, you must publish it before sending the vote email by doing:\nsvn co https://dist.apache.org/repos/dist/dev/iceberg icebergsvn cd icebergsvn echo \"\" \u003e\u003e KEYS # append a newline gpg --list-sigs \u003cYOUR KEY ID HERE\u003e \u003e\u003e KEYS # append signatures gpg --armor --export \u003cYOUR KEY ID HERE\u003e \u003e\u003e KEYS # append public key block svn commit -m \"add key for \u003cYOUR NAME HERE\u003e\" Nexus access Nexus credentials are configured in your personal ~/.gradle/gradle.properties file using mavenUser and mavenPassword:\nmavenUser=yourApacheID mavenPassword=SomePassword PGP signing The release scripts use the command-line gpg utility so that signing can use the gpg-agent and does not require writing your private key’s passphrase to a configuration file.\nTo configure gradle to sign convenience binary artifacts, add the following settings to ~/.gradle/gradle.properties:\nsigning.gnupg.keyName=Your Name (CODE SIGNING KEY) To use gpg instead of gpg2, also set signing.gnupg.executable=gpg\nFor more information, see the Gradle signing documentation.\nApache repository The release should be executed against https://github.com/apache/iceberg.git instead of any fork. Set it as remote with name apache for release if it is not already set up.\nCreating a release candidate Initiate a discussion about the release with the community This step can be useful to gather ongoing patches that the community thinks should be in the upcoming release.\nThe communication can be started via a [DISCUSS] mail on the dev@ channel and the desired tickets can be added to the github milestone of the next release.\nNote, creating a milestone in github requires a committer. However, a non-committer can assign tasks to a milestone if added to the list of collaborators in .asf.yaml\nThe release status is discussed during each community sync meeting. Release manager should join the meeting to report status and discuss any release blocker.\nBuild the source release To create the source release artifacts, run the source-release.sh script with the release version and release candidate number:\ndev/source-release.sh -v 0.13.0 -r 0 -k \u003cYOUR KEY ID HERE\u003e Example console output:\nPreparing source for apache-iceberg-0.13.0-rc1 Adding version.txt and tagging release... [master ca8bb7d0] Add version.txt for release 0.13.0 1 file changed, 1 insertion(+) create mode 100644 version.txt Pushing apache-iceberg-0.13.0-rc1 to origin... Enumerating objects: 5, done. Counting objects: 100% (5/5), done. Delta compression using up to 12 threads Compressing objects: 100% (3/3), done. Writing objects: 100% (4/4), 433 bytes | 433.00 KiB/s, done. Total 4 (delta 1), reused 0 (delta 0) remote: Resolving deltas: 100% (1/1), completed with 1 local object. To https://github.com/apache/iceberg.git * [new tag] apache-iceberg-0.13.0-rc1 -\u003e apache-iceberg-0.13.0-rc1 Creating tarball using commit ca8bb7d0821f35bbcfa79a39841be8fb630ac3e5 Signing the tarball... Checking out Iceberg RC subversion repo... Checked out revision 52260. Adding tarball to the Iceberg distribution Subversion repo... 
A tmp/apache-iceberg-0.13.0-rc1 A tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz.asc A (bin) tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz A tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz.sha512 Adding tmp/apache-iceberg-0.13.0-rc1 Adding (bin) tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz Adding tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz.asc Adding tmp/apache-iceberg-0.13.0-rc1/apache-iceberg-0.13.0.tar.gz.sha512 Transmitting file data ...done Committing transaction... Committed revision 52261. Creating release-announcement-email.txt... Success! The release candidate is available here: https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-0.13.0-rc1 Commit SHA1: ca8bb7d0821f35bbcfa79a39841be8fb630ac3e5 We have generated a release announcement email for you here: /Users/jackye/iceberg/release-announcement-email.txt Please note that you must update the Nexus repository URL contained in the mail before sending it out. The source release script will create a candidate tag based on the HEAD revision in git and will prepare the release tarball, signature, and checksum files. It will also upload the source artifacts to SVN.\nNote the commit SHA1 and candidate location because those will be added to the vote thread.\nOnce the source release is ready, use it to stage convenience binary artifacts in Nexus.\nBuild and stage convenience binaries Convenience binaries are created using the source release tarball from in the last step.\nUntar the source release and go into the release directory:\ntar xzf apache-iceberg-0.13.0.tar.gz cd apache-iceberg-0.13.0 To build and publish the convenience binaries, run the dev/stage-binaries.sh script. This will push to a release staging repository.\nDisable gradle parallelism by setting org.gradle.parallel=false in gradle.properties.\ndev/stage-binaries.sh Next, you need to close the staging repository:\nGo to Nexus and log in In the menu on the left, choose “Staging Repositories” Select the Iceberg repository If multiple staging repositories are created after running the script, verify that gradle parallelism is disabled and try again. At the top, select “Close” and follow the instructions In the comment field use “Apache Iceberg \u003cversion\u003e RC\u003cnum\u003e” Start a VOTE thread The last step for a candidate is to create a VOTE thread on the dev mailing list. The email template is already generated in release-announcement-email.txt with some details filled.\nExample title subject:\n[VOTE] Release Apache Iceberg \u003cVERSION\u003e RC\u003cNUM\u003e Example content:\nHi everyone, I propose the following RC to be released as official Apache Iceberg \u003cVERSION\u003e release. The commit id is \u003cSHA1\u003e * This corresponds to the tag: apache-iceberg-\u003cVERSION\u003e-rc\u003cNUM\u003e * https://github.com/apache/iceberg/commits/apache-iceberg-\u003cVERSION\u003e-rc\u003cNUM\u003e * https://github.com/apache/iceberg/tree/\u003cSHA1\u003e The release tarball, signature, and checksums are here: * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-\u003cVERSION\u003e-rc\u003cNUM\u003e/ You can find the KEYS file here: * https://dist.apache.org/repos/dist/dev/iceberg/KEYS Convenience binary artifacts are staged in Nexus. The Maven repository URL is: * https://repository.apache.org/content/repositories/orgapacheiceberg-\u003cID\u003e/ This release includes important changes that I should have summarized here, but I'm lazy. Please download, verify, and test. 
Please vote in the next 72 hours. (Weekends excluded) [ ] +1 Release this as Apache Iceberg \u003cVERSION\u003e [ ] +0 [ ] -1 Do not release this because... Only PMC members have binding votes, but other community members are encouraged to cast non-binding votes. This vote will pass if there are 3 binding +1 votes and more binding +1 votes than -1 votes. When a candidate is passed or rejected, reply with the voting result:\nSubject: [RESULT][VOTE] Release Apache Iceberg \u003cVERSION\u003e RC\u003cNUM\u003e Thanks everyone who participated in the vote for Release Apache Iceberg \u003cVERSION\u003e RC\u003cNUM\u003e. The vote result is: +1: 3 (binding), 5 (non-binding) +0: 0 (binding), 0 (non-binding) -1: 0 (binding), 0 (non-binding) Therefore, the release candidate is passed/rejected. Finishing the release After the release vote has passed, you need to release the last candidate’s artifacts.\nBut note that releasing the artifacts should happen around the same time the new docs are released so make sure the documentation changes are prepared when going through the below steps.\nPublishing the release First, copy the source release directory to releases:\nmkdir iceberg cd iceberg svn co https://dist.apache.org/repos/dist/dev/iceberg candidates svn co https://dist.apache.org/repos/dist/release/iceberg releases cp -r candidates/apache-iceberg-\u003cVERSION\u003e-rcN/ releases/apache-iceberg-\u003cVERSION\u003e cd releases svn add apache-iceberg-\u003cVERSION\u003e svn ci -m 'Iceberg: Add release \u003cVERSION\u003e' !!! Note The above step requires PMC privileges to execute.\nNext, add a release tag to the git repository based on the passing candidate tag:\ngit tag -am 'Release Apache Iceberg \u003cVERSION\u003e' apache-iceberg-\u003cVERSION\u003e apache-iceberg-\u003cVERSION\u003e-rcN Then release the candidate repository in Nexus.\nAnnouncing the release To announce the release, wait until Maven central has mirrored the Apache binaries, then update the Iceberg site and send an announcement email:\n[ANNOUNCE] Apache Iceberg release \u003cVERSION\u003e I'm pleased to announce the release of Apache Iceberg \u003cVERSION\u003e! Apache Iceberg is an open table format for huge analytic datasets. Iceberg delivers high query performance for tables with tens of petabytes of data, along with atomic commits, concurrent writes, and SQL-compatible table evolution. This release can be downloaded from: https://www.apache.org/dyn/closer.cgi/iceberg/\u003cTARBALL NAME WITHOUT .tar.gz\u003e/\u003cTARBALL NAME\u003e Java artifacts are available from Maven Central. Thanks to everyone for contributing! Update revapi Create a PR in the iceberg repo to make revapi run on the new release. For an example see this PR.\nUpdate Github Create a PR in the iceberg repository to add the new version to the Github issue template. For an example see this PR. Draft a new release to update Github to show the latest release. A changelog can be generated automatically using Github. Documentation Release Documentation needs to be updated as a part of an Iceberg release after a release candidate is passed. The commands described below assume you are in a directory containing a local clone of the iceberg-docs repository and iceberg repository. Adjust the commands accordingly if it is not the case. 
Note that all changes in iceberg need to happen against the master branch and changes in iceberg-docs need to happen against the main branch.\nCommon documentation update To start the release process, run the following steps in the iceberg-docs repository to copy docs over: cp -r ../iceberg/format/* ../iceberg-docs/landing-page/content/common/ Change into the iceberg-docs repository and create a branch. cd ../iceberg-docs git checkout -b \u003cBRANCH NAME\u003e Commit, push, and open a PR against the iceberg-docs repo (\u003cBRANCH NAME\u003e -\u003e main) Versioned documentation update Once the common docs changes have been merged into main, the next step is to update the versioned docs.\nIn the iceberg-docs repository, cut a new branch using the version number as the branch name cd ../iceberg-docs git checkout -b \u003cVERSION\u003e git push --set-upstream apache \u003cVERSION\u003e Copy the versioned docs from the iceberg repo into the iceberg-docs repo rm -rf ../iceberg-docs/docs/content cp -r ../iceberg/docs ../iceberg-docs/docs/content Commit the changes and open a PR against the \u003cVERSION\u003e branch in the iceberg-docs repo Javadoc update In the iceberg repository, generate the javadoc for your release and copy it to the javadoc folder in iceberg-docs repo:\ncd ../iceberg ./gradlew refreshJavadoc rm -rf ../iceberg-docs/javadoc cp -r site/docs/javadoc/\u003cVERSION NUMBER\u003e ../iceberg-docs/javadoc This resulted changes in iceberg-docs should be approved in a separate PR.\nUpdate the latest branch Since main is currently the same as the version branch, one needs to rebase latest branch against main:\ngit checkout latest git rebase main git push apache latest Set latest version in iceberg-docs repo The last step is to update the main branch in iceberg-docs to set the latest version. A PR needs to be published in the iceberg-docs repository with the following changes:\nUpdate variable latestVersions.iceberg to the new release version in landing-page/config.toml Update variable latestVersions.iceberg to the new release version and versions.nessie to the version of org.projectnessie.nessie:* from versions.props in docs/config.toml Update list versions with the new release in landing-page/config.toml Update list versions with the new release in docs/config.toml Mark the current latest release notes to past releases under landing-page/content/common/release-notes.md Add release notes for the new release version in landing-page/content/common/release-notes.md How to Verify a Release Each Apache Iceberg release is validated by the community by holding a vote. A community release manager will prepare a release candidate and call a vote on the Iceberg dev list. To validate the release candidate, community members will test it out in their downstream projects and environments. 
It’s recommended to report the Java, Scala, Spark, Flink and Hive versions you have tested against when you vote.\nIn addition to testing in downstream projects, community members also check the release’s signatures, checksums, and license documentation.\nValidating a source release candidate Release announcements include links to the following:\nA source tarball A signature (.asc) A checksum (.sha512) KEYS file GitHub change comparison After downloading the source tarball, signature, checksum, and KEYS file, here are instructions on how to verify signatures, checksums, and documentation.\nVerifying Signatures First, import the keys.\ncurl https://dist.apache.org/repos/dist/dev/iceberg/KEYS -o KEYS gpg --import KEYS Next, verify the .asc file.\ngpg --verify apache-iceberg-1.4.3.tar.gz.asc Verifying Checksums shasum -a 512 --check apache-iceberg-1.4.3.tar.gz.sha512 Verifying License Documentation Untar the archive and change into the source directory.\ntar xzf apache-iceberg-1.4.3.tar.gz cd apache-iceberg-1.4.3 Run RAT checks to validate license headers.\ndev/check-license Verifying Build and Test To verify that the release candidate builds properly, run the following command.\n./gradlew build Testing release binaries Release announcements will also include a maven repository location. You can use this location to test downstream dependencies by adding it to your maven or gradle build.\nTo use the release in your maven build, add the following to your POM or settings.xml:\n... \u003crepositories\u003e \u003crepository\u003e \u003cid\u003eiceberg-release-candidate\u003c/id\u003e \u003cname\u003eIceberg Release Candidate\u003c/name\u003e \u003curl\u003e${MAVEN_URL}\u003c/url\u003e \u003c/repository\u003e \u003c/repositories\u003e ... To use the release in your gradle build, add the following to your build.gradle:\nrepositories { mavenCentral() maven { url \"${MAVEN_URL}\" } } !!! Note Replace ${MAVEN_URL} with the URL provided in the release announcement\nVerifying with Spark To verify using spark, start a spark-shell with a command like the following command (use the appropriate spark-runtime jar for the Spark installation):\nspark-shell \\ --conf spark.jars.repositories=${MAVEN_URL} \\ --packages org.apache.iceberg:iceberg-spark3-runtime:1.4.3 \\ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\ --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \\ --conf spark.sql.catalog.local.type=hadoop \\ --conf spark.sql.catalog.local.warehouse=${LOCAL_WAREHOUSE_PATH} \\ --conf spark.sql.catalog.local.default-namespace=default \\ --conf spark.sql.defaultCatalog=local Verifying with Flink To verify using Flink, start a Flink SQL Client with the following command:\nwget ${MAVEN_URL}/iceberg-flink-runtime/1.4.3/iceberg-flink-runtime-1.4.3.jar sql-client.sh embedded \\ -j iceberg-flink-runtime-1.4.3.jar \\ -j ${FLINK_CONNECTOR_PACKAGE}-${HIVE_VERSION}_${SCALA_VERSION}-${FLINK_VERSION}.jar \\ shell Voting Votes are cast by replying to the release candidate announcement email on the dev mailing list with either +1, 0, or -1.\n[ ] +1 Release this as Apache Iceberg 1.4.3 [ ] +0 [ ] -1 Do not release this because…\nIn addition to your vote, it’s customary to specify if your vote is binding or non-binding. Only members of the Project Management Committee have formally binding votes. If you’re unsure, you can specify that your vote is non-binding. 
To read more about voting in the Apache framework, checkout the Voting information page on the Apache foundation’s website.\n","description":"","title":"How To Release","uri":"/how-to-release/"},{"categories":null,"content":" Iceberg Catalogs Overview You may think of Iceberg as a format for managing data in a single table, but the Iceberg library needs a way to keep track of those tables by name. Tasks like creating, dropping, and renaming tables are the responsibility of a catalog. Catalogs manage a collection of tables that are usually grouped into namespaces. The most important responsibility of a catalog is tracking a table’s current metadata, which is provided by the catalog when you load a table.\nThe first step when using an Iceberg client is almost always initializing and configuring a catalog. The configured catalog is then used by compute engines to execute catalog operations. Multiple types of compute engines using a shared Iceberg catalog allows them to share a common data layer.\nA catalog is almost always configured through the processing engine which passes along a set of properties during initialization. Different processing engines have different ways to configure a catalog. When configuring a catalog, it’s always best to refer to the Iceberg documentation as well as the docs for the specific processing engine being used. Ultimately, these configurations boil down to a common set of catalog properties that will be passed to configure the Iceberg catalog.\nCatalog Implementations Iceberg catalogs are flexible and can be implemented using almost any backend system. They can be plugged into any Iceberg runtime, and allow any processing engine that supports Iceberg to load the tracked Iceberg tables. Iceberg also comes with a number of catalog implementations that are ready to use out of the box.\nThis includes:\nREST - a server-side catalog that’s exposed through a REST API Hive Metastore - tracks namespaces and tables using a Hive metastore JDBC - tracks namespaces and tables in a simple JDBC database Nessie - a transactional catalog that tracks namespaces and tables in a database with git-like version control There are more catalog types in addition to the ones listed here as well as custom catalogs that are developed to include specialized functionality.\nDecoupling Using the REST Catalog The REST catalog was introduced in the Iceberg 0.14.0 release and provides greater control over how Iceberg catalogs are implemented. Instead of using technology-specific logic contained in the catalog clients, the implementation details of a REST catalog lives on the catalog server. If you’re familiar with Hive, this is somewhat similar to the Hive thrift service that allows access to a hive server over a single port. The server-side logic can be written in any language and use any custom technology, as long as the API follows the Iceberg REST Open API specification.\nA great benefit of the REST catalog is that it allows you to use a single client to talk to any catalog backend. This increased flexibility makes it easier to make custom catalogs compatible with engines like Athena or Starburst without requiring the inclusion of a Jar into the classpath.\n","description":"","title":"Iceberg Catalogs","uri":"/catalog/"},{"categories":null,"content":" Multi-Engine Support Apache Iceberg is an open standard for huge analytic tables that can be used by any processing engine. 
The community continuously improves Iceberg core library components to enable integrations with different compute engines that power analytics, business intelligence, machine learning, etc. Connectors for Spark, Flink and Hive are maintained in the main Iceberg repository.\nMulti-Version Support Processing engine connectors maintained in the iceberg repository are built for multiple versions.\nFor Spark and Flink, each new version that introduces backwards incompatible upgrade has its dedicated integration codebase and release artifacts. For example, the code for Iceberg Spark 3.1 integration is under /spark/v3.1 and the code for Iceberg Spark 3.2 integration is under /spark/v3.2. Different artifacts (iceberg-spark-3.1_2.12 and iceberg-spark-3.2_2.12) are released for users to consume. By doing this, changes across versions are isolated. New features in Iceberg could be developed against the latest features of an engine without breaking support of old APIs in past engine versions.\nFor Hive, Hive 2 uses the iceberg-mr package for Iceberg integration, and Hive 3 requires an additional dependency of the iceberg-hive3 package.\nRuntime Jar Iceberg provides a runtime connector jar for each supported version of Spark, Flink and Hive. When using Iceberg with these engines, the runtime jar is the only addition to the classpath needed in addition to vendor dependencies. For example, to use Iceberg with Spark 3.2 and AWS integrations, iceberg-spark-runtime-3.2_2.12 and AWS SDK dependencies are needed for the Spark installation.\nSpark and Flink provide different runtime jars for each supported engine version. Hive 2 and Hive 3 currently share the same runtime jar. The runtime jar names and latest version download links are listed in the tables below.\nEngine Version Lifecycle Each engine version undergoes the following lifecycle stages:\nBeta: a new engine version is supported, but still in the experimental stage. Maybe the engine version itself is still in preview (e.g. Spark 3.0.0-preview), or the engine does not yet have full feature compatibility compared to old versions yet. This stage allows Iceberg to release an engine version support without the need to wait for feature parity, shortening the release time. Maintained: an engine version is actively maintained by the community. Users can expect parity for most features across all the maintained versions. If a feature has to leverage some new engine functionalities that older versions don’t have, then feature parity across maintained versions is not guaranteed. Deprecated: an engine version is no longer actively maintained. People who are still interested in the version can backport any necessary feature or bug fix from newer versions, but the community will not spend effort in achieving feature parity. Iceberg recommends users to move towards a newer version. Contributions to a deprecated version is expected to diminish over time, so that eventually no change is added to a deprecated version. End-of-life: a vote can be initiated in the community to fully remove a deprecated version out of the Iceberg repository to mark as its end of life. 
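To make the runtime jar approach above concrete, here is a minimal PySpark sketch, not an official setup: it pulls a Spark runtime artifact by Maven coordinate and registers an Iceberg catalog purely through configuration. The artifact coordinate, catalog name local, and warehouse path are illustrative and should be matched to your Spark, Scala, and Iceberg versions using the lifecycle tables below.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-runtime-jar-example")
    # The runtime jar is the only Iceberg addition needed on the classpath (plus vendor deps).
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3")
    # Enable Iceberg's SQL extensions and a Hadoop-type catalog named "local".
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")  # placeholder path
    .getOrCreate()
)

# Tables in the catalog can then be created and queried with plain SQL, e.g.:
# spark.sql("CREATE TABLE local.db.events (id bigint, ts timestamp) USING iceberg")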
Current Engine Version Lifecycle Status Apache Spark Version Lifecycle Stage Initial Iceberg Support Latest Iceberg Support Latest Runtime Jar 2.4 End of Life 0.7.0-incubating 1.2.1 iceberg-spark-runtime-2.4 3.0 End of Life 0.9.0 1.0.0 iceberg-spark-runtime-3.0_2.12 3.1 End of Life 0.12.0 1.3.1 iceberg-spark-runtime-3.1_2.12 [1] 3.2 Deprecated 0.13.0 1.4.3 iceberg-spark-runtime-3.2_2.12 3.3 Maintained 0.14.0 1.4.3 iceberg-spark-runtime-3.3_2.12 3.4 Maintained 1.3.0 1.4.3 iceberg-spark-runtime-3.4_2.12 3.5 Maintained 1.4.0 1.4.3 iceberg-spark-runtime-3.5_2.12 [1] Spark 3.1 shares the same runtime jar iceberg-spark3-runtime with Spark 3.0 before Iceberg 0.13.0 Apache Flink Based on the guideline of the Flink community, only the latest 2 minor versions are actively maintained. Users should continuously upgrade their Flink version to stay up-to-date.\nVersion Lifecycle Stage Initial Iceberg Support Latest Iceberg Support Latest Runtime Jar 1.11 End of Life 0.9.0 0.12.1 iceberg-flink-runtime 1.12 End of Life 0.12.0 0.13.1 iceberg-flink-runtime-1.12 [3] 1.13 End of Life 0.13.0 1.0.0 iceberg-flink-runtime-1.13 1.14 End of Life 0.13.0 1.2.0 iceberg-flink-runtime-1.14 1.15 Deprecated 0.14.0 1.4.3 iceberg-flink-runtime-1.15 1.16 Maintained 1.1.0 1.4.3 iceberg-flink-runtime-1.16 1.17 Maintained 1.3.0 1.4.3 iceberg-flink-runtime-1.17 [3] Flink 1.12 shares the same runtime jar iceberg-flink-runtime with Flink 1.11 before Iceberg 0.13.0 Apache Hive Version Recommended minor version Lifecycle Stage Initial Iceberg Support Latest Iceberg Support Latest Runtime Jar 2 2.3.8 Maintained 0.8.0-incubating 1.4.3 iceberg-hive-runtime 3 3.1.2 Maintained 0.10.0 1.4.3 iceberg-hive-runtime Developer Guide Maintaining existing engine versions Iceberg recommends the following for developers who are maintaining existing engine versions:\nNew features should always be prioritized first in the latest version, which is either a maintained or beta version. For features that could be backported, contributors are encouraged to either perform backports to all maintained versions, or at least create some issues to track the backport. If the change is small enough, updating all versions in a single PR is acceptable. Otherwise, using separated PRs for each version is recommended. Supporting new engines Iceberg recommends new engines to build support by importing the Iceberg libraries to the engine’s project. This allows the Iceberg support to evolve with the engine. Projects such as Trino and Presto are good examples of such support strategy.\nIn this approach, an Iceberg version upgrade is needed for an engine to consume new Iceberg features. To facilitate engine development against unreleased Iceberg features, a daily snapshot is published in the Apache snapshot repository.\nIf bringing an engine directly to the Iceberg main repository is needed, please raise a discussion thread in the Iceberg community.\n","description":"","title":"Multi-Engine Support","uri":"/multi-engine-support/"},{"categories":null,"content":" Puffin file format This is a specification for Puffin, a file format designed to store information such as indexes and statistics about data managed in an Iceberg table that cannot be stored directly within the Iceberg manifest. A Puffin file contains arbitrary pieces of information (here called “blobs”), along with metadata necessary to interpret them. 
The blobs supported by Iceberg are documented at Blob types.\nFormat specification A file conforming to the Puffin file format specification should have the structure as described below.\nVersions Currently, there is a single version of the Puffin file format, described below.\nFile structure The Puffin file has the following structure\nMagic Blob₁ Blob₂ ... Blobₙ Footer where\nMagic is four bytes 0x50, 0x46, 0x41, 0x31 (short for: Puffin Fratercula arctica, version 1), Blobᵢ is i-th blob contained in the file, to be interpreted by application according to the footer, Footer is defined below. Footer structure Footer has the following structure\nMagic FooterPayload FooterPayloadSize Flags Magic where\nMagic: four bytes, same as at the beginning of the file FooterPayload: optionally compressed, UTF-8 encoded JSON payload describing the blobs in the file, with the structure described below FooterPayloadSize: a length in bytes of the FooterPayload (after compression, if compressed), stored as 4 byte integer Flags: 4 bytes for boolean flags byte 0 (first) bit 0 (lowest bit): whether FooterPayload is compressed all other bits are reserved for future use and should be set to 0 on write all other bytes are reserved for future use and should be set to 0 on write A 4 byte integer is always signed, in a two’s complement representation, stored little-endian.\nFooter Payload Footer payload bytes is either uncompressed or LZ4-compressed (as a single LZ4 compression frame with content size present), UTF-8 encoded JSON payload representing a single FileMetadata object.\nFileMetadata FileMetadata has the following fields\nField Name Field Type Required Description blobs list of BlobMetadata objects yes properties JSON object with string property values no storage for arbitrary meta-information, like writer identification/version. See Common properties for properties that are recommended to be set by a writer. BlobMetadata BlobMetadata has the following fields\nField Name Field Type Required Description type JSON string yes See Blob types fields JSON list of ints yes List of field IDs the blob was computed for; the order of items is used to compute sketches stored in the blob. snapshot-id JSON long yes ID of the Iceberg table’s snapshot the blob was computed from. sequence-number JSON long yes Sequence number of the Iceberg table’s snapshot the blob was computed from. offset JSON long yes The offset in the file where the blob contents start length JSON long yes The length of the blob stored in the file (after compression, if compressed) compression-codec JSON string no See Compression codecs. If omitted, the data is assumed to be uncompressed. properties JSON object with string property values no storage for arbitrary meta-information about the blob Blob types The blobs can be of a type listed below\napache-datasketches-theta-v1 blob type A serialized form of a “compact” Theta sketch produced by the Apache DataSketches library. The sketch is obtained by constructing Alpha family sketch with default seed, and feeding it with individual distinct values converted to bytes using Iceberg’s single-value serialization.\nThe blob metadata for this blob may include following properties:\nndv: estimate of number of distinct values, derived from the sketch. Compression codecs The data can also be uncompressed. If it is compressed the codec should be one of codecs listed below. 
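Before a reader can decompress anything, it must locate and decode the footer laid out above. The following is a minimal Python sketch, not a reference implementation, that extracts the FileMetadata JSON from a Puffin file; it assumes the footer payload is uncompressed, and a real reader would additionally decode an LZ4 frame when the compression flag is set.

import json
import struct

MAGIC = b"\x50\x46\x41\x31"  # "PFA1"

def read_puffin_file_metadata(path):
    """Sketch: return the FileMetadata dict from a Puffin file with an uncompressed footer payload."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Puffin file: missing magic")
    # Trailer layout: FooterPayloadSize (4-byte signed little-endian int), Flags (4 bytes), Magic (4 bytes).
    payload_size = struct.unpack("<i", data[-12:-8])[0]
    flags = data[-8:-4]
    if flags[0] & 0x01:  # byte 0, lowest bit: payload is compressed
        raise NotImplementedError("this sketch only handles uncompressed footer payloads")
    payload = data[-(12 + payload_size):-12]
    if data[-(16 + payload_size):-(12 + payload_size)] != MAGIC:
        raise ValueError("footer magic not found before payload")
    return json.loads(payload.decode("utf-8"))  # {"blobs": [...], "properties": {...}}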
For maximal interoperability, other codecs are not supported.\nCodec name Description lz4 Single LZ4 compression frame, with content size present zstd Single Zstandard compression frame, with content size present __ Common properties When writing a Puffin file it is recommended to set the following fields in the FileMetadata’s properties field.\ncreated-by - human-readable identification of the application writing the file, along with its version. Example “Trino version 381”. ","description":"","title":"Puffin Spec","uri":"/puffin-spec/"},{"categories":null,"content":" Roadmap Overview This roadmap outlines projects that the Iceberg community is working on. Each high-level item links to a Github project board that tracks the current status. Related design docs will be linked on the planning boards.\nGeneral Multi-table transaction support Views Support Change Data Capture (CDC) Support Snapshot tagging and branching Inline file compaction Delete File compaction Z-ordering / Space-filling curves Support UPSERT Clients Rust and Go projects are pointing to their respective repositories which include their own issues as the implementations are not final.\nAdd the Iceberg Python Client Add the Iceberg Rust Client Add the Iceberg Go Client Spec V2 Views Spec DSv2 streaming improvements Secondary indexes Spec V3 Encryption Relative paths Default field values ","description":"","title":"Roadmap","uri":"/roadmap/"},{"categories":null,"content":" Reporting Security Issues The Apache Iceberg Project uses the standard process outlined by the Apache Security Team for reporting vulnerabilities. Note that vulnerabilities should not be publicly disclosed until the project has responded.\nTo report a possible security vulnerability, please email security@iceberg.apache.org.\nVerifying Signed Releases Please refer to the instructions on the Release Verification page.\n","description":"","title":"Security","uri":"/security/"},{"categories":null,"content":" Iceberg Table Spec This is a specification for the Iceberg table format that is designed to manage a large, slow-changing collection of files in a distributed file system or key-value store as a table.\nFormat Versioning Versions 1 and 2 of the Iceberg spec are complete and adopted by the community.\nThe format version number is incremented when new features are added that will break forward-compatibility—that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.\nVersion 1: Analytic Data Tables Version 1 of the Iceberg spec defines how to manage large analytic tables using immutable file formats: Parquet, Avro, and ORC.\nAll version 1 data and metadata files are valid after upgrading a table to version 2. Appendix E documents how to default version 2 fields when reading version 1 metadata.\nVersion 2: Row-level Deletes Version 2 of the Iceberg spec adds row-level updates and deletes for analytic tables with immutable files.\nThe primary change in version 2 adds delete files to encode rows that are deleted in existing data files. This version can be used to delete or replace individual rows in immutable data files without rewriting the files.\nIn addition to row-level deletes, version 2 makes some requirements stricter for writers. 
The full set of changes are listed in Appendix E.\nGoals Serializable isolation – Reads will be isolated from concurrent writes and always use a committed snapshot of a table’s data. Writes will support removing and adding files in a single operation and are never partially visible. Readers will not acquire locks. Speed – Operations will use O(1) remote calls to plan the files for a scan and not O(n) where n grows with the size of the table, like the number of partitions or files. Scale – Job planning will be handled primarily by clients and not bottleneck on a central metadata store. Metadata will include information needed for cost-based optimization. Evolution – Tables will support full schema and partition spec evolution. Schema evolution supports safe column add, drop, reorder and rename, including in nested structures. Dependable types – Tables will provide well-defined and dependable support for a core set of types. Storage separation – Partitioning will be table configuration. Reads will be planned using predicates on data values, not partition values. Tables will support evolving partition schemes. Formats – Underlying data file formats will support identical schema evolution rules and types. Both read-optimized and write-optimized formats will be available. Overview This table format tracks individual data files in a table instead of directories. This allows writers to create data files in-place and only adds files to the table in an explicit commit.\nTable state is maintained in metadata files. All changes to table state create a new metadata file and replace the old metadata with an atomic swap. The table metadata file tracks the table schema, partitioning config, custom properties, and snapshots of the table contents. A snapshot represents the state of a table at some time and is used to access the complete set of data files in the table.\nData files in snapshots are tracked by one or more manifest files that contain a row for each data file in the table, the file’s partition data, and its metrics. The data in a snapshot is the union of all files in its manifests. Manifest files are reused across snapshots to avoid rewriting metadata that is slow-changing. Manifests can track data files with any subset of a table and are not associated with partitions.\nThe manifests that make up a snapshot are stored in a manifest list file. Each manifest list stores metadata about manifests, including partition stats and data file counts. These stats are used to avoid reading manifests that are not required for an operation.\nOptimistic Concurrency An atomic swap of one table metadata file for another provides the basis for serializable isolation. Readers use the snapshot that was current when they load the table metadata and are not affected by changes until they refresh and pick up a new metadata location.\nWriters create table metadata files optimistically, assuming that the current version will not be changed before the writer’s commit. Once a writer has created an update, it commits by swapping the table’s metadata file pointer from the base version to the new version.\nIf the snapshot on which an update is based is no longer current, the writer must retry the update based on the new current version. Some operations support retry by re-applying metadata changes and committing, under well-defined conditions. 
For example, a change that rewrites files can be applied to a new table snapshot if all of the rewritten files are still in the table.\nThe conditions required by a write to successfully commit determines the isolation level. Writers can select what to validate and can make different isolation guarantees.\nSequence Numbers The relative age of data and delete files relies on a sequence number that is assigned to every successful commit. When a snapshot is created for a commit, it is optimistically assigned the next sequence number, and it is written into the snapshot’s metadata. If the commit fails and must be retried, the sequence number is reassigned and written into new snapshot metadata.\nAll manifests, data files, and delete files created for a snapshot inherit the snapshot’s sequence number. Manifest file metadata in the manifest list stores a manifest’s sequence number. New data and metadata file entries are written with null in place of a sequence number, which is replaced with the manifest’s sequence number at read time. When a data or delete file is written to a new manifest (as “existing”), the inherited sequence number is written to ensure it does not change after it is first inherited.\nInheriting the sequence number from manifest metadata allows writing a new manifest once and reusing it in commit retries. To change a sequence number for a retry, only the manifest list must be rewritten – which would be rewritten anyway with the latest set of manifests.\nRow-level Deletes Row-level deletes are stored in delete files.\nThere are two ways to encode a row-level delete:\nPosition deletes mark a row deleted by data file path and the row position in the data file Equality deletes mark a row deleted by one or more column values, like id = 5 Like data files, delete files are tracked by partition. In general, a delete file must be applied to older data files with the same partition; see Scan Planning for details. Column metrics can be used to determine whether a delete file’s rows overlap the contents of a data file or a scan range.\nFile System Operations Iceberg only requires that file systems support the following operations:\nIn-place write – Files are not moved or altered once they are written. Seekable reads – Data file formats require seek support. Deletes – Tables delete files that are no longer used. These requirements are compatible with object stores, like S3.\nTables do not require random-access writes. Once written, data and metadata files are immutable until they are deleted.\nTables do not require rename, except for tables that use atomic rename to implement the commit operation for new metadata files.\nSpecification Terms Schema – Names and types of fields in a table. Partition spec – A definition of how partition values are derived from data fields. Snapshot – The state of a table at some point in time, including the set of all data files. Manifest list – A file that lists manifest files; one per snapshot. Manifest – A file that lists data or delete files; a subset of a snapshot. Data file – A file that contains rows of a table. Delete file – A file that encodes rows of a table that are deleted by position or data values. Writer requirements Some tables in this spec have columns that specify requirements for v1 and v2 tables. 
These requirements are intended for writers when adding metadata files to a table with the given version.\nRequirement Write behavior (blank) The field should be omitted optional The field can be written required The field must be written Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree. For manifest list and manifest files, this table shows the expected v2 read behavior:\nv1 v2 v2 read behavior optional Read the field as optional required Read the field as optional; it may be missing in v1 files optional Ignore the field optional optional Read the field as optional optional required Read the field as optional; it may be missing in v1 files required Ignore the field required optional Read the field as optional required required Fill in a default or throw an exception if the field is missing Readers may be more strict for metadata JSON files because the JSON files are not reused and will always match the table version. Required v2 fields that were not present in v1 or optional in v1 may be handled as required fields. For example, a v2 table that is missing last-sequence-number can throw an exception.\nSchemas and Data Types A table’s schema is a list of named columns. All data types are either primitives or nested types, which are maps, lists, or structs. A table schema is also a struct type.\nFor the representations of these types in Avro, ORC, and Parquet file formats, see Appendix A.\nNested Types A struct is a tuple of typed values. Each field in the tuple is named and has an integer id that is unique in the table schema. Each field can be either optional or required, meaning that values can (or cannot) be null. Fields may be any type. Fields may have an optional comment or doc string. Fields can have default values.\nA list is a collection of values with some element type. The element field has an integer id that is unique in the table schema. Elements can be either optional or required. Element types may be any type.\nA map is a collection of key-value pairs with a key type and a value type. Both the key field and value field each have an integer id that is unique in the table schema. Map keys are required and map values can be either optional or required. Both map keys and map values may be any type, including nested types.\nPrimitive Types Primitive type Description Requirements boolean True or false int 32-bit signed integers Can promote to long long 64-bit signed integers float 32-bit IEEE 754 floating point Can promote to double double 64-bit IEEE 754 floating point decimal(P,S) Fixed-point decimal; precision P, scale S Scale is fixed [1], precision must be 38 or less date Calendar date without timezone or time time Time of day without date, timezone Microsecond precision [2] timestamp Timestamp without timezone Microsecond precision [2] timestamptz Timestamp with timezone Stored as UTC [2] string Arbitrary-length character sequences Encoded with UTF-8 [3] uuid Universally unique identifiers Should use 16-byte fixed fixed(L) Fixed-length byte array of length L binary Arbitrary-length byte array Notes:\nDecimal scale is fixed and cannot be changed by schema evolution. Precision can only be widened. All time and timestamp values are stored with microsecond precision. 
Timestamps with time zone represent a point in time: values are stored as UTC and do not retain a source time zone (2017-11-16 17:10:34 PST is stored/retrieved as 2017-11-17 01:10:34 UTC and these values are considered identical). Timestamps without time zone represent a date and time of day regardless of zone: the time value is independent of zone adjustments (2017-11-16 17:10:34 is always retrieved as 2017-11-16 17:10:34). Timestamp values are stored as a long that encodes microseconds from the unix epoch. Character strings must be stored as UTF-8 encoded byte arrays. For details on how to serialize a schema to JSON, see Appendix C.\nDefault values Default values can be tracked for struct fields (both nested structs and the top-level schema’s struct). There can be two defaults with a field:\ninitial-default is used to populate the field’s value for all records that were written before the field was added to the schema write-default is used to populate the field’s value for any records written after the field was added to the schema, if the writer does not supply the field’s value The initial-default is set only when a field is added to an existing schema. The write-default is initially set to the same value as initial-default and can be changed through schema evolution. If either default is not set for an optional field, then the default value is null for compatibility with older spec versions.\nThe initial-default and write-default produce SQL default value behavior, without rewriting data files. SQL default value behavior when a field is added handles all existing rows as though the rows were written with the new field’s default value. Default value changes may only affect future records and all known fields are written into data files. Omitting a known field when writing a data file is never allowed. The write default for a field must be written if a field is not supplied to a write. If the write default for a required field is not set, the writer must fail.\nDefault values are attributes of fields in schemas and serialized with fields in the JSON format. See Appendix C.\nSchema Evolution Schemas may be evolved by type promotion or adding, deleting, renaming, or reordering fields in structs (both nested structs and the top-level schema’s struct).\nEvolution applies changes to the table’s current schema to produce a new schema that is identified by a unique schema ID, is added to the table’s list of schemas, and is set as the table’s current schema.\nValid type promotions are:\nint to long float to double decimal(P, S) to decimal(P', S) if P' \u003e P – widen the precision of decimal types. Any struct, including a top-level schema, can evolve through deleting fields, adding new fields, renaming existing fields, reordering existing fields, or promoting a primitive using the valid type promotions. Adding a new field assigns a new ID for that field and for any nested fields. Renaming an existing field must change the name, but not the field ID. Deleting a field removes it from the current schema. Field deletion cannot be rolled back unless the field was nullable or if the current snapshot has not changed.\nGrouping a subset of a struct’s fields into a nested struct is not allowed, nor is moving fields from a nested struct into its immediate parent struct (struct\u003ca, b, c\u003e ↔ struct\u003ca, struct\u003cb, c\u003e\u003e). 
Evolving primitive types to structs is not allowed, nor is evolving a single-field struct to a primitive (map\u003cstring, int\u003e ↔ map\u003cstring, struct\u003cint\u003e\u003e).\nStruct evolution requires the following rules for default values:\nThe initial-default must be set when a field is added and cannot change The write-default must be set when a field is added and may change When a required field is added, both defaults must be set to a non-null value When an optional field is added, the defaults may be null and should be explicitly set When a new field is added to a struct with a default value, updating the struct’s default is optional If a field value is missing from a struct’s initial-default, the field’s initial-default must be used for the field If a field value is missing from a struct’s write-default, the field’s write-default must be used for the field Column Projection Columns in Iceberg data files are selected by field id. The table schema’s column names and order may change after a data file is written, and projection must be done using field ids. If a field id is missing from a data file, its value for each row should be null.\nFor example, a file may be written with schema 1: a int, 2: b string, 3: c double and read using projection schema 3: measurement, 2: name, 4: a. This must select file columns c (renamed to measurement), b (now called name), and a column of null values called a; in that order.\nTables may also define a property schema.name-mapping.default with a JSON name mapping containing a list of field mapping objects. These mappings provide fallback field ids to be used when a data file does not contain field id information. Each object should contain\nnames: A required list of 0 or more names for a field. field-id: An optional Iceberg field ID used when a field’s name is present in names fields: An optional list of field mappings for child field of structs, maps, and lists. Field mapping fields are constrained by the following rules:\nA name may contain . but this refers to a literal name, not a nested field. For example, a.b refers to a field named a.b, not child field b of field a. Each child field should be defined with their own field mapping under fields. Multiple values for names may be mapped to a single field ID to support cases where a field may have different names in different data files. For example, all Avro field aliases should be listed in names. Fields which exist only in the Iceberg schema and not in imported data files may use an empty names list. Fields that exist in imported files but not in the Iceberg schema may omit field-id. List types should contain a mapping in fields for element. Map types should contain mappings in fields for key and value. Struct types should contain mappings in fields for their child fields. For details on serialization, see Appendix C.\nIdentifier Field IDs A schema can optionally track the set of primitive fields that identify rows in a table, using the property identifier-field-ids (see JSON encoding in Appendix C).\nTwo rows are the “same”—that is, the rows represent the same entity—if the identifier fields are equal. However, uniqueness of rows by this identifier is not guaranteed or required by Iceberg and it is the responsibility of processing engines or data providers to enforce.\nIdentifier fields may be nested in structs but cannot be nested within maps or lists. 
Float, double, and optional fields cannot be used as identifier fields and a nested field cannot be used as an identifier field if it is nested in an optional struct, to avoid null values in identifiers.\nReserved Field IDs Iceberg tables must not use field ids greater than 2147483447 (Integer.MAX_VALUE - 200). This id range is reserved for metadata columns that can be used in user data schemas, like the _file column that holds the file path in which a row was stored.\nThe set of metadata columns is:\nField id, name Type Description 2147483646 _file string Path of the file in which a row is stored 2147483645 _pos long Ordinal position of a row in the source data file 2147483644 _deleted boolean Whether the row has been deleted 2147483643 _spec_id int Spec ID used to track the file containing a row 2147483642 _partition struct Partition to which a row belongs 2147483546 file_path string Path of a file, used in position-based delete files 2147483545 pos long Ordinal position of a row, used in position-based delete files 2147483544 row struct\u003c...\u003e Deleted row values, used in position-based delete files Partitioning Data files are stored in manifests with a tuple of partition values that are used in scans to filter out files that cannot contain records that match the scan’s filter predicate. Partition values for a data file must be the same for all records stored in the data file. (Manifests store data files from any partition, as long as the partition spec is the same for the data files.)\nTables are configured with a partition spec that defines how to produce a tuple of partition values from a record. A partition spec has a list of fields that consist of:\nA source column id from the table’s schema A partition field id that is used to identify a partition field and is unique within a partition spec. In v2 table metadata, it is unique across all partition specs. A transform that is applied to the source column to produce a partition value A partition name The source column, selected by id, must be a primitive type and cannot be contained in a map or list, but may be nested in a struct. For details on how to serialize a partition spec to JSON, see Appendix C.\nPartition specs capture the transform from table data to partition values. This is used to transform predicates to partition predicates, in addition to transforming data values. Deriving partition predicates from column predicates on the table data is used to separate the logical queries from physical storage: the partitioning can change and the correct partition filters are always derived from column predicates. This simplifies queries because users don’t have to supply both logical predicates and partition predicates. 
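As an illustrative sketch (not part of the spec), the Python snippet below shows this idea for a single day(ts) partition field: the same transform that derives a file's partition value from a record is reused to rewrite a timestamp predicate into a partition predicate. The column names ts and ts_day and the timestamps are hypothetical examples.

from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def day_transform(ts: datetime) -> int:
    """day transform: days from 1970-01-01 (see Partition Transforms below)."""
    return (ts - EPOCH).days

# Deriving the partition tuple for a record under spec: ts_day = day(ts)
record = {"id": 1, "ts": datetime(2017, 11, 16, 22, 31, 8, tzinfo=timezone.utc)}
partition_tuple = {"ts_day": day_transform(record["ts"])}

# Rewriting a data predicate ts >= X into the inclusive partition predicate ts_day >= day(X),
# so files whose partitions cannot contain matching rows are filtered out during planning.
X = datetime(2017, 11, 16, 17, 10, 34, tzinfo=timezone.utc)
partition_predicate = ("ts_day", ">=", day_transform(X))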
For more information, see Scan Planning below.\nPartition Transforms Transform name Description Source types Result type identity Source value, unmodified Any Source type bucket[N] Hash of value, mod N (see below) int, long, decimal, date, time, timestamp, timestamptz, string, uuid, fixed, binary int truncate[W] Value truncated to width W (see below) int, long, decimal, string Source type year Extract a date or timestamp year, as years from 1970 date, timestamp, timestamptz int month Extract a date or timestamp month, as months from 1970-01-01 date, timestamp, timestamptz int day Extract a date or timestamp day, as days from 1970-01-01 date, timestamp, timestamptz int hour Extract a timestamp hour, as hours from 1970-01-01 00:00:00 timestamp, timestamptz int void Always produces null Any Source type or int All transforms must return null for a null input value.\nThe void transform may be used to replace the transform in an existing partition field so that the field is effectively dropped in v1 tables. See partition evolution below.\nBucket Transform Details Bucket partition transforms use a 32-bit hash of the source value. The 32-bit hash implementation is the 32-bit Murmur3 hash, x86 variant, seeded with 0.\nTransforms are parameterized by a number of buckets [1], N. The hash mod N must produce a positive value by first discarding the sign bit of the hash value. In pseudo-code, the function is:\ndef bucket_N(x) = (murmur3_x86_32_hash(x) \u0026 Integer.MAX_VALUE) % N Notes:\nChanging the number of buckets as a table grows is possible by evolving the partition spec. For hash function details by type, see Appendix B.\nTruncate Transform Details Type Config Truncate specification Examples int W, width v - (v % W)\tremainders must be positive\t[1] W=10: 1 → 0, -1 → -10 long W, width v - (v % W)\tremainders must be positive\t[1] W=10: 1 → 0, -1 → -10 decimal W, width (no scale) scaled_W = decimal(W, scale(v)) v - (v % scaled_W)\t[1, 2] W=50, s=2: 10.65 → 10.50 string L, length Substring of length L: v.substring(0, L) [3] L=3: iceberg → ice Notes:\nThe remainder, v % W, must be positive. For languages where % can produce negative values, the correct truncate function is: v - (((v % W) + W) % W) The width, W, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters. Strings are truncated to a valid UTF-8 string with no more than L code points. Partition Evolution Table partitioning can be evolved by adding, removing, renaming, or reordering partition spec fields.\nChanging a partition spec produces a new spec identified by a unique spec ID that is added to the table’s list of partition specs and may be set as the table’s default spec.\nWhen evolving a spec, changes should not cause partition field IDs to change because the partition field IDs are used as the partition tuple field IDs in manifest files.\nIn v2, partition field IDs must be explicitly tracked for each partition field. New IDs are assigned based on the last assigned partition ID in table metadata.\nIn v1, partition field IDs were not tracked, but were assigned sequentially starting at 1000 in the reference implementation. This assignment caused problems when reading metadata tables based on manifest files from multiple specs because partition fields with the same ID may contain different data types. 
For compatibility with old versions, the following rules are recommended for partition evolution in v1 tables:\nDo not reorder partition fields Do not drop partition fields; instead replace the field’s transform with the void transform Only add partition fields at the end of the previous partition spec Sorting Users can sort their data within partitions by columns to gain performance. The information on how the data is sorted can be declared per data or delete file, by a sort order.\nA sort order is defined by a sort order id and a list of sort fields. The order of the sort fields within the list defines the order in which the sort is applied to the data. Each sort field consists of:\nA source column id from the table’s schema A transform that is used to produce values to be sorted on from the source column. This is the same transform as described in partition transforms. A sort direction, that can only be either asc or desc A null order that describes the order of null values when sorted. Can only be either nulls-first or nulls-last Order id 0 is reserved for the unsorted order.\nSorting floating-point numbers should produce the following behavior: -NaN \u003c -Infinity \u003c -value \u003c -0 \u003c 0 \u003c value \u003c Infinity \u003c NaN. This aligns with the implementation of Java floating-point types comparisons.\nA data or delete file is associated with a sort order by the sort order’s id within a manifest. Therefore, the table must declare all the sort orders for lookup. A table could also be configured with a default sort order id, indicating how the new data should be sorted by default. Writers should use this default sort order to sort the data on write, but are not required to if the default order is prohibitively expensive, as it would be for streaming writes.\nManifests A manifest is an immutable Avro file that lists data files or delete files, along with each file’s partition data tuple, metrics, and tracking information. One or more manifest files are used to store a snapshot, which tracks all of the files in a table at some point in time. Manifests are tracked by a manifest list for each table snapshot.\nA manifest is a valid Iceberg data file: files must use valid Iceberg formats, schemas, and column projection.\nA manifest may store either data files or delete files, but not both because manifests that contain delete files are scanned first during job planning. Whether a manifest is a data manifest or a delete manifest is stored in manifest metadata.\nA manifest stores files for a single partition spec. When a table’s partition spec changes, old files remain in the older manifest and newer files are written to a new manifest. This is required because a manifest file’s schema is based on its partition spec (see below). 
The partition spec of each manifest is also used to transform predicates on the table’s data rows into predicates on partition values that are used during job planning to select files from a manifest.\nA manifest file must store the partition spec and other metadata as properties in the Avro file’s key-value metadata:\nv1 v2 Key Value required required schema JSON representation of the table schema at the time the manifest was written optional required schema-id ID of the schema used to write the manifest as a string required required partition-spec JSON fields representation of the partition spec used to write the manifest optional required partition-spec-id ID of the partition spec used to write the manifest as a string optional required format-version Table format version number of the manifest as a string required content Type of content files tracked by the manifest: “data” or “deletes” The schema of a manifest file is a struct called manifest_entry with the following fields:\nv1 v2 Field id, name Type Description required required 0 status int with meaning: 0: EXISTING 1: ADDED 2: DELETED Used to track additions and deletions. Deletes are informational only and not used in scans. required optional 1 snapshot_id long Snapshot id where the file was added, or deleted if status is 2. Inherited when null. optional 3 sequence_number long Data sequence number of the file. Inherited when null and status is 1 (added). optional 4 file_sequence_number long File sequence number indicating when the file was added. Inherited when null and status is 1 (added). required required 2 data_file data_file struct (see below) File path, partition tuple, metrics, … data_file is a struct with the following fields:\nv1 v2 Field id, name Type Description required 134 content int with meaning: 0: DATA, 1: POSITION DELETES, 2: EQUALITY DELETES Type of content stored by the data file: data, equality deletes, or position deletes (all v1 files are data files) required required 100 file_path string Full URI for the file with FS scheme required required 101 file_format string String file format name, avro, orc or parquet required required 102 partition struct\u003c...\u003e Partition data tuple, schema based on the partition spec output using partition field ids for the struct field ids required required 103 record_count long Number of records in this file required required 104 file_size_in_bytes long Total file size in bytes required 105 block_size_in_bytes long Deprecated. Always write a default in v1. Do not write in v2. optional 106 file_ordinal int Deprecated. Do not write. optional 107 sort_columns list\u003c112: int\u003e Deprecated. Do not write. optional optional 108 column_sizes map\u003c117: int, 118: long\u003e Map from column id to the total size on disk of all regions that store the column. Does not include bytes necessary to read other columns, like footers. 
Leave null for row-oriented formats (Avro) optional optional 109 value_counts map\u003c119: int, 120: long\u003e Map from column id to number of values in the column (including null and NaN values) optional optional 110 null_value_counts map\u003c121: int, 122: long\u003e Map from column id to number of null values in the column optional optional 137 nan_value_counts map\u003c138: int, 139: long\u003e Map from column id to number of NaN values in the column optional optional 111 distinct_counts map\u003c123: int, 124: long\u003e Map from column id to number of distinct values in the column; distinct counts must be derived using values in the file by counting or using sketches, but not using methods like merging existing distinct counts optional optional 125 lower_bounds map\u003c126: int, 127: binary\u003e Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2] optional optional 128 upper_bounds map\u003c129: int, 130: binary\u003e Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2] optional optional 131 key_metadata binary Implementation-specific key metadata for encryption optional optional 132 split_offsets list\u003c133: long\u003e Split offsets for the data file. For example, all row group offsets in a Parquet file. Must be sorted ascending optional 135 equality_ids list\u003c136: int\u003e Field ids used to determine row equality in equality delete files. Required when content=2 and should be null otherwise. Fields with ids listed in this column must be present in the delete file optional optional 140 sort_order_id int ID representing sort order for this file [3]. Notes:\nSingle-value serialization for lower and upper bounds is detailed in Appendix D. For float and double, the value -0.0 must precede +0.0, as in the IEEE 754 totalOrder predicate. NaNs are not permitted as lower or upper bounds. If sort order ID is missing or unknown, then the order is assumed to be unsorted. Only data files and equality delete files should be written with a non-null order id. Position deletes are required to be sorted by file and position, not a table order, and should set sort order id to null. Readers must ignore sort order id for position delete files. The following field ids are reserved on data_file: 141. The partition struct stores the tuple of partition values for each file. Its type is derived from the partition fields of the partition spec used to write the manifest file. In v2, the partition struct’s field ids must match the ids from the partition spec.\nThe column metrics maps are used when filtering to select both data and delete files. For delete files, the metrics must store bounds and counts for all deleted rows, or must be omitted. Storing metrics for deleted rows ensures that the values can be used during job planning to find delete files that must be merged during a scan.\nManifest Entry Fields The manifest entry fields are used to keep track of the snapshot in which files were added or logically deleted. 
The data_file struct is nested inside of the manifest entry so that it can be easily passed to job planning without the manifest entry fields.\nWhen a file is added to the dataset, its manifest entry should store the snapshot ID in which the file was added and set status to 1 (added).\nWhen a file is replaced or deleted from the dataset, its manifest entry fields store the snapshot ID in which the file was deleted and status 2 (deleted). The file may be deleted from the file system when the snapshot in which it was deleted is garbage collected, assuming that older snapshots have also been garbage collected [1].\nIceberg v2 adds data and file sequence numbers to the entry and makes the snapshot ID optional. Values for these fields are inherited from manifest metadata when null. That is, if the field is null for an entry, then the entry must inherit its value from the manifest file’s metadata, stored in the manifest list. The sequence_number field represents the data sequence number and must never change after a file is added to the dataset. The data sequence number represents a relative age of the file content and should be used for planning which delete files apply to a data file. The file_sequence_number field represents the sequence number of the snapshot that added the file and must also remain unchanged upon assigning at commit. The file sequence number can’t be used for pruning delete files as the data within the file may have an older data sequence number. The data and file sequence numbers are inherited only if the entry status is 1 (added). If the entry status is 0 (existing) or 2 (deleted), the entry must include both sequence numbers explicitly.\nNotes:\nTechnically, data files can be deleted when the last snapshot that contains the file as “live” data is garbage collected. But this is harder to detect and requires finding the diff of multiple snapshots. It is easier to track what files are deleted in a snapshot and delete them when that snapshot expires. It is not recommended to add a deleted file back to a table. Adding a deleted file can lead to edge cases where incremental deletes can break table snapshots. Manifest list files are required in v2, so that the sequence_number and snapshot_id to inherit are always available. Sequence Number Inheritance Manifests track the sequence number when a data or delete file was added to the table.\nWhen adding a new file, its data and file sequence numbers are set to null because the snapshot’s sequence number is not assigned until the snapshot is successfully committed. When reading, sequence numbers are inherited by replacing null with the manifest’s sequence number from the manifest list. It is also possible to add a new file with data that logically belongs to an older sequence number. In that case, the data sequence number must be provided explicitly and not inherited. However, the file sequence number must be always assigned when the snapshot is successfully committed.\nWhen writing an existing file to a new manifest or marking an existing file as deleted, the data and file sequence numbers must be non-null and set to the original values that were either inherited or provided at the commit time.\nInheriting sequence numbers through the metadata tree allows writing a new manifest without a known sequence number, so that a manifest can be written once and reused in commit retries. 
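A reader resolves the effective sequence numbers at read time. The Python sketch below is a simplified restatement of the inheritance rules above (the dictionary field names are illustrative, not the Avro schema):

EXISTING, ADDED, DELETED = 0, 1, 2

def resolve_sequence_numbers(entry, manifest_sequence_number):
    """Sketch of v2 inheritance: null sequence numbers are replaced when an entry is read."""
    data_seq = entry.get("sequence_number")
    file_seq = entry.get("file_sequence_number")
    if entry["status"] == ADDED:
        # Added entries may leave either field null and inherit from the manifest list.
        data_seq = manifest_sequence_number if data_seq is None else data_seq
        file_seq = manifest_sequence_number if file_seq is None else file_seq
    else:
        # EXISTING and DELETED entries must carry both sequence numbers explicitly.
        if data_seq is None or file_seq is None:
            raise ValueError("existing/deleted entries must store explicit sequence numbers")
    return data_seq, file_seq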
To change a sequence number for a retry, only the manifest list must be rewritten.\nWhen reading v1 manifests with no sequence number column, sequence numbers for all files must default to 0.\nSnapshots A snapshot consists of the following fields:\nv1 v2 Field Description required required snapshot-id A unique long ID optional optional parent-snapshot-id The snapshot ID of the snapshot’s parent. Omitted for any snapshot with no parent required sequence-number A monotonically increasing long that tracks the order of changes to a table required required timestamp-ms A timestamp when the snapshot was created, used for garbage collection and table inspection optional required manifest-list The location of a manifest list for this snapshot that tracks manifest files with additional metadata optional manifests A list of manifest file locations. Must be omitted if manifest-list is present optional required summary A string map that summarizes the snapshot changes, including operation (see below) optional optional schema-id ID of the table’s current schema when the snapshot was created The snapshot summary’s operation field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible operation values are:\nappend – Only data files were added and no files were removed. replace – Data and delete files were added and removed without changing table data; i.e., compaction, changing the data file format, or relocating data files. overwrite – Data and delete files were added and removed in a logical overwrite operation. delete – Data files were removed and their contents logically deleted and/or delete files were added to delete rows. Data and delete files for a snapshot can be stored in more than one manifest. This enables:\nAppends can add a new manifest to minimize the amount of data written, instead of adding new records by rewriting and appending to an existing manifest. (This is called a “fast append”.) Tables can use multiple partition specs. A table’s partition configuration can evolve if, for example, its data volume changes. Each manifest uses a single partition spec, and queries do not need to change because partition filters are derived from data predicates. Large tables can be split across multiple manifests so that implementations can parallelize job planning or reduce the cost of rewriting a manifest. Manifests for a snapshot are tracked by a manifest list.\nValid snapshots are stored as a list in table metadata. For serialization, see Appendix C.\nManifest Lists Snapshots are embedded in table metadata, but the list of manifests for a snapshot are stored in a separate manifest list file.\nA new manifest list is written for each attempt to commit a snapshot because the list of manifests always changes to produce a new snapshot. When a manifest list is written, the (optimistic) sequence number of the snapshot is written for all new manifest files tracked by the list.\nA manifest list includes summary metadata that can be used to avoid scanning all of the manifests in a snapshot when planning a table scan. 
This includes the number of added, existing, and deleted files, and a summary of values for each field of the partition spec used to write the manifest.\nA manifest list is a valid Iceberg data file: files must use valid Iceberg formats, schemas, and column projection.\nManifest list files store manifest_file, a struct with the following fields:\nv1 v2 Field id, name Type Description required required 500 manifest_path string Location of the manifest file required required 501 manifest_length long Length of the manifest file in bytes required required 502 partition_spec_id int ID of a partition spec used to write the manifest; must be listed in table metadata partition-specs required 517 content int with meaning: 0: data, 1: deletes The type of files tracked by the manifest, either data or delete files; 0 for all v1 manifests required 515 sequence_number long The sequence number when the manifest was added to the table; use 0 when reading v1 manifest lists required 516 min_sequence_number long The minimum data sequence number of all live data or delete files in the manifest; use 0 when reading v1 manifest lists required required 503 added_snapshot_id long ID of the snapshot where the manifest file was added optional required 504 added_files_count int Number of entries in the manifest that have status ADDED (1), when null this is assumed to be non-zero optional required 505 existing_files_count int Number of entries in the manifest that have status EXISTING (0), when null this is assumed to be non-zero optional required 506 deleted_files_count int Number of entries in the manifest that have status DELETED (2), when null this is assumed to be non-zero optional required 512 added_rows_count long Number of rows in all of files in the manifest that have status ADDED, when null this is assumed to be non-zero optional required 513 existing_rows_count long Number of rows in all of files in the manifest that have status EXISTING, when null this is assumed to be non-zero optional required 514 deleted_rows_count long Number of rows in all of files in the manifest that have status DELETED, when null this is assumed to be non-zero optional optional 507 partitions list\u003c508: field_summary\u003e (see below) A list of field summaries for each partition field in the spec. Each field in the list corresponds to a field in the manifest file’s partition spec. optional optional 519 key_metadata binary Implementation-specific key metadata for encryption field_summary is a struct with the following fields:\nv1 v2 Field id, name Type Description required required 509 contains_null boolean Whether the manifest contains at least one partition with a null value for the field optional optional 518 contains_nan boolean Whether the manifest contains at least one partition with a NaN value for the field optional optional 510 lower_bound bytes [1] Lower bound for the non-null, non-NaN values in the partition field, or null if all values are null or NaN [2] optional optional 511 upper_bound bytes [1] Upper bound for the non-null, non-NaN values in the partition field, or null if all values are null or NaN [2] Notes:\nLower and upper bounds are serialized to bytes using the single-object serialization in Appendix D. The type of used to encode the value is the type of the partition field data. If -0.0 is a value of the partition field, the lower_bound must not be +0.0, and if +0.0 is a value of the partition field, the upper_bound must not be -0.0. 
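The partition field summaries exist so that planning can skip whole manifests. The following is a rough Python sketch, illustrative only, of pruning a manifest for an equality predicate on one partition field; a real implementation would first decode the bound bytes using the single-value serialization in Appendix D.

def may_contain_match(field_summary, value):
    """Conservative pruning for an equality predicate on one partition field:
    return False only when the summary proves no file in the manifest can match."""
    lower = field_summary.get("lower_bound")
    upper = field_summary.get("upper_bound")
    if lower is None or upper is None:
        return True  # no usable bounds; keep the manifest (conservative)
    # lower/upper describe the non-null, non-NaN partition values tracked by the manifest
    return lower <= value <= upper

# Manifests whose summaries cannot match the predicate are skipped during job planning, e.g.:
# candidates = [m for m in manifest_list if may_contain_match(m["partitions"][field_index], value)]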
Scan Planning Scans are planned by reading the manifest files for the current snapshot. Deleted entries in data and delete manifests (those marked with status “DELETED”) are not used in a scan.\nManifests that contain no matching files, determined using either file counts or partition summaries, may be skipped.\nFor each manifest, scan predicates, which filter data rows, are converted to partition predicates, which filter data and delete files. These partition predicates are used to select the data and delete files in the manifest. This conversion uses the partition spec used to write the manifest file.\nScan predicates are converted to partition predicates using an inclusive projection: if a scan predicate matches a row, then the partition predicate must match that row’s partition. This is called inclusive [1] because rows that do not match the scan predicate may be included in the scan by the partition predicate.\nFor example, an events table with a timestamp column named ts that is partitioned by ts_day=day(ts) is queried by users with ranges over the timestamp column: ts \u003e X. The inclusive projection is ts_day \u003e= day(X), which is used to select files that may have matching rows. Note that, in most cases, timestamps just before X will be included in the scan because the file contains rows that match the predicate and rows that do not match the predicate.\nScan predicates are also used to filter data and delete files using column bounds and counts that are stored by field id in manifests. The same filter logic can be used for both data and delete files because both store metrics of the rows either inserted or deleted. If metrics show that a delete file has no rows that match a scan predicate, it may be ignored just as a data file would be ignored [2].\nData files that match the query filter must be read by the scan.\nNote that for any snapshot, all file paths marked with “ADDED” or “EXISTING” may appear at most once across all manifest files in the snapshot. If a file path appears more than once, the results of the scan are undefined. Reader implementations may raise an error in this case, but are not required to do so.\nDelete files that match the query filter must be applied to data files at read time, limited by the scope of the delete file using the following rules.\nA position delete file must be applied to a data file when all of the following are true: The data file’s data sequence number is less than or equal to the delete file’s data sequence number The data file’s partition (both spec and partition values) is equal to the delete file’s partition An equality delete file must be applied to a data file when all of the following are true: The data file’s data sequence number is strictly less than the delete’s data sequence number The data file’s partition (both spec and partition values) is equal to the delete file’s partition or the delete file’s partition spec is unpartitioned In general, deletes are applied only to data files that are older and in the same partition, except for two special cases:\nEquality delete files stored with an unpartitioned spec are applied as global deletes. Otherwise, delete files do not apply to files in other partitions. Position delete files must be applied to data files from the same commit, when the data and delete file data sequence numbers are equal. This allows deleting rows that were added in the same commit. 
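Restated as code, the applicability rules above look roughly like the Python sketch below. It is illustrative, not normative; partition equality here stands for "same partition spec and same partition values", and the dictionary field names are hypothetical.

def position_delete_applies(data_file, delete_file):
    # Position deletes apply to data files that are no newer than the delete,
    # including files added in the same commit (equal data sequence numbers).
    return (data_file["data_sequence_number"] <= delete_file["data_sequence_number"]
            and data_file["partition"] == delete_file["partition"])

def equality_delete_applies(data_file, delete_file, delete_spec_is_unpartitioned):
    # Equality deletes apply only to strictly older data files; a delete file written
    # with an unpartitioned spec acts as a global delete across partitions.
    older = data_file["data_sequence_number"] < delete_file["data_sequence_number"]
    same_partition = data_file["partition"] == delete_file["partition"]
    return older and (same_partition or delete_spec_is_unpartitioned)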
Notes:\nAn alternative, strict projection, creates a partition predicate that will match a file if all of the rows in the file must match the scan predicate. These projections are used to calculate the residual predicates for each file in a scan. For example, if file_a has rows with id between 1 and 10 and a delete file contains rows with id between 1 and 4, a scan for id = 9 may ignore the delete file because none of the deletes can match a row that will be selected. Snapshot Reference Iceberg tables keep track of branches and tags using snapshot references. Tags are labels for individual snapshots. Branches are mutable named references that can be updated by committing a new snapshot as the branch’s referenced snapshot using the Commit Conflict Resolution and Retry procedures.\nThe snapshot reference object records all the information of a reference including snapshot ID, reference type and Snapshot Retention Policy.\nv1 v2 Field name Type Description required required snapshot-id long A reference’s snapshot ID. The tagged snapshot or latest snapshot of a branch. required required type string Type of the reference, tag or branch optional optional min-snapshots-to-keep int For branch type only, a positive number for the minimum number of snapshots to keep in a branch while expiring snapshots. Defaults to table property history.expire.min-snapshots-to-keep. optional optional max-snapshot-age-ms long For branch type only, a positive number for the max age of snapshots to keep when expiring, including the latest snapshot. Defaults to table property history.expire.max-snapshot-age-ms. optional optional max-ref-age-ms long For snapshot references except the main branch, a positive number for the max age of the snapshot reference to keep while expiring snapshots. Defaults to table property history.expire.max-ref-age-ms. The main branch never expires. Valid snapshot references are stored as the values of the refs map in table metadata. For serialization, see Appendix C.\nSnapshot Retention Policy Table snapshots expire and are removed from metadata to allow removed or replaced data files to be physically deleted. The snapshot expiration procedure removes snapshots from table metadata and applies the table’s retention policy. Retention policy can be configured both globally and on snapshot reference through properties min-snapshots-to-keep, max-snapshot-age-ms and max-ref-age-ms.\nWhen expiring snapshots, retention policies in table and snapshot references are evaluated in the following way:\nStart with an empty set of snapshots to retain Remove any refs (other than main) where the referenced snapshot is older than max-ref-age-ms For each branch and tag, add the referenced snapshot to the retained set For each branch, add its ancestors to the retained set until: The snapshot is older than max-snapshot-age-ms, AND The snapshot is not one of the first min-snapshots-to-keep in the branch (including the branch’s referenced snapshot) Expire any snapshot not in the set of snapshots to retain. Table Metadata Table metadata is stored as JSON. Each table metadata change creates a new table metadata file that is committed by an atomic operation. This operation is used to ensure that a new version of table metadata replaces the version on which it was based. This produces a linear history of table versions and ensures that concurrent writes are not lost.\nThe atomic operation used to commit metadata depends on how tables are tracked and is not standardized by this spec. 
See the sections below for examples.\nTable Metadata Fields Table metadata consists of the following fields:\nv1 v2 Field Description required required format-version An integer version number for the format. Currently, this can be 1 or 2 based on the spec. Implementations must throw an exception if a table’s version is higher than the supported version. optional required table-uuid A UUID that identifies the table, generated when the table is created. Implementations must throw an exception if a table’s UUID does not match the expected UUID after refreshing metadata. required required location The table’s base location. This is used by writers to determine where to store data files, manifest files, and table metadata files. required last-sequence-number The table’s highest assigned sequence number, a monotonically increasing long that tracks the order of snapshots in a table. required required last-updated-ms Timestamp in milliseconds from the unix epoch when the table was last updated. Each table metadata file should update this field just before writing. required required last-column-id An integer; the highest assigned column ID for the table. This is used to ensure columns are always assigned an unused ID when evolving schemas. required schema The table’s current schema. (Deprecated: use schemas and current-schema-id instead) optional required schemas A list of schemas, stored as objects with schema-id. optional required current-schema-id ID of the table’s current schema. required partition-spec The table’s current partition spec, stored as only fields. Note that this is used by writers to partition data, but is not used when reading because reads use the specs stored in manifest files. (Deprecated: use partition-specs and default-spec-id instead) optional required partition-specs A list of partition specs, stored as full partition spec objects. optional required default-spec-id ID of the “current” spec that writers should use by default. optional required last-partition-id An integer; the highest assigned partition field ID across all partition specs for the table. This is used to ensure partition fields are always assigned an unused ID when evolving specs. optional optional properties A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, commit.retry.num-retries is used to control the number of commit retries. optional optional current-snapshot-id long ID of the current table snapshot; must be the same as the current ID of the main branch in refs. optional optional snapshots A list of valid snapshots. Valid snapshots are snapshots for which all data files exist in the file system. A data file must not be deleted from the file system until the last snapshot in which it was listed is garbage collected. optional optional snapshot-log A list (optional) of timestamp and snapshot ID pairs that encodes changes to the current snapshot for the table. Each time the current-snapshot-id is changed, a new entry should be added with the last-updated-ms and the new current-snapshot-id. When snapshots are expired from the list of valid snapshots, all entries before a snapshot that has expired should be removed. optional optional metadata-log A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. 
Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit. optional required sort-orders A list of sort orders, stored as full sort order objects. optional required default-sort-order-id Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. optional refs A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a main branch reference pointing to the current-snapshot-id even if the refs map is null. optional optional statistics A list (optional) of table statistics. For serialization details, see Appendix C.\nTable statistics Table statistics files are valid Puffin files. Statistics are informational. A reader can choose to ignore statistics information. Statistics support is not required to read the table correctly. A table can contain many statistics files associated with different table snapshots.\nStatistics files metadata within statistics table metadata field is a struct with the following fields:\nv1 v2 Field name Type Description required required snapshot-id string ID of the Iceberg table’s snapshot the statistics file is associated with. required required statistics-path string Path of the statistics file. See Puffin file format. required required file-size-in-bytes long Size of the statistics file. required required file-footer-size-in-bytes long Total size of the statistics file’s footer (not the footer payload size). See Puffin file format for footer definition. optional optional key-metadata Base64-encoded implementation-specific key metadata for encryption. required required blob-metadata list\u003cblob metadata\u003e (see below) A list of the blob metadata for statistics contained in the file with structure described below. Blob metadata is a struct with the following fields:\nv1 v2 Field name Type Description required required type string Type of the blob. Matches Blob type in the Puffin file. required required snapshot-id long ID of the Iceberg table’s snapshot the blob was computed from. required required sequence-number long Sequence number of the Iceberg table’s snapshot the blob was computed from. required required fields list\u003cinteger\u003e Ordered list of fields, given by field ID, on which the statistic was calculated. optional optional properties map\u003cstring, string\u003e Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. Commit Conflict Resolution and Retry When two commits happen at the same time and are based on the same version, only one commit will succeed. In most cases, the failed commit can be applied to the new current version of table metadata and retried. Updates verify the conditions under which they can be applied to a new version and retry if those conditions are met.\nAppend operations have no requirements and can always be applied. Replace operations must verify that the files that will be deleted are still in the table. Examples of replace operations include format changes (replace an Avro file with a Parquet file) and compactions (several files are replaced with a single file that contains the same rows). Delete operations must verify that specific files to delete are still in the table. 
Delete operations based on expressions can always be applied (e.g., where timestamp \u003c X). Table schema updates and partition spec changes must validate that the schema has not changed between the base version and the current version. File System Tables An atomic swap can be implemented using atomic rename in file systems that support it, like HDFS or most local file systems [1].\nEach version of table metadata is stored in a metadata folder under the table’s base location using a file naming scheme that includes a version number, V: v\u003cV\u003e.metadata.json. To commit a new metadata version, V+1, the writer performs the following steps:\nRead the current table metadata version V. Create new table metadata based on version V. Write the new table metadata to a unique file: \u003crandom-uuid\u003e.metadata.json. Rename the unique file to the well-known file for the new version, V+1: v\u003cV+1\u003e.metadata.json. If the rename succeeds, the commit succeeded and V+1 is the table’s current version. If the rename fails, go back to step 1. Notes:\nThe file system table scheme is implemented in HadoopTableOperations. Metastore Tables The atomic swap needed to commit new versions of table metadata can be implemented by storing a pointer in a metastore or database that is updated with a check-and-put operation [1]. The check-and-put validates that the version of the table that a write is based on is still current and then makes the new metadata from the write the current version.\nEach version of table metadata is stored in a metadata folder under the table’s base location using a naming scheme that includes a version and UUID: \u003cV\u003e-\u003crandom-uuid\u003e.metadata.json. To commit a new metadata version, V+1, the writer performs the following steps:\nCreate a new table metadata file based on the current metadata. Write the new table metadata to a unique file: \u003cV+1\u003e-\u003crandom-uuid\u003e.metadata.json. Request that the metastore swap the table’s metadata pointer from the location of V to the location of V+1. If the swap succeeds, the commit succeeded. V was still the latest metadata version and the metadata file for V+1 is now the current metadata. If the swap fails, another writer has already created V+1. The current writer goes back to step 1. Notes:\nThe metastore table scheme is partly implemented in BaseMetastoreTableOperations. Delete Formats This section details how to encode row-level deletes in Iceberg delete files. Row-level deletes are not supported in v1.\nRow-level delete files are valid Iceberg data files: files must use valid Iceberg formats, schemas, and column projection. It is recommended that delete files are written using the table’s default file format.\nRow-level delete files are tracked by manifests, like data files. A separate set of manifests is used for delete files, but the manifest schemas are identical.\nBoth position and equality deletes allow encoding deleted row values with a delete. This can be used to reconstruct a stream of changes to a table.\nPosition Delete Files Position-based delete files identify deleted rows by file and position in one or more data files, and may optionally contain the deleted row.\nA data row is deleted if there is an entry in a position delete file for the row’s file and position in the data file, starting at 0.\nPosition-based delete files store file_position_delete, a struct with the following fields:\nField id, name Type Description 2147483546 file_path string Full URI of a data file with FS scheme. 
This must match the file_path of the target data file in a manifest entry 2147483545 pos long Ordinal position of a deleted row in the target data file identified by file_path, starting at 0 2147483544 row required struct\u003c...\u003e [1] Deleted row values. Omit the column when not storing deleted rows. When present in the delete file, row is required because all delete entries must include the row values. When the deleted row column is present, its schema may be any subset of the table schema and must use field ids matching the table.\nTo ensure the accuracy of statistics, all delete entries must include row values, or the column must be omitted (this is why the column type is required).\nThe rows in the delete file must be sorted by file_path then pos to optimize filtering rows while scanning.\nSorting by file_path allows filter pushdown by file in columnar storage formats. Sorting by pos allows filtering rows while scanning, to avoid keeping deletes in memory. Equality Delete Files Equality delete files identify deleted rows in a collection of data files by one or more column values, and may optionally contain additional columns of the deleted row.\nEquality delete files store any subset of a table’s columns and use the table’s field ids. The delete columns are the columns of the delete file used to match data rows. Delete columns are identified by id in the delete file metadata column equality_ids. Float and double columns cannot be used as delete columns in equality delete files.\nA data row is deleted if its values are equal to all delete columns for any row in an equality delete file that applies to the row’s data file (see Scan Planning).\nEach row of the delete file produces one equality predicate that matches any row where the delete columns are equal. Multiple columns can be thought of as an AND of equality predicates. A null value in a delete column matches a row if the row’s value is null, equivalent to col IS NULL.\nFor example, a table with the following data:\n1: id | 2: category | 3: name -------|-------------|--------- 1 | marsupial | Koala 2 | toy | Teddy 3 | NULL | Grizzly 4 | NULL | Polar The delete id = 3 could be written as either of the following equality delete files:\nequality_ids=[1] 1: id ------- 3 equality_ids=[1] 1: id | 2: category | 3: name -------|-------------|--------- 3 | NULL | Grizzly The delete id = 4 AND category IS NULL could be written as the following equality delete file:\nequality_ids=[1, 2] 1: id | 2: category | 3: name -------|-------------|--------- 4 | NULL | Polar If a delete column in an equality delete file is later dropped from the table, it must still be used when applying the equality deletes. If a column was added to a table and later used as a delete column in an equality delete file, the column value is read for older data files using normal projection rules (defaults to null).\nDelete File Stats Manifests hold the same statistics for delete files and data files. For delete files, the metrics describe the values that were deleted.\nAppendix A: Format-specific Requirements Avro Data Type Mappings\nValues should be stored in Avro using the Avro types and logical type annotations in the table below.\nOptional fields, array elements, and map values must be wrapped in an Avro union with null. This is the only union type allowed in Iceberg data files.\nOptional fields must always set the Avro field default value to null.\nMaps with non-string keys must use an array representation with the map logical type. 
The array representation or Avro’s map type may be used for maps with string keys.\nType Avro type Notes boolean boolean int int long long float float double double decimal(P,S) { \"type\": \"fixed\",\n\"size\": minBytesRequired(P),\n\"logicalType\": \"decimal\",\n\"precision\": P,\n\"scale\": S } Stored as fixed using the minimum number of bytes for the given precision. date { \"type\": \"int\",\n\"logicalType\": \"date\" } Stores days from the 1970-01-01. time { \"type\": \"long\",\n\"logicalType\": \"time-micros\" } Stores microseconds from midnight. timestamp { \"type\": \"long\",\n\"logicalType\": \"timestamp-micros\",\n\"adjust-to-utc\": false } Stores microseconds from 1970-01-01 00:00:00.000000. timestamptz { \"type\": \"long\",\n\"logicalType\": \"timestamp-micros\",\n\"adjust-to-utc\": true } Stores microseconds from 1970-01-01 00:00:00.000000 UTC. string string uuid { \"type\": \"fixed\",\n\"size\": 16,\n\"logicalType\": \"uuid\" } fixed(L) { \"type\": \"fixed\",\n\"size\": L } binary bytes struct record list array map array of key-value records, or map when keys are strings (optional). Array storage must use logical type name map and must store elements that are 2-field records. The first field is a non-null key and the second field is the value. Field IDs\nIceberg struct, list, and map types identify nested types by ID. When writing data to Avro files, these IDs must be stored in the Avro schema to support ID-based column pruning.\nIDs are stored as JSON integers in the following locations:\nID Avro schema location Property Example Struct field Record field object field-id { \"type\": \"record\", ...\n\"fields\": [\n{ \"name\": \"l\",\n\"type\": [\"null\", \"long\"],\n\"default\": null,\n\"field-id\": 8 }\n] } List element Array schema object element-id { \"type\": \"array\",\n\"items\": \"int\",\n\"element-id\": 9 } String map key Map schema object key-id { \"type\": \"map\",\n\"values\": \"int\",\n\"key-id\": 10,\n\"value-id\": 11 } String map value Map schema object value-id Map key, value Key, value fields in the element record. field-id { \"type\": \"array\",\n\"logicalType\": \"map\",\n\"items\": {\n\"type\": \"record\",\n\"name\": \"k12_v13\",\n\"fields\": [\n{ \"name\": \"key\",\n\"type\": \"int\",\n\"field-id\": 12 },\n{ \"name\": \"value\",\n\"type\": \"string\",\n\"field-id\": 13 }\n] } } Note that the string map case is for maps where the key type is a string. Using Avro’s map type in this case is optional. Maps with string keys may be stored as arrays.\nParquet Data Type Mappings\nValues should be stored in Parquet using the types and logical type annotations in the table below. Column IDs are required.\nLists must use the 3-level representation.\nType Parquet physical type Logical type Notes boolean boolean int int long long float float double double decimal(P,S) P \u003c= 9: int32,\nP \u003c= 18: int64,\nfixed otherwise DECIMAL(P,S) Fixed must use the minimum number of bytes that can store P. date int32 DATE Stores days from the 1970-01-01. time int64 TIME_MICROS with adjustToUtc=false Stores microseconds from midnight. timestamp int64 TIMESTAMP_MICROS with adjustToUtc=false Stores microseconds from 1970-01-01 00:00:00.000000. timestamptz int64 TIMESTAMP_MICROS with adjustToUtc=true Stores microseconds from 1970-01-01 00:00:00.000000 UTC. string binary UTF8 Encoding must be UTF-8. uuid fixed_len_byte_array[16] UUID fixed(L) fixed_len_byte_array[L] binary binary struct group list 3-level list LIST See Parquet docs for 3-level representation. 
map 3-level map MAP See Parquet docs for 3-level representation. ORC Data Type Mappings\nType ORC type ORC type attributes Notes boolean boolean int int ORC tinyint and smallint would also map to int. long long float float double double decimal(P,S) decimal date date time long iceberg.long-type=TIME Stores microseconds from midnight. timestamp timestamp [1] timestamptz timestamp_instant [1] string string ORC varchar and char would also map to string. uuid binary iceberg.binary-type=UUID fixed(L) binary iceberg.binary-type=FIXED \u0026 iceberg.length=L The length would not be checked by the ORC reader and should be checked by the adapter. binary binary struct struct list array map map Notes:\nORC’s TimestampColumnVector consists of a time field (milliseconds since epoch) and a nanos field (nanoseconds within the second). Hence the milliseconds within the second are reported twice; once in the time field and again in the nanos field. The read adapter should only use milliseconds within the second from one of these fields. The write adapter should also report milliseconds within the second twice; once in the time field and again in the nanos field. ORC writer is expected to correctly consider millis information from one of the fields. More details at https://issues.apache.org/jira/browse/ORC-546 One of the interesting challenges with this is how to map Iceberg’s schema evolution (id based) on to ORC’s (name based). In theory, we could use Iceberg’s column ids as the column and field names, but that would be inconvenient.\nThe column IDs must be stored in ORC type attributes using the key iceberg.id, and iceberg.required to store \"true\" if the Iceberg column is required, otherwise it will be optional.\nIceberg would build the desired reader schema with their schema evolution rules and pass that down to the ORC reader, which would then use its schema evolution to map that to the writer’s schema. Basically, Iceberg would need to change the names of columns and fields to get the desired mapping.\nIceberg writer ORC writer Iceberg reader ORC reader struct\u003ca (1): int, b (2): string\u003e struct\u003ca: int, b: string\u003e struct\u003ca (2): string, c (3): date\u003e struct\u003cb: string, c: date\u003e struct\u003ca (1): struct\u003cb (2): string, c (3): date\u003e\u003e struct\u003ca: struct\u003cb:string, c:date\u003e\u003e struct\u003caa (1): struct\u003ccc (3): date, bb (2): string\u003e\u003e struct\u003ca: struct\u003cc:date, b:string\u003e\u003e Appendix B: 32-bit Hash Requirements The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with 0.\nPrimitive type Hash specification Test value int hashLong(long(v))\t[1] 34 → 2017239379 long hashBytes(littleEndianBytes(v)) 34L → 2017239379 decimal(P,S) hashBytes(minBigEndian(unscaled(v)))[2] 14.20 → -500754589 date hashInt(daysFromUnixEpoch(v)) 2017-11-16 → -653330422 time hashLong(microsecsFromMidnight(v)) 22:31:08 → -662762989 timestamp hashLong(microsecsFromUnixEpoch(v)) 2017-11-16T22:31:08 → -2047944441 timestamptz hashLong(microsecsFromUnixEpoch(v)) 2017-11-16T14:31:08-08:00→ -2047944441 string hashBytes(utf8Bytes(v)) iceberg → 1210000089 uuid hashBytes(uuidBytes(v))\t[3] f79c3e09-677c-4bbd-a479-3f349cb785e7 → 1488055340 fixed(L) hashBytes(v) 00 01 02 03 → -188683207 binary hashBytes(v) 00 01 02 03 → -188683207 The types below are not currently valid for bucketing, and so are not hashed. 
However, if that changes and a hash value is needed, the following table shall apply:\nPrimitive type Hash specification Test value boolean false: hashInt(0), true: hashInt(1) true → 1392991556 float hashLong(doubleToLongBits(double(v)) [4] 1.0F → -142385009, 0.0F → 1669671676, -0.0F → 1669671676 double hashLong(doubleToLongBits(v)) [4] 1.0D → -142385009, 0.0D → 1669671676, -0.0D → 1669671676 Notes:\nInteger and long hash results must be identical for all integer values. This ensures that schema evolution does not change bucket partition values if integer types are promoted. Decimal values are hashed using the minimum number of bytes required to hold the unscaled value as a two’s complement big-endian; this representation does not include padding bytes required for storage in a fixed-length array. Hash results are not dependent on decimal scale, which is part of the type, not the data value. UUIDs are encoded using big endian. The test UUID for the example above is: f79c3e09-677c-4bbd-a479-3f349cb785e7. This UUID encoded as a byte array is: F7 9C 3E 09 67 7C 4B BD A4 79 3F 34 9C B7 85 E7 doubleToLongBits must give the IEEE 754 compliant bit representation of the double value. All NaN bit patterns must be canonicalized to 0x7ff8000000000000L. Negative zero (-0.0) must be canonicalized to positive zero (0.0). Float hash values are the result of hashing the float cast to double to ensure that schema evolution does not change hash values if float types are promoted. Appendix C: JSON serialization Schemas Schemas are serialized as a JSON object with the same fields as a struct in the table below, and the following additional fields:\nv1 v2 Field JSON representation Example optional required schema-id JSON int 0 optional optional identifier-field-ids JSON list of ints [1, 2] Types are serialized according to this table:\nType JSON representation Example boolean JSON string: \"boolean\" \"boolean\" int JSON string: \"int\" \"int\" long JSON string: \"long\" \"long\" float JSON string: \"float\" \"float\" double JSON string: \"double\" \"double\" date JSON string: \"date\" \"date\" time JSON string: \"time\" \"time\" timestamp without zone JSON string: \"timestamp\" \"timestamp\" timestamp with zone JSON string: \"timestamptz\" \"timestamptz\" string JSON string: \"string\" \"string\" uuid JSON string: \"uuid\" \"uuid\" fixed(L) JSON string: \"fixed[\u003cL\u003e]\" \"fixed[16]\" binary JSON string: \"binary\" \"binary\" decimal(P, S) JSON string: \"decimal(\u003cP\u003e,\u003cS\u003e)\" \"decimal(9,2)\",\n\"decimal(9, 2)\" struct JSON object: {\n\"type\": \"struct\",\n\"fields\": [ {\n\"id\": \u003cfield id int\u003e,\n\"name\": \u003cname string\u003e,\n\"required\": \u003cboolean\u003e,\n\"type\": \u003ctype JSON\u003e,\n\"doc\": \u003ccomment string\u003e,\n\"initial-default\": \u003cJSON encoding of default value\u003e,\n\"write-default\": \u003cJSON encoding of default value\u003e\n}, ...\n] } {\n\"type\": \"struct\",\n\"fields\": [ {\n\"id\": 1,\n\"name\": \"id\",\n\"required\": true,\n\"type\": \"uuid\",\n\"initial-default\": \"0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb\",\n\"write-default\": \"ec5911be-b0a7-458c-8438-c9a3e53cffae\"\n}, {\n\"id\": 2,\n\"name\": \"data\",\n\"required\": false,\n\"type\": {\n\"type\": \"list\",\n...\n}\n} ]\n} list JSON object: {\n\"type\": \"list\",\n\"element-id\": \u003cid int\u003e,\n\"element-required\": \u003cbool\u003e\n\"element\": \u003ctype JSON\u003e\n} {\n\"type\": \"list\",\n\"element-id\": 3,\n\"element-required\": true,\n\"element\": \"string\"\n} 
map JSON object: {\n\"type\": \"map\",\n\"key-id\": \u003ckey id int\u003e,\n\"key\": \u003ctype JSON\u003e,\n\"value-id\": \u003cval id int\u003e,\n\"value-required\": \u003cbool\u003e\n\"value\": \u003ctype JSON\u003e\n} {\n\"type\": \"map\",\n\"key-id\": 4,\n\"key\": \"string\",\n\"value-id\": 5,\n\"value-required\": false,\n\"value\": \"double\"\n} Note that default values are serialized using the JSON single-value serialization in Appendix D.\nPartition Specs Partition specs are serialized as a JSON object with the following fields:\nField JSON representation Example spec-id JSON int 0 fields JSON list: [\n\u003cpartition field JSON\u003e,\n...\n] [ {\n\"source-id\": 4,\n\"field-id\": 1000,\n\"name\": \"ts_day\",\n\"transform\": \"day\"\n}, {\n\"source-id\": 1,\n\"field-id\": 1001,\n\"name\": \"id_bucket\",\n\"transform\": \"bucket[16]\"\n} ] Each partition field in the fields list is stored as an object. See the table for more detail:\nTransform or Field JSON representation Example identity JSON string: \"identity\" \"identity\" bucket[N] JSON string: \"bucket[\u003cN\u003e]\" \"bucket[16]\" truncate[W] JSON string: \"truncate[\u003cW\u003e]\" \"truncate[20]\" year JSON string: \"year\" \"year\" month JSON string: \"month\" \"month\" day JSON string: \"day\" \"day\" hour JSON string: \"hour\" \"hour\" Partition Field JSON object: {\n\"source-id\": \u003cid int\u003e,\n\"field-id\": \u003cfield id int\u003e,\n\"name\": \u003cname string\u003e,\n\"transform\": \u003ctransform JSON\u003e\n} {\n\"source-id\": 1,\n\"field-id\": 1000,\n\"name\": \"id_bucket\",\n\"transform\": \"bucket[16]\"\n} In some cases partition specs are stored using only the field list instead of the object format that includes the spec ID, like the deprecated partition-spec field in table metadata. The object format should be used unless otherwise noted in this spec.\nThe field-id property was added for each partition field in v2. In v1, the reference implementation assigned field ids sequentially in each spec starting at 1,000. See Partition Evolution for more details.\nSort Orders Sort orders are serialized as a list of JSON objects, each of which contains the following fields:\nField JSON representation Example order-id JSON int 1 fields JSON list: [\n\u003csort field JSON\u003e,\n...\n] [ {\n\"transform\": \"identity\",\n\"source-id\": 2,\n\"direction\": \"asc\",\n\"null-order\": \"nulls-first\"\n}, {\n\"transform\": \"bucket[4]\",\n\"source-id\": 3,\n\"direction\": \"desc\",\n\"null-order\": \"nulls-last\"\n} ] Each sort field in the fields list is stored as an object with the following properties:\nField JSON representation Example Sort Field JSON object: {\n\"transform\": \u003ctransform JSON\u003e,\n\"source-id\": \u003csource id int\u003e,\n\"direction\": \u003cdirection string\u003e,\n\"null-order\": \u003cnull-order string\u003e\n} {\n\"transform\": \"bucket[4]\",\n\"source-id\": 3,\n\"direction\": \"desc\",\n\"null-order\": \"nulls-last\"\n} The following table describes the possible values for some of the fields within a sort field:\nField JSON representation Possible values direction JSON string \"asc\", \"desc\" null-order JSON string \"nulls-first\", \"nulls-last\" Table Metadata and Snapshots Table metadata is serialized as a JSON object according to the following table. Snapshots are not serialized separately. 
Instead, they are stored in the table metadata JSON.\nMetadata field JSON representation Example format-version JSON int 1 table-uuid JSON string \"fb072c92-a02b-11e9-ae9c-1bb7bc9eca94\" location JSON string \"s3://b/wh/data.db/table\" last-updated-ms JSON long 1515100955770 last-column-id JSON int 22 schema JSON schema (object) See above, read schemas instead schemas JSON schemas (list of objects) See above current-schema-id JSON int 0 partition-spec JSON partition fields (list) See above, read partition-specs instead partition-specs JSON partition specs (list of objects) See above default-spec-id JSON int 0 last-partition-id JSON int 1000 properties JSON object: {\n\"\u003ckey\u003e\": \"\u003cval\u003e\",\n...\n} {\n\"write.format.default\": \"avro\",\n\"commit.retry.num-retries\": \"4\"\n} current-snapshot-id JSON long 3051729675574597004 snapshots JSON list of objects: [ {\n\"snapshot-id\": \u003cid\u003e,\n\"timestamp-ms\": \u003ctimestamp-in-ms\u003e,\n\"summary\": {\n\"operation\": \u003coperation\u003e,\n... },\n\"manifest-list\": \"\u003clocation\u003e\",\n\"schema-id\": \"\u003cid\u003e\"\n},\n...\n] [ {\n\"snapshot-id\": 3051729675574597004,\n\"timestamp-ms\": 1515100955770,\n\"summary\": {\n\"operation\": \"append\"\n},\n\"manifest-list\": \"s3://b/wh/.../s1.avro\"\n\"schema-id\": 0\n} ] snapshot-log JSON list of objects: [\n{\n\"snapshot-id\": ,\n\"timestamp-ms\": },\n...\n] [ {\n\"snapshot-id\": 30517296...,\n\"timestamp-ms\": 1515100...\n} ] metadata-log JSON list of objects: [\n{\n\"metadata-file\": ,\n\"timestamp-ms\": },\n...\n] [ {\n\"metadata-file\": \"s3://bucket/.../v1.json\",\n\"timestamp-ms\": 1515100...\n} ] sort-orders JSON sort orders (list of sort field object) See above default-sort-order-id JSON int 0 refs JSON map with string key and object value:\n{\n\"\u003cname\u003e\": {\n\"snapshot-id\": \u003cid\u003e,\n\"type\": \u003ctype\u003e,\n\"max-ref-age-ms\": \u003clong\u003e,\n...\n}\n...\n} {\n\"test\": {\n\"snapshot-id\": 123456789000,\n\"type\": \"tag\",\n\"max-ref-age-ms\": 10000000\n}\n} Name Mapping Serialization Name mapping is serialized as a list of field mapping JSON Objects which are serialized as follows\nField mapping field JSON representation Example names JSON list of strings [\"latitude\", \"lat\"] field_id JSON int 1 fields JSON field mappings (list of objects) [{ \"field-id\": 4,\n\"names\": [\"latitude\", \"lat\"]\n}, {\n\"field-id\": 5,\n\"names\": [\"longitude\", \"long\"]\n}] Example\n[ { \"field-id\": 1, \"names\": [\"id\", \"record_id\"] }, { \"field-id\": 2, \"names\": [\"data\"] }, { \"field-id\": 3, \"names\": [\"location\"], \"fields\": [ { \"field-id\": 4, \"names\": [\"latitude\", \"lat\"] }, { \"field-id\": 5, \"names\": [\"longitude\", \"long\"] } ] } ] Content File (Data and Delete) Serialization Content file (data or delete) is serialized as a JSON object according to the following table.\nMetadata field JSON representation Example spec-id JSON int 1 content JSON string DATA, POSITION_DELETES, EQUALITY_DELETES file-path JSON string \"s3://b/wh/data.db/table\" file-format JSON string AVRO, ORC, PARQUET partition JSON object: Partition data tuple using partition field ids for the struct field ids {\"1000\":1} record-count JSON long 1 file-size-in-bytes JSON long 1024 column-sizes JSON object: Map from column id to the total size on disk of all regions that store the column. 
{\"keys\":[3,4],\"values\":[100,200]} value-counts JSON object: Map from column id to number of values in the column (including null and NaN values) {\"keys\":[3,4],\"values\":[90,180]} null-value-counts JSON object: Map from column id to number of null values in the column {\"keys\":[3,4],\"values\":[10,20]} nan-value-counts JSON object: Map from column id to number of NaN values in the column {\"keys\":[3,4],\"values\":[0,0]} lower-bounds JSON object: Map from column id to lower bound binary in the column serialized as hexadecimal string {\"keys\":[3,4],\"values\":[\"01000000\",\"02000000\"]} upper-bounds JSON object: Map from column id to upper bound binary in the column serialized as hexadecimal string {\"keys\":[3,4],\"values\":[\"05000000\",\"0A000000\"]} key-metadata JSON string: Encryption key metadata binary serialized as hexadecimal string 00000000000000000000000000000000 split-offsets JSON list of long: Split offsets for the data file [128,256] equality-ids JSON list of int: Field ids used to determine row equality in equality delete files [1] sort-order-id JSON int 1 File Scan Task Serialization File scan task is serialized as a JSON object according to the following table.\nMetadata field JSON representation Example schema JSON object See above, read schemas instead spec JSON object See above, read partition specs instead data-file JSON object See above, read content file instead delete-files JSON list of objects See above, read content file instead residual-filter JSON object: residual filter expression {\"type\":\"eq\",\"term\":\"id\",\"value\":1} Appendix D: Single-value serialization Binary single-value serialization This serialization scheme is for storing single values as individual binary values in the lower and upper bounds maps of manifest files.\nType Binary serialization boolean 0x00 for false, non-zero byte for true int Stored as 4-byte little-endian long Stored as 8-byte little-endian float Stored as 4-byte little-endian double Stored as 8-byte little-endian date Stores days from the 1970-01-01 in an 4-byte little-endian int time Stores microseconds from midnight in an 8-byte little-endian long timestamp without zone Stores microseconds from 1970-01-01 00:00:00.000000 in an 8-byte little-endian long timestamp with zone Stores microseconds from 1970-01-01 00:00:00.000000 UTC in an 8-byte little-endian long string UTF-8 bytes (without length) uuid 16-byte big-endian value, see example in Appendix B fixed(L) Binary value binary Binary value (without length) decimal(P, S) Stores unscaled value as two’s-complement big-endian binary, using the minimum number of bytes for the value struct Not supported list Not supported map Not supported JSON single-value serialization Single values are serialized as JSON by type according to the following table:\nType JSON representation Example Description boolean JSON boolean true int JSON int 34 long JSON long 34 float JSON number 1.0 double JSON number 1.0 decimal(P,S) JSON string \"14.20\", \"2E+20\" Stores the string representation of the decimal value, specifically, for values with a positive scale, the number of digits to the right of the decimal point is used to indicate scale, for values with a negative scale, the scientific notation is used and the exponent must equal the negated scale date JSON string \"2017-11-16\" Stores ISO-8601 standard date time JSON string \"22:31:08.123456\" Stores ISO-8601 standard time with microsecond precision timestamp JSON string \"2017-11-16T22:31:08.123456\" Stores ISO-8601 standard timestamp 
with microsecond precision; must not include a zone offset timestamptz JSON string \"2017-11-16T22:31:08.123456+00:00\" Stores ISO-8601 standard timestamp with microsecond precision; must include a zone offset and it must be ‘+00:00’ string JSON string \"iceberg\" uuid JSON string \"f79c3e09-677c-4bbd-a479-3f349cb785e7\" Stores the lowercase uuid string fixed(L) JSON string \"000102ff\" Stored as a hexadecimal string binary JSON string \"000102ff\" Stored as a hexadecimal string struct JSON object by field ID {\"1\": 1, \"2\": \"bar\"} Stores struct fields using the field ID as the JSON field name; field values are stored using this JSON single-value format list JSON array of values [1, 2, 3] Stores a JSON array of values that are serialized using this JSON single-value format map JSON object of key and value arrays { \"keys\": [\"a\", \"b\"], \"values\": [1, 2] } Stores arrays of keys and values; individual keys and values are serialized using this JSON single-value format Appendix E: Format version changes Version 3 Default values are added to struct fields in v3.\nThe write-default is a forward-compatible change because it is only used at write time. Old writers will fail because the field is missing. Tables with initial-default will be read correctly by older readers if initial-default is always null for optional fields. Otherwise, old readers will default optional columns with null. Old readers will fail to read required fields which are populated by initial-default because that default is not supported. Version 2 Writing v1 metadata:\nTable metadata field last-sequence-number should not be written Snapshot field sequence-number should not be written Manifest list field sequence-number should not be written Manifest list field min-sequence-number should not be written Manifest list field content must be 0 (data) or omitted Manifest entry field sequence_number should not be written Manifest entry field file_sequence_number should not be written Data file field content must be 0 (data) or omitted Reading v1 metadata for v2:\nTable metadata field last-sequence-number must default to 0 Snapshot field sequence-number must default to 0 Manifest list field sequence-number must default to 0 Manifest list field min-sequence-number must default to 0 Manifest list field content must default to 0 (data) Manifest entry field sequence_number must default to 0 Manifest entry field file_sequence_number must default to 0 Data file field content must default to 0 (data) Writing v2 metadata:\nTable metadata JSON: last-sequence-number was added and is required; default to 0 when reading v1 metadata table-uuid is now required current-schema-id is now required schemas is now required partition-specs is now required default-spec-id is now required last-partition-id is now required sort-orders is now required default-sort-order-id is now required schema is no longer required and should be omitted; use schemas and current-schema-id instead partition-spec is no longer required and should be omitted; use partition-specs and default-spec-id instead Snapshot JSON: sequence-number was added and is required; default to 0 when reading v1 metadata manifest-list is now required manifests is no longer required and should be omitted; always use manifest-list instead Manifest list manifest_file: content was added and is required; 0=data, 1=deletes; default to 0 when reading v1 manifest lists sequence_number was added and is required min_sequence_number was added and is required added_files_count is now required 
existing_files_count is now required deleted_files_count is now required added_rows_count is now required existing_rows_count is now required deleted_rows_count is now required Manifest key-value metadata: schema-id is now required partition-spec-id is now required format-version is now required content was added and is required (must be “data” or “deletes”) Manifest manifest_entry: snapshot_id is now optional to support inheritance sequence_number was added and is optional, to support inheritance file_sequence_number was added and is optional, to support inheritance Manifest data_file: content was added and is required; 0=data, 1=position deletes, 2=equality deletes; default to 0 when reading v1 manifests equality_ids was added, to be used for equality deletes only block_size_in_bytes was removed (breaks v1 reader compatibility) file_ordinal was removed sort_columns was removed Note that these requirements apply when writing data to a v2 table. Tables that are upgraded from v1 may contain metadata that does not follow these requirements. Implementations should remain backward-compatible with v1 metadata requirements.\n","description":"","title":"Spec","uri":"/spec/"},{"categories":null,"content":" Iceberg Talks Here is a list of talks and other videos related to Iceberg.\nEliminating Shuffles in DELETE, UPDATE, MERGE Date: July 27, 2023, Authors: Anton Okolnychyi, Chao Sun\nWrite Distribution Modes in Apache Iceberg Date: March 15, 2023, Author: Russell Spitzer\nTechnical Evolution of Apache Iceberg Date: March 15, 2023, Author: Anton Okolnychyi\nIceberg’s Best Secret Exploring Metadata Tables Date: January 12, 2023, Author: Szehon Ho\nData architecture in 2022 Date: May 5, 2022, Authors: Ryan Blue\nWhy You Shouldn’t Care About Iceberg | Tabular Date: March 24, 2022, Authors: Ryan Blue\nManaging Data Files in Apache Iceberg Date: March 2, 2022, Author: Russell Spitzer\nTuning Row-Level Operations in Apache Iceberg Date: March 2, 2022, Author: Anton Okolnychyi\nMulti Dimensional Clustering with Z Ordering Date: December 6, 2021, Author: Russell Spitzer\nExpert Roundtable: The Future of Metadata After Hive Metastore Date: November 15, 2021, Authors: Lior Ebel, Seshu Adunuthula, Ryan Blue \u0026 Oz Katz\nPresto and Apache Iceberg: Building out Modern Open Data Lakes Date: November 10, 2021, Authors: Daniel Weeks, Chunxu Tang\nIceberg Case Studies Date: September 29, 2021, Authors: Ryan Blue\nDeep Dive into Iceberg SQL Extensions Date: July 13, 2021, Author: Anton Okolnychyi\nBuilding efficient and reliable data lakes with Apache Iceberg Date: October 21, 2020, Authors: Anton Okolnychyi, Vishwa Lakkundi\nSpark and Iceberg at Apple’s Scale - Leveraging differential files for efficient upserts and deletes Date: October 21, 2020, Authors: Anton Okolnychyi, Vishwa Lakkundi\nApache Iceberg - A Table Format for Huge Analytic Datasets Date: October 21, 2020, Author: Ryan Blue\n","description":"","title":"Talks","uri":"/talks/"},{"categories":null,"content":" Terms Snapshot A snapshot is the state of a table at some time.\nEach snapshot lists all of the data files that make up the table’s contents at the time of the snapshot. 
Data files are stored across multiple manifest files, and the manifests for a snapshot are listed in a single manifest list file.\nManifest list A manifest list is a metadata file that lists the manifests that make up a table snapshot.\nEach manifest file in the manifest list is stored with information about its contents, like partition value ranges, used to speed up metadata operations.\nManifest file A manifest file is a metadata file that lists a subset of data files that make up a snapshot.\nEach data file in a manifest is stored with a partition tuple, column-level stats, and summary information used to prune splits during scan planning.\nPartition spec A partition spec is a description of how to partition data in a table.\nA spec consists of a list of source columns and transforms. A transform produces a partition value from a source value. For example, date(ts) produces the date associated with a timestamp column named ts.\nPartition tuple A partition tuple is a tuple or struct of partition data stored with each data file.\nAll values in a partition tuple are the same for all rows stored in a data file. Partition tuples are produced by transforming values from row data using a partition spec.\nIceberg stores partition values unmodified, unlike Hive tables that convert values to and from strings in file system paths and keys.\nSnapshot log (history table) The snapshot log is a metadata log of how the table’s current snapshot has changed over time.\nThe log is a list of timestamp and ID pairs: when the current snapshot changed and the snapshot ID the current snapshot was changed to.\nThe snapshot log is stored in table metadata as snapshot-log.\n","description":"","title":"Terms","uri":"/terms/"},{"categories":null,"content":" Trademarks Apache Iceberg, Iceberg, Apache, the Apache feather logo, and the Apache Iceberg project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.\n","description":"","title":"Trademarks","uri":"/trademarks/"},{"categories":null,"content":" Vendors Supporting Iceberg Tables This page contains some of the vendors who are shipping and supporting Apache Iceberg in their products\nCelerData CelerData provides commercial offerings for StarRocks, a distributed MPP SQL engine for enterprise analytics on Iceberg. With its fully vectorized technology, local caching, and intelligent materialized view, StarRocks delivers sub-second query latency for both batch and real-time analytics. CelerData offers both an enterprise deployment and a cloud service to help customers use StarRocks more smoothly. Learn more about how to query Iceberg with StarRocks here.\nClickHouse ClickHouse is a column-oriented database that enables its users to generate powerful analytics, using SQL queries, in real-time. ClickHouse integrates well with Iceberg and offers two options to work with it:\nVia Iceberg table function: Provides a read-only table-like interface to Apache Iceberg tables in Amazon S3. Via the Iceberg table engine: An engine that provides a read-only integration with existing Apache Iceberg tables in Amazon S3. 
Cloudera Cloudera Data Platform integrates Apache Iceberg to the following components:\nApache Hive, Apache Impala, and Apache Spark to query Apache Iceberg tables Cloudera Data Warehouse service providing access to Apache Iceberg tables through Apache Hive and Apache Impala Cloudera Data Engineering service providing access to Apache Iceberg tables through Apache Spark The CDP Shared Data Experience (SDX) provides compliance and self-service data access for Apache Iceberg tables Hive metastore, which plays a lightweight role in providing the Iceberg Catalog Data Visualization to visualize data stored in Apache Iceberg https://docs.cloudera.com/cdp-public-cloud/cloud/cdp-iceberg/topics/iceberg-in-cdp.html\nDremio With Dremio, an organization can easily build and manage a data lakehouse in which data is stored in open formats like Apache Iceberg and can be processed with Dremio’s interactive SQL query engine and non-Dremio processing engines. Dremio Cloud provides these capabilities in a fully managed offering.\nDremio Sonar is a lakehouse query engine that provides interactive performance and DML on Apache Iceberg, as well as other formats and data sources. Dremio Arctic is a lakehouse catalog and optimization service for Apache Iceberg. Arctic automatically optimizes tables in the background to ensure high-performance access for any engine. Arctic also simplifies experimentation, data engineering, and data governance by providing Git concepts like branches and tags on Apache Iceberg tables. IOMETE IOMETE is a fully-managed ready to use, batteries included Data Platform. IOMETE optimizes clustering, compaction, and access control to Apache Iceberg tables. Customer data remains on customer’s account to prevent vendor lock-in. The core of IOMETE platform is a serverless Lakehouse that leverages Apache Iceberg as its core table format. IOMETE platform also includes Serverless Spark, an SQL Editor, A Data Catalog, and granular data access control. IOMETE supports Hybrid-multi-cloud setups.\nPuppyGraph PuppyGraph is a cloud-native graph analytics engine that enables users to query one or more relational data stores as a unified graph model. This eliminates the overhead of deploying and maintaining a siloed graph database system, with no ETL required. PuppyGraph’s native Apache Iceberg integration adds native graph capabilities to your existing data lake in an easy and performant way.\nSnowflake Snowflake is a single, cross-cloud platform that enables every organization to mobilize their data with Snowflake’s Data Cloud. Snowflake supports Apache Iceberg by offering Snowflake-managed Iceberg Tables for full DML as well as externally managed Iceberg Tables with catalog integrations for read-only access.\nStarburst Starburst is a commercial offering for the Trino query engine. Trino is a distributed MPP SQL query engine that can query data in Iceberg at interactive speeds. Trino also enables you to join Iceberg tables with an array of other systems. Starburst offers both an enterprise deployment and a fully managed service to make managing and scaling Trino a flawless experience. Starburst also provides customer support and houses many of the original contributors to the open-source project that know Trino best. Learn more about the Starburst Iceberg connector.\nTabular Tabular is a managed warehouse and automation platform. Tabular offers a central store for analytic data that can be used with any query engine or processing framework that supports Iceberg. 
Tabular warehouses add role-based access control and automatic optimization, clustering, and compaction to Iceberg tables.\n","description":"","title":"Vendors","uri":"/vendors/"},{"categories":null,"content":" Iceberg View Spec Background and Motivation Most compute engines (e.g. Trino and Apache Spark) support views. A view is a logical table that can be referenced by future queries. Views do not contain any data. Instead, the query stored by the view is executed every time the view is referenced by another query.\nEach compute engine stores the metadata of the view in its proprietary format in the metastore of choice. Thus, views created from one engine can not be read or altered easily from another engine even when engines share the metastore as well as the storage system. This document standardizes the view metadata for ease of sharing the views across engines.\nGoals A common metadata format for view metadata, similar to how Iceberg supports a common table format for tables. Overview View metadata storage mirrors how Iceberg table metadata is stored and retrieved. View metadata is maintained in metadata files. All changes to view state create a new view metadata file and completely replace the old metadata using an atomic swap. Like Iceberg tables, this atomic swap is delegated to the metastore that tracks tables and/or views by name. The view metadata file tracks the view schema, custom properties, current and past versions, as well as other metadata.\nEach metadata file is self-sufficient. It contains the history of the last few versions of the view and can be used to roll back the view to a previous version.\nMetadata Location An atomic swap of one view metadata file for another provides the basis for making atomic changes. Readers use the version of the view that was current when they loaded the view metadata and are not affected by changes until they refresh and pick up a new metadata location.\nWriters create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer’s commit. Once a writer has created an update, it commits by swapping the view’s metadata file pointer from the base location to the new location.\nSpecification Terms Schema – Names and types of fields in a view. Version – The state of a view at some point in time. View Metadata The view version metadata file has the following fields:\nRequirement Field name Description required view-uuid A UUID that identifies the view, generated when the view is created. Implementations must throw an exception if a view’s UUID does not match the expected UUID after refreshing metadata required format-version An integer version number for the view format; must be 1 required location The view’s base location; used to create metadata file locations required schemas A list of known schemas required current-version-id ID of the current version of the view (version-id) required versions A list of known versions of the view [1] required version-log A list of version log entries with the timestamp and version-id for every change to current-version-id optional properties A string to string map of view properties [2] Notes:\nThe number of versions to retain is controlled by the table property: version.history.num-entries. Properties are used for metadata such as comment and for settings that affect view maintenance. This is not intended to be used for arbitrary metadata. 
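As a rough illustration of how these view metadata fields fit together, the following Python sketch resolves the current version from a parsed metadata file. The helper name, its shape, and the error handling are illustrative assumptions, not part of the spec.

```python
import json
from typing import Optional

def current_view_version(metadata_json: str, expected_uuid: Optional[str] = None) -> dict:
    """Return the version struct referenced by current-version-id.

    expected_uuid models the check above: implementations must fail if the
    view-uuid does not match the expected UUID after refreshing metadata.
    """
    metadata = json.loads(metadata_json)
    if expected_uuid is not None and metadata["view-uuid"] != expected_uuid:
        raise ValueError("view-uuid mismatch after refreshing metadata")
    current_id = metadata["current-version-id"]
    for version in metadata["versions"]:
        if version["version-id"] == current_id:
            return version
    raise ValueError(f"current-version-id {current_id} not found in versions")
```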
Versions Each version in versions is a struct with the following fields:\nRequirement Field name Description required version-id ID for the version required schema-id ID of the schema for the view version required timestamp-ms Timestamp when the version was created (ms from epoch) required summary A string to string map of summary metadata about the version required representations A list of representations for the view definition optional default-catalog Catalog name to use when a reference in the SELECT does not contain a catalog required default-namespace Namespace to use when a reference in the SELECT is a single identifier When default-catalog is null or not set, the catalog in which the view is stored must be used as the default catalog.\nSummary Summary is a string to string map of metadata about a view version. Common metadata keys are documented here.\nRequirement Key Value required operation Operation that caused this metadata to be created; must be create or replace optional engine-name Name of the engine that created the view version optional engine-version Version of the engine that created the view version Representations View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.\nA view version can have more than one representation. All representations for a version must express the same underlying definition. Engines are free to choose the representation to use.\nView versions are immutable. Once a version is created, it cannot be changed. This means that representations for a version cannot be changed. If a view definition changes (or new representations are to be added), a new version must be created.\nEach representation is an object with at least one common field, type, that is one of the following:\nsql: a SQL SELECT statement that defines the view Representations further define metadata for each type.\nSQL representation The SQL representation stores the view definition as a SQL SELECT, with metadata such as the SQL dialect.\nA view version can have multiple SQL representations of different dialects, but only one SQL representation per dialect.\nRequirement Field name Type Description required type string Must be sql required sql string A SQL SELECT statement required dialect string The dialect of the sql SELECT statement (e.g., “trino” or “spark”) For example:\nUSE prod.default CREATE OR REPLACE VIEW event_agg ( event_count COMMENT 'Count of events', event_date) AS SELECT COUNT(1), CAST(event_ts AS DATE) FROM events GROUP BY 2 This create statement would produce the following sql representation metadata:\nField name Value type \"sql\" sql \"SELECT\\n COUNT(1), CAST(event_ts AS DATE)\\nFROM events\\nGROUP BY 2\" dialect \"spark\" If a create statement does not include column names or comments before AS, the fields should be omitted.\nThe event_count (with the Count of events comment) and event_date field aliases must be part of the view version’s schema.\nVersion log The version log tracks changes to the view’s current version. This is the view’s history and allows reconstructing what version of the view would have been used at some point in time.\nNote that this is not the version’s creation time, which is stored in each version’s metadata. 
A version can appear multiple times in the version log, indicating that the view definition was rolled back.\nEach entry in version-log is a struct with the following fields:\nRequirement Field name Description required timestamp-ms Timestamp when the view’s current-version-id was updated (ms from epoch) required version-id ID that current-version-id was set to Appendix A: An Example The JSON metadata file format is described using an example below.\nImagine the following sequence of operations:\nUSE prod.default; CREATE OR REPLACE VIEW event_agg ( event_count COMMENT 'Count of events', event_date) COMMENT 'Daily event counts' AS SELECT COUNT(1), CAST(event_ts AS DATE) FROM events GROUP BY 2 The metadata JSON file created looks as follows.\nThe path is intentionally similar to the path for Iceberg tables and uses a metadata directory.\ns3://bucket/warehouse/default.db/event_agg/metadata/00001-(uuid).metadata.json { \"view-uuid\": \"fa6506c3-7681-40c8-86dc-e36561f83385\", \"format-version\" : 1, \"location\" : \"s3://bucket/warehouse/default.db/event_agg\", \"current-version-id\" : 1, \"properties\" : { \"comment\" : \"Daily event counts\" }, \"versions\" : [ { \"version-id\" : 1, \"timestamp-ms\" : 1573518431292, \"schema-id\" : 1, \"default-catalog\" : \"prod\", \"default-namespace\" : [ \"default\" ], \"summary\" : { \"operation\" : \"create\", \"engine-name\" : \"Spark\", \"engine-version\" : \"3.3.2\" }, \"representations\" : [ { \"type\" : \"sql\", \"sql\" : \"SELECT\\n COUNT(1), CAST(event_ts AS DATE)\\nFROM events\\nGROUP BY 2\", \"dialect\" : \"spark\" } ] } ], \"schemas\": [ { \"schema-id\": 1, \"type\" : \"struct\", \"fields\" : [ { \"id\" : 1, \"name\" : \"event_count\", \"required\" : false, \"type\" : \"int\", \"doc\" : \"Count of events\" }, { \"id\" : 2, \"name\" : \"event_date\", \"required\" : false, \"type\" : \"date\" } ] } ], \"version-log\" : [ { \"timestamp-ms\" : 1573518431292, \"version-id\" : 1 } ] } Each change creates a new metadata JSON file.\nUSE prod.other_db; CREATE OR REPLACE VIEW default.event_agg ( event_count, event_date) AS SELECT COUNT(1), CAST(event_ts AS DATE) FROM prod.default.events GROUP BY 2 Updating the view produces a new metadata file that completely replaces the old:\ns3://bucket/warehouse/default.db/event_agg/metadata/00002-(uuid).metadata.json { \"view-uuid\": \"fa6506c3-7681-40c8-86dc-e36561f83385\", \"format-version\" : 1, \"location\" : \"s3://bucket/warehouse/default.db/event_agg\", \"current-version-id\" : 2, \"properties\" : { \"comment\" : \"Daily event counts\" }, \"versions\" : [ { \"version-id\" : 1, \"timestamp-ms\" : 1573518431292, \"schema-id\" : 1, \"default-catalog\" : \"prod\", \"default-namespace\" : [ \"default\" ], \"summary\" : { \"operation\" : \"create\", \"engine-name\" : \"Spark\", \"engine-version\" : \"3.3.2\" }, \"representations\" : [ { \"type\" : \"sql\", \"sql\" : \"SELECT\\n COUNT(1), CAST(event_ts AS DATE)\\nFROM events\\nGROUP BY 2\", \"dialect\" : \"spark\" } ] }, { \"version-id\" : 2, \"timestamp-ms\" : 1573518981593, \"schema-id\" : 1, \"default-catalog\" : \"prod\", \"default-namespace\" : [ \"default\" ], \"summary\" : { \"operation\" : \"create\", \"engine-name\" : \"Spark\", \"engine-version\" : \"3.3.2\" }, \"representations\" : [ { \"type\" : \"sql\", \"sql\" : \"SELECT\\n COUNT(1), CAST(event_ts AS DATE)\\nFROM prod.default.events\\nGROUP BY 2\", \"dialect\" : \"spark\" } ] } ], \"schemas\": [ { \"schema-id\": 1, \"type\" : \"struct\", \"fields\" : [ { \"id\" : 1, \"name\" : \"event_count\",
\"required\" : false, \"type\" : \"int\", \"doc\" : \"Count of events\" }, { \"id\" : 2, \"name\" : \"event_date\", \"required\" : false, \"type\" : \"date\" } ] } ], \"version-log\" : [ { \"timestamp-ms\" : 1573518431292, \"version-id\" : 1 }, { \"timestamp-ms\" : 1573518981593, \"version-id\" : 2 } ] } ","description":"","title":"View Spec","uri":"/view-spec/"},{"categories":null,"content":" Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Learn More ","description":"","title":"What is Iceberg?","uri":"/about/about/"}]