| --- |
| title: "Java Quickstart" |
| url: java-api-quickstart |
| aliases: |
| - "java/quickstart" |
| menu: |
| main: |
| parent: "API" |
| weight: 100 |
| --- |
| <!-- |
| - Licensed to the Apache Software Foundation (ASF) under one or more |
| - contributor license agreements. See the NOTICE file distributed with |
| - this work for additional information regarding copyright ownership. |
| - The ASF licenses this file to You under the Apache License, Version 2.0 |
| - (the "License"); you may not use this file except in compliance with |
| - the License. You may obtain a copy of the License at |
| - |
| - http://www.apache.org/licenses/LICENSE-2.0 |
| - |
| - Unless required by applicable law or agreed to in writing, software |
| - distributed under the License is distributed on an "AS IS" BASIS, |
| - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| - See the License for the specific language governing permissions and |
| - limitations under the License. |
| --> |
| |
| # Java API Quickstart |
| |
| ## Create a table |
| |
| Tables are created using either a [`Catalog`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/catalog/Catalog.html) or an implementation of the [`Tables`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/Tables.html) interface. |
| |
| ### Using a Hive catalog |
| |
| The Hive catalog connects to a Hive metastore to keep track of Iceberg tables. |
You can initialize a Hive catalog with a name and some properties (see [Catalog properties](../configuration/#catalog-properties)).
| |
**Note:** Currently, `setConf` is always required for Hive catalogs, but this will change in the future.
| |
| ```java |
import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.hive.HiveCatalog;

HiveCatalog catalog = new HiveCatalog();
catalog.setConf(spark.sparkContext().hadoopConfiguration()); // Configure using Spark's Hadoop configuration

Map<String, String> properties = new HashMap<>();
| properties.put("warehouse", "..."); |
| properties.put("uri", "..."); |
| |
| catalog.initialize("hive", properties); |
| ``` |
| |
| The `Catalog` interface defines methods for working with tables, like `createTable`, `loadTable`, `renameTable`, and `dropTable`. `HiveCatalog` implements the `Catalog` interface. |
| |
| To create a table, pass an `Identifier` and a `Schema` along with other initial metadata: |
| |
| ```java |
| import org.apache.iceberg.Table; |
| import org.apache.iceberg.catalog.TableIdentifier; |
| |
| TableIdentifier name = TableIdentifier.of("logging", "logs"); |
| Table table = catalog.createTable(name, schema, spec); |
| |
| // or to load an existing table, use the following line |
| // Table table = catalog.loadTable(name); |
| ``` |
| |
| The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below. |
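
The other `Catalog` methods follow the same pattern. A minimal sketch of listing, renaming, and dropping the table (the `logs_renamed` identifier is only an illustration):

```java
import java.util.List;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

// list tables in the "logging" namespace
List<TableIdentifier> tables = catalog.listTables(Namespace.of("logging"));

// rename the table, then drop it
catalog.renameTable(TableIdentifier.of("logging", "logs"),
    TableIdentifier.of("logging", "logs_renamed"));
catalog.dropTable(TableIdentifier.of("logging", "logs_renamed"));
```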
| |
| |
| ### Using a Hadoop catalog |
| |
A Hadoop catalog doesn't need to connect to a Hive metastore, but it can only be used with HDFS or similar file systems that support atomic rename. Concurrent writes with a Hadoop catalog are not safe on a local FS or S3. To create a Hadoop catalog:
| |
| ```java |
| import org.apache.hadoop.conf.Configuration; |
| import org.apache.iceberg.hadoop.HadoopCatalog; |
| |
| Configuration conf = new Configuration(); |
| String warehousePath = "hdfs://host:8020/warehouse_path"; |
| HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath); |
| ``` |
| |
| Like the Hive catalog, `HadoopCatalog` implements `Catalog`, so it also has methods for working with tables, like `createTable`, `loadTable`, and `dropTable`. |
| |
This example creates a table with a Hadoop catalog:
| |
| ```java |
| import org.apache.iceberg.Table; |
| import org.apache.iceberg.catalog.TableIdentifier; |
| |
| TableIdentifier name = TableIdentifier.of("logging", "logs"); |
| Table table = catalog.createTable(name, schema, spec); |
| |
| // or to load an existing table, use the following line |
| // Table table = catalog.loadTable(name); |
| ``` |
| |
| The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below. |
| |
| |
| ### Using Hadoop tables |
| |
Iceberg also supports tables that are stored in a directory in HDFS. Concurrent writes with Hadoop tables are not safe when the table is stored in the local FS or S3. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
| |
| To create a table in HDFS, use `HadoopTables`: |
| |
| ```java |
| import org.apache.hadoop.conf.Configuration; |
| import org.apache.iceberg.hadoop.HadoopTables; |
| import org.apache.iceberg.Table; |
| |
| Configuration conf = new Configuration(); |
| HadoopTables tables = new HadoopTables(conf); |
| Table table = tables.create(schema, spec, table_location); |
| |
| // or to load an existing table, use the following line |
| // Table table = tables.load(table_location); |
| ``` |
| |
| {{< hint danger >}} |
| Hadoop tables shouldn't be used with file systems that do not support atomic rename. Iceberg relies on rename to synchronize concurrent commits for directory tables. |
| {{< /hint >}} |
| |
| ### Tables in Spark |
| |
Spark uses both `HiveCatalog` and `HadoopTables` to load tables. Hive is used when the identifier passed to `load` or `save` is not a path; otherwise, Spark assumes it is a path-based table.
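
For example, with the DataFrame API the same `load` call can target either kind of table; a minimal sketch, assuming a `SparkSession` named `spark` and reusing the table name and warehouse path from the examples above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// a metastore table, identified by name, is loaded through the Hive catalog
Dataset<Row> byName = spark.read().format("iceberg").load("logging.logs");

// a path identifier is loaded as a path-based (Hadoop) table
Dataset<Row> byPath = spark.read().format("iceberg").load("hdfs://host:8020/warehouse_path/logging/logs");
```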
| |
To read from and write to tables in Spark, see:
| |
| * [SQL queries in Spark](../spark-queries#querying-with-sql) |
| * [`INSERT INTO` in Spark](../spark-writes#insert-into) |
| * [`MERGE INTO` in Spark](../spark-writes#merge-into) |
| |
| |
| ## Schemas |
| |
| ### Create a schema |
| |
| This example creates a schema for a `logs` table: |
| |
| ```java |
| import org.apache.iceberg.Schema; |
| import org.apache.iceberg.types.Types; |
| |
| Schema schema = new Schema( |
| Types.NestedField.required(1, "level", Types.StringType.get()), |
| Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()), |
| Types.NestedField.required(3, "message", Types.StringType.get()), |
| Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get())) |
| ); |
| ``` |
| |
When using the Iceberg API directly, type IDs are required. Conversions from other schema formats, like Spark, Avro, and Parquet, will automatically assign new IDs.
| |
| When a table is created, all IDs in the schema are re-assigned to ensure uniqueness. |
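
Because of that re-assignment, code that needs field IDs after table creation should read the schema back from the table rather than reusing the original `Schema` object. A minimal sketch, assuming the `table` created earlier:

```java
import org.apache.iceberg.Schema;

// the table's schema carries the re-assigned field IDs
Schema tableSchema = table.schema();
System.out.println(tableSchema.asStruct());
```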
| |
| ### Convert a schema from Avro |
| |
| To create an Iceberg schema from an existing Avro schema, use converters in `AvroSchemaUtil`: |
| |
| ```java |
| import org.apache.avro.Schema; |
| import org.apache.avro.Schema.Parser; |
| import org.apache.iceberg.avro.AvroSchemaUtil; |
| |
| Schema avroSchema = new Parser().parse("{\"type\": \"record\" , ... }"); |
| Schema icebergSchema = AvroSchemaUtil.toIceberg(avroSchema); |
| ``` |
| |
| ### Convert a schema from Spark |
| |
| To create an Iceberg schema from an existing table, use converters in `SparkSchemaUtil`: |
| |
| ```java |
| import org.apache.iceberg.spark.SparkSchemaUtil; |
| |
| Schema schema = SparkSchemaUtil.schemaForTable(sparkSession, table_name); |
| ``` |
| |
| ## Partitioning |
| |
| ### Create a partition spec |
| |
| Partition specs describe how Iceberg should group records into data files. Partition specs are created for a table's schema using a builder. |
| |
| This example creates a partition spec for the `logs` table that partitions records by the hour of the log event's timestamp and by log level: |
| |
| ```java |
| import org.apache.iceberg.PartitionSpec; |
| |
| PartitionSpec spec = PartitionSpec.builderFor(schema) |
| .hour("event_time") |
| .identity("level") |
| .build(); |
| ``` |
| |
| For more information on the different partition transforms that Iceberg offers, visit [this page](../../../spec#partitioning). |