---
title: "Java Quickstart"
url: java-api-quickstart
aliases:
- "java/quickstart"
menu:
main:
parent: "API"
weight: 100
---
<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-->
# Java API Quickstart
## Create a table
Tables are created using either a [`Catalog`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/catalog/Catalog.html) or an implementation of the [`Tables`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/Tables.html) interface.
### Using a Hive catalog
The Hive catalog connects to a Hive metastore to keep track of Iceberg tables.
You can initialize a Hive catalog with a name and some properties (see [Catalog properties](../configuration/#catalog-properties)).
**Note:** Currently, `setConf` is always required for Hive catalogs, but this will change in the future.
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.hive.HiveCatalog;

HiveCatalog catalog = new HiveCatalog();
catalog.setConf(spark.sparkContext().hadoopConfiguration()); // Configure using Spark's Hadoop configuration

Map<String, String> properties = new HashMap<>();
properties.put("warehouse", "...");
properties.put("uri", "...");
catalog.initialize("hive", properties);
```
The `Catalog` interface defines methods for working with tables, like `createTable`, `loadTable`, `renameTable`, and `dropTable`. `HiveCatalog` implements the `Catalog` interface.
To create a table, pass a `TableIdentifier` and a `Schema` along with other initial metadata:
```java
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
TableIdentifier name = TableIdentifier.of("logging", "logs");
Table table = catalog.createTable(name, schema, spec);
// or to load an existing table, use the following line
// Table table = catalog.loadTable(name);
```
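`HiveCatalog` also provides the other `Catalog` operations mentioned above. A minimal sketch of renaming and then dropping the table, reusing the `catalog` and `name` objects from the example (the target identifier and the purge flag are illustrative):
```java
// rename the table within the catalog
catalog.renameTable(name, TableIdentifier.of("logging", "logs_renamed"));

// drop the table; passing true also deletes the table's data and metadata files
catalog.dropTable(TableIdentifier.of("logging", "logs_renamed"), true /* purge */);
```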
The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
### Using a Hadoop catalog
A Hadoop catalog doesn't need to connect to a Hive metastore, but it can only be used with HDFS or similar file systems that support atomic rename. Concurrent writes with a Hadoop catalog are not safe on a local FS or on S3. To create a Hadoop catalog:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopCatalog;
Configuration conf = new Configuration();
String warehousePath = "hdfs://host:8020/warehouse_path";
HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath);
```
Like the Hive catalog, `HadoopCatalog` implements `Catalog`, so it also has methods for working with tables, like `createTable`, `loadTable`, and `dropTable`.
This example creates a table with a Hadoop catalog:
```java
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
TableIdentifier name = TableIdentifier.of("logging", "logs");
Table table = catalog.createTable(name, schema, spec);
// or to load an existing table, use the following line
// Table table = catalog.loadTable(name);
```
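`HadoopCatalog` can also list and drop tables through the same `Catalog` interface. A minimal sketch, reusing the `catalog` from the example above (the namespace and purge flag are illustrative):
```java
import java.util.List;

import org.apache.iceberg.catalog.Namespace;

// list the tables in the "logging" namespace
List<TableIdentifier> tables = catalog.listTables(Namespace.of("logging"));

// drop a table and delete its data and metadata files
catalog.dropTable(TableIdentifier.of("logging", "logs"), true /* purge */);
```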
The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
### Using Hadoop tables
Iceberg also supports tables that are stored in a directory in HDFS. Concurrent writes with Hadoop tables are not safe when the table is stored in a local FS or on S3. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
To create a table in HDFS, use `HadoopTables`:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.Table;
Configuration conf = new Configuration();
HadoopTables tables = new HadoopTables(conf);
Table table = tables.create(schema, spec, table_location);
// or to load an existing table, use the following line
// Table table = tables.load(table_location);
```
{{< hint danger >}}
Hadoop tables shouldn't be used with file systems that do not support atomic rename. Iceberg relies on rename to synchronize concurrent commits for directory tables.
{{< /hint >}}
### Tables in Spark
Spark uses both `HiveCatalog` and `HadoopTables` to load tables. Hive is used when the identifier passed to `load` or `save` is not a path; otherwise, Spark assumes it is a path-based table.
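For example, with the DataFrame API the two cases differ only in the identifier passed to `load`. A minimal sketch, assuming an existing `spark` session and an illustrative warehouse path:
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// a non-path identifier is resolved through the Hive catalog
Dataset<Row> byName = spark.read().format("iceberg").load("logging.logs");

// a path identifier is loaded as a path-based (HadoopTables) table
Dataset<Row> byPath = spark.read().format("iceberg").load("hdfs://host:8020/warehouse_path/logging/logs");
```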
To read and write to tables from Spark, see:
* [SQL queries in Spark](../spark-queries#querying-with-sql)
* [`INSERT INTO` in Spark](../spark-writes#insert-into)
* [`MERGE INTO` in Spark](../spark-writes#merge-into)
## Schemas
### Create a schema
This example creates a schema for a `logs` table:
```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;
Schema schema = new Schema(
    Types.NestedField.required(1, "level", Types.StringType.get()),
    Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
    Types.NestedField.required(3, "message", Types.StringType.get()),
    Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
);
```
When using the Iceberg API directly, type IDs are required. Conversions from other schema formats, like Spark, Avro, and Parquet, will automatically assign new IDs.
When a table is created, all IDs in the schema are re-assigned to ensure uniqueness.
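For example, comparing the schema defined above with the schema stored in the created table shows the freshly assigned IDs (a minimal sketch, reusing the `schema` and `table` objects from the earlier examples):
```java
// the table's schema carries the IDs assigned at creation time,
// which may differ from the IDs in the Schema object passed to createTable
System.out.println(schema);         // the schema defined above
System.out.println(table.schema()); // the schema stored with the table
```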
### Convert a schema from Avro
To create an Iceberg schema from an existing Avro schema, use converters in `AvroSchemaUtil`:
```java
import org.apache.avro.Schema;
import org.apache.avro.Schema.Parser;
import org.apache.iceberg.avro.AvroSchemaUtil;
Schema avroSchema = new Parser().parse("{\"type\": \"record\", ... }");

// use the fully-qualified name to avoid a clash with org.apache.avro.Schema
org.apache.iceberg.Schema icebergSchema = AvroSchemaUtil.toIceberg(avroSchema);
```
### Convert a schema from Spark
To create an Iceberg schema from an existing table, use converters in `SparkSchemaUtil`:
```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;

Schema schema = SparkSchemaUtil.schemaForTable(sparkSession, table_name);
```
## Partitioning
### Create a partition spec
Partition specs describe how Iceberg should group records into data files. Partition specs are created for a table's schema using a builder.
This example creates a partition spec for the `logs` table that partitions records by the hour of the log event's timestamp and by log level:
```java
import org.apache.iceberg.PartitionSpec;
PartitionSpec spec = PartitionSpec.builderFor(schema)
    .hour("event_time")
    .identity("level")
    .build();
```
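The builder supports other transforms as well, for example `day`, `bucket`, and `truncate`. A hypothetical spec for the same logs schema, shown only to illustrate a couple of them:
```java
import org.apache.iceberg.PartitionSpec;

// a hypothetical spec for the logs schema using other built-in transforms
PartitionSpec daySpec = PartitionSpec.builderFor(schema)
    .day("event_time")    // partition by day instead of hour
    .bucket("level", 8)   // hash the log level into 8 buckets
    .build();
```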
For more information on the partition transforms that Iceberg offers, see [Partitioning](../../../spec#partitioning) in the spec.