---
sidebar_position: 3
---
# Unity Catalog
This document walks through the steps to register a Onetable-synced Delta table in Unity Catalog on Databricks.
## Prerequisites
1. Source table(s) (Hudi/Iceberg) already written to external storage locations like S3/GCS.
If you don't have a source table written in S3/GCS,
you can follow the steps in [this](https://onetable.dev/docs/hms) tutorial to set it up.
2. Set up a connection to external storage locations from Databricks.
* Follow the steps outlined [here](https://docs.databricks.com/en/storage/amazon-s3.html) for Amazon S3
* Follow the steps outlined [here](https://docs.databricks.com/en/storage/gcs.html) for Google Cloud Storage
3. Create a Unity Catalog metastore in Databricks as outlined [here](https://docs.gcp.databricks.com/data-governance/unity-catalog/create-metastore.html#create-a-unity-catalog-metastore).
4. Create an external location in Databricks as outlined [here](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-location.html).
5. Clone the Onetable [repository](https://github.com/onetable-io/onetable) and build the
`utilities-0.1.0-SNAPSHOT-bundled.jar` by following the steps on the [Installation page](https://onetable.dev/docs/setup) (see the sketch after this list).
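As a minimal sketch, assuming a working JDK and Maven toolchain on your machine (the Installation page is the authoritative reference for build steps), cloning and building the bundled jar looks like:
```shell md title="shell"
# clone the repository and build the bundled utilities jar
# (assumes mvn and a compatible JDK are on your PATH)
git clone https://github.com/onetable-io/onetable.git
cd onetable
mvn install -DskipTests
```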
## Steps
### Running sync
Create `my_config.yaml` in the cloned Onetable directory.
```yaml md title="yaml"
sourceFormat: HUDI|ICEBERG # choose only one
targetFormats:
  - DELTA
datasets:
  -
    tableBasePath: s3://path/to/source/data
    tableName: table_name
    partitionSpec: partitionpath:VALUE
```
:::tip Note:
Replace `s3://path/to/source/data` with `gs://path/to/source/data` if your source table is in GCS,
and substitute appropriate values for the `sourceFormat`, `tableName`, and `partitionSpec` fields.
:::
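For illustration, a fully filled-in config for a hypothetical Hudi table named `trips`, stored under a made-up bucket and partitioned by a `city` field, could look like this:
```yaml md title="yaml"
sourceFormat: HUDI
targetFormats:
  - DELTA
datasets:
  -
    # hypothetical bucket, table name, and partition field
    tableBasePath: s3://my-bucket/hudi-tables/trips
    tableName: trips
    partitionSpec: city:VALUE
```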
From your terminal, under the cloned Onetable directory, run the sync process using the command below.
```shell md title="shell"
java -jar utilities/target/utilities-0.1.0-SNAPSHOT-bundled.jar -datasetConfig my_config.yaml
```
:::tip Note:
At this point, if you check your bucket path, you will see a `_delta_log` directory containing
`00000000000000000000.json`. This commit log is what allows query engines to interpret the source table as a Delta table.
:::
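To verify, you can list the transaction log directory yourself. A quick sketch with the AWS CLI (assuming it is installed and configured; use `gsutil ls` for a GCS path):
```shell md title="shell"
# list the Delta transaction log files created by the sync
aws s3 ls s3://path/to/source/data/_delta_log/
```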
### Register the target table in Unity Catalog
In your Databricks workspace, open the SQL editor and run the following queries.
```sql md title="SQL"
CREATE CATALOG onetable;
CREATE SCHEMA onetable.synced_delta_schema;
CREATE TABLE onetable.synced_delta_schema.<table_name>
USING DELTA
LOCATION 's3://path/to/source/data';
```
:::tip Note:
Replace `s3://path/to/source/data` with `gs://path/to/source/data` if your source table is in GCS,
and replace `<table_name>` with the name of your table.
:::
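Continuing the hypothetical `trips` example from the config above, the `CREATE TABLE` statement would be filled in as:
```sql md title="SQL"
-- hypothetical table name and location matching the earlier config
CREATE TABLE onetable.synced_delta_schema.trips
USING DELTA
LOCATION 's3://my-bucket/hudi-tables/trips';
```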
### Validating the results
You can now see the synced Delta table in **Unity Catalog** under **Catalog** as `<table_name>` in the
`synced_delta_schema` schema, and you can also query the table in the SQL editor:
```sql md title="SQL"
SELECT * FROM onetable.synced_delta_schema.<table_name>;
```
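To inspect the table's metadata, such as its storage location and provider, you can also run a `DESCRIBE` statement:
```sql md title="SQL"
-- prints the column schema plus table details such as Location and Provider (delta)
DESCRIBE TABLE EXTENDED onetable.synced_delta_schema.<table_name>;
```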
## Conclusion
In this guide, we saw how to:
1. sync a source table with Onetable to generate metadata in the desired target table format
2. catalog the data in Delta format in Unity Catalog on Databricks
3. query the Delta table using the Databricks SQL editor