docs/index.md

title: Apache Gravitino overview
slug: /
license: "This software is licensed under the Apache License version 2."

What's Apache Gravitino?

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages the metadata directly in different sources, types, and regions. It also provides users with unified metadata access for data and AI assets.

Learn more→

Downloading

You can get Gravitino from the download page, or you can build Gravitino from source code. See How to build Gravitino.

Gravitino runs on both Linux and macOS and requires Java 17. JVMs on x86_64 and ARM64 are both supported. It's easy to run locally on one machine: all you need is java on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.
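The two ways of locating Java described above can be checked before starting the server. The following is a minimal sketch, not part of Gravitino: the helper name `find_java` and the lookup order (PATH first, then JAVA_HOME) are assumptions made for illustration.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_java(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return a path to a `java` executable, or None if neither lookup succeeds.

    Hypothetical helper: the lookup order here is an assumption, not
    necessarily the order the Gravitino startup scripts use.
    """
    # Option 1: `java` is installed on the system PATH.
    on_path = which("java")
    if on_path:
        return on_path
    # Option 2: JAVA_HOME points to a Java installation (standard bin/java layout).
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_java()
    print(path if path else "No Java found: install Java 17 and set PATH or JAVA_HOME")
```

The sketch only locates an executable. It does not verify that the installation is Java 17.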

See How to install Gravitino to learn how to install the Gravitino server.

Gravitino provides Docker images on Docker Hub. Pull the image and run it. For details of the Gravitino Docker image, see Docker image details.

Gravitino also provides a playground to experience the whole Gravitino system with other components. See the Gravitino playground repository and How to use the playground.

Getting started

To get started with Gravitino, see Getting started for the details.

Getting started locally: a quick guide to starting and using Gravitino locally.
Running on Amazon Web Services: a quick guide to starting and using Gravitino on AWS.
Running on Google Cloud Platform: a quick guide to starting and using Gravitino on GCP.

How to use Apache Gravitino

Gravitino provides two interfaces to manage metadata from different catalogs in a unified way: the REST API and the Java SDK. You can use either one. See:

Manage metalake using Gravitino to learn how to manage metalakes.
Manage relational metadata using Gravitino to learn how to manage relational metadata.
Manage fileset metadata using Gravitino to learn how to manage fileset metadata.
Manage messaging metadata using Gravitino to learn how to manage messaging metadata.
Manage model metadata using Gravitino to learn how to manage model metadata.

You can also find the complete REST API definition in Gravitino Open API, the Java SDK definition in Gravitino Java doc, and the Python SDK definition in Gravitino Python doc.
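As a rough illustration of the REST route, the sketch below builds a request that lists metalakes using only the Python standard library. The `/api/metalakes` path, the versioned `Accept` header, and the use of port 8090 for the REST API are assumptions not stated on this page, so verify them against the Gravitino Open API definition. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed values (not stated on this page); check the Gravitino Open API definition.
ACCEPT = "application/vnd.gravitino.v1+json"
METALAKES_PATH = "/api/metalakes"


def list_metalakes_request(host: str = "localhost", port: int = 8090) -> urllib.request.Request:
    """Build (but do not send) a GET request for the assumed metalake listing endpoint."""
    url = f"http://{host}:{port}{METALAKES_PATH}"
    return urllib.request.Request(url, headers={"Accept": ACCEPT}, method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 5.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running Gravitino server reachable at localhost:8090.
    print(fetch_json(list_metalakes_request()))
```

Building the request separately from sending it keeps the URL and headers easy to inspect without a running server.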

Gravitino also provides a web UI to manage the metadata. Visit the web UI in the browser via http://<ip-address>:8090. See Gravitino web UI for details.
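Before opening the browser, it can help to confirm that something is listening on the web UI port. This is a minimal sketch with hypothetical helper names. It only checks that a TCP connection can be opened, not that the listener is actually Gravitino.

```python
import socket


def web_ui_url(ip_address: str, port: int = 8090) -> str:
    """Build the web UI address from the server's IP address and port."""
    return f"http://{ip_address}:{port}"


def is_listening(ip_address: str, port: int = 8090, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip_address:port can be opened."""
    try:
        with socket.create_connection((ip_address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    host = "127.0.0.1"  # replace with the IP address of your Gravitino server
    state = "reachable" if is_listening(host) else "not reachable"
    print(f"{web_ui_url(host)} is {state}")
```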

Gravitino also provides a Command Line Interface (CLI) to manage the metadata. See Gravitino CLI for details.

Gravitino currently supports the following catalogs:

Relational catalogs:

To work with table and partition statistics, refer to the corresponding document.

Fileset catalogs:

Fileset catalog

Messaging catalogs:

Kafka catalog

Model catalogs:

Model catalog

Apache Gravitino playground

Gravitino provides a playground so that you can easily try Gravitino together with other components. It integrates Apache Hadoop, Apache Hive, Trino, MySQL, PostgreSQL, and Gravitino into a complete environment. To experience all the features, see Getting started and How to use the Gravitino playground.

Install Gravitino playground on AWS or GCP: a quick guide to starting and using the Gravitino playground on AWS or GCP.
Install Gravitino playground locally: a quick guide to starting and using the Gravitino playground locally.
How to use the Gravitino playground: provides an example of how to use Gravitino and other components together.

Where to go from here

Catalogs

Gravitino supports different catalogs to manage the metadata in different sources. Please see:

Doris catalog: a complete guide to using Gravitino to manage Doris data.
StarRocks catalog: a complete guide to using Gravitino to manage StarRocks data.
Fileset catalog: a complete guide to using Gravitino to manage filesets using the Hadoop Compatible File System (HCFS).
Hive catalog: a complete guide to using Gravitino to manage Apache Hive data.
Hudi catalog: a complete guide to using Gravitino to manage Apache Hudi data.
Iceberg catalog: a complete guide to using Gravitino to manage Apache Iceberg data.
Kafka catalog: a complete guide to using Gravitino to manage Kafka topic metadata.
Model catalog: a complete guide to using Gravitino to manage model metadata.
MySQL catalog: a complete guide to using Gravitino to manage MySQL data.
Paimon catalog: a complete guide to using Gravitino to manage Apache Paimon data.
PostgreSQL catalog: a complete guide to using Gravitino to manage PostgreSQL data.
OceanBase catalog: a complete guide to using Gravitino to manage OceanBase data.

Governance

Gravitino provides governance features to manage metadata in a unified way. See:

Manage tags in Gravitino: a complete guide to using Gravitino to manage tags.
Manage policies in Gravitino: a complete guide to using Gravitino to manage policies.
Manage jobs in Gravitino: a complete guide to using Gravitino to manage jobs.

Gravitino Iceberg REST catalog service

Iceberg REST catalog service: a guide to using Gravitino as an Apache Iceberg REST catalog service.

Connectors

Trino connector

Gravitino provides a Trino connector to manage Trino metadata in a unified way. To use the Trino connector, see:

How to use Gravitino Trino connector: a complete guide to using the Gravitino Trino connector.

Spark connector

Gravitino provides a Spark connector to manage metadata in a unified way. To use the Spark connector, see:

Gravitino Spark connector: a complete guide to using the Gravitino Spark connector.

Flink connector

Gravitino provides a Flink connector to manage metadata in a unified way. To use the Flink connector, see:

Gravitino Flink connector: a complete guide to using the Gravitino Flink connector.

Server administration

Gravitino provides several ways to configure and manage the Gravitino server. See:

Gravitino metrics: provides metrics configurations and a detailed metrics list of the Gravitino server.

Security

Gravitino provides security configurations, including HTTPS, authentication, access control, and CORS.

HTTPS: provides HTTPS configurations.
Authentication: provides authentication configurations, including simple, OAuth, and Kerberos.
Access Control: provides access control configurations.
CORS: provides CORS configurations.

Gravitino MCP server

The Gravitino MCP server lets AI tools manage Gravitino metadata.

Gravitino MCP server: a complete guide to using the Gravitino MCP server.

Programming guides

Gravitino Open API: provides the complete Open API definition of Gravitino.
Gravitino Java doc: provides the Javadoc for the Gravitino API.
Gravitino Python doc: provides the Python doc for the Gravitino API.

Development guides

How to build Gravitino: a complete guide to building Gravitino from source.
How to test Gravitino: a complete guide to running Gravitino unit and integration tests.
How to sign and verify Gravitino releases: a guide to signing and verifying a Gravitino release.
Publish Docker images: a guide to publishing Gravitino Docker images; also lists the change logs of Gravitino CI Docker images and release images.
How to upgrade Gravitino: a guide to upgrading the schema of Gravitino storage backend from one release version to another.