This document provides guidance on managing metadata within Apache Gravitino using the Command Line Interface (CLI). The CLI offers a terminal-based alternative to using code or the REST interface for metadata management.
Currently, the CLI allows users to view metadata information for metalakes, catalogs, schemas, tables, users, roles, groups, tags, topics and filesets. Future updates will expand on these capabilities.
You can configure an alias for the CLI for ease of use, with the following command:
alias gcli='java -jar ../../cli/build/libs/gravitino-cli-*-incubating-SNAPSHOT.jar'
Alternatively, you can use the gcli.sh script found in the clients/cli/bin/ directory to run the CLI.
The general structure for running commands with the Gravitino CLI is gcli entity command [options].
usage: gcli [metalake|catalog|schema|table|column|user|group|tag|topic|fileset] [list|details|create|delete|update|set|remove|properties|revoke|grant] [options]
Options
usage: gcli
 -a,--audit              display audit information
    --auto <arg>         column value auto-increments (true/false)
 -c,--comment <arg>      entity comment
    --columnfile <arg>   CSV file describing columns
 -d,--distribution       display distribution information
    --datatype <arg>     column data type
    --default <arg>      default column value
 -f,--force              force operation
 -g,--group <arg>        group name
 -h,--help               command help information
 -i,--ignore             ignore client/sever version check
 -l,--user <arg>         user name
    --login <arg>        user name
 -m,--metalake <arg>     metalake name
 -n,--name <arg>         full entity name (dot separated)
    --null <arg>         column value can be null (true/false)
 -o,--owner              display entity owner
    --output <arg>       output format (plain/table)
 -P,--property <arg>     property name
 -p,--properties <arg>   property name/value pairs
    --partition          display partition information
    --position <arg>     position of column
 -r,--role <arg>         role name
    --rename <arg>       new entity name
 -s,--server             Gravitino server version
    --simple             simple authentication
    --sortorder          display sortorder information
 -t,--tag <arg>          tag name
 -u,--url <arg>          Gravitino URL (default: http://localhost:8090)
 -v,--version            Gravitino client version
 -V,--value <arg>        property value
 -x,--index              display index information
 -z,--provider <arg>     provider one of hadoop, hive, mysql, postgres, iceberg, kafka
The following commands are used for entity management:
Because working with a single metalake is the typical scenario, you can set the metalake name in several ways so that it does not need to be passed on every command line: with the --metalake parameter, with the GRAVITINO_METALAKE environment variable, or in the configuration file.
The command line option overrides the environment variable, and the environment variable overrides the configuration file.
Because every command needs the Gravitino URL, you can also set the URL in several ways: with the --url parameter, with an environment variable, or in the configuration file.
The command line option overrides the environment variable, and the environment variable overrides the configuration file.
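The precedence rule described for the metalake name and the URL can be illustrated with a small shell sketch. The function name resolve_setting and its argument order are hypothetical and exist only for illustration; this is not how the Gravitino CLI is implemented, only a model of the documented order (command line, then environment variable, then configuration file).

```shell
# Sketch of the documented precedence: command line > environment > config file.
# resolve_setting is a hypothetical helper, not part of the Gravitino CLI.
# Usage: resolve_setting <cli_value> <env_value> <config_file> <config_key>
resolve_setting() {
  cli_value=$1
  env_value=$2
  config_file=$3
  config_key=$4

  # 1. A value given on the command line always wins.
  if [ -n "$cli_value" ]; then
    printf '%s\n' "$cli_value"
    return 0
  fi

  # 2. Otherwise fall back to the environment variable.
  if [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"
    return 0
  fi

  # 3. Finally, take the first key=value line from the configuration file.
  if [ -f "$config_file" ]; then
    sed -n "s/^${config_key}=//p" "$config_file" | head -n 1
  fi
}

# Example call (prints nothing if no source provides a value):
# resolve_setting "" "${GRAVITINO_METALAKE:-}" "$HOME/.gravitino" metalake
```

Passing an empty first argument models a command that was run without --metalake, so the result comes from the environment variable if it is set and from the file otherwise.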
The authentication type can also be set in several ways, for example with the --simple flag on the command line or with the auth setting in the configuration file.
The Gravitino CLI can read commonly used CLI options from a configuration file. By default, the file is .gravitino in the user's home directory. The metalake, URL and ignore parameters can be set in this file.
#
# Gravitino CLI configuration file
#

# Metalake to use
metalake=metalake_demo

# Gravitino server to connect to
URL=http://localhost:8090

# Ignore client/server version mismatch
ignore=true

# Authentication
auth=simple
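As a minimal sketch, a configuration file with these keys could be generated from a shell script. The helper name write_cli_config is hypothetical and not shipped with Gravitino; the keys and values are taken from the example file, and the target path is passed explicitly so that nothing is overwritten by accident.

```shell
# write_cli_config is a hypothetical helper, not shipped with Gravitino.
# It writes a configuration file using the keys shown in the example file.
# Usage: write_cli_config <path> <metalake> <url>
write_cli_config() {
  cat > "$1" <<EOF
# Gravitino CLI configuration file
metalake=$2
URL=$3
ignore=true
auth=simple
EOF
}

# Example call for the default location described above:
# write_cli_config "$HOME/.gravitino" metalake_demo http://localhost:8090
```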
OAuth authentication can also be configured via the configuration file.
# Authentication
auth=oauth
serverURI=http://127.0.0.1:1082
credential=xx:xx
token=test
scope=token/test
Kerberos authentication can also be configured via the configuration file.
# Authentication
auth=kerberos
principal=user/admin@foo.com
keytabFile=file.keytab
For operations that delete data or rename a metalake, the user will be prompted to confirm that they wish to run the command. The --force option can be specified to override this behaviour.
All the commands are performed by using the Java API internally.
To display help on command usage:
gcli --help
To display the client version:
gcli --version
To display the server version:
gcli --server
If the client and server are running different versions of the Gravitino software then you may need to ignore the client/server version check for the command to run. This can be done in several ways:
Pass the --ignore parameter on the command line, set the GRAVITINO_IGNORE environment variable, or set ignore in the configuration file.
For commands that accept multiple properties, the properties can be specified in any of the following ways:
gcli --properties n1=v1,n2=v2,n3=v3
gcli --properties n1=v1 n2=v2 n3=v3
gcli --properties n1=v1 --properties n2=v2 --properties n3=v3
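When the property pairs come from a script, the comma-separated form is often the easiest to assemble. The following is a small sketch; join_properties is a hypothetical helper name, not part of the Gravitino CLI, and it only checks that each argument contains an equals sign.

```shell
# join_properties is a hypothetical helper, not part of the Gravitino CLI.
# It joins its arguments (each a name=value pair) with commas.
join_properties() {
  joined=""
  for pair in "$@"; do
    case $pair in
      *=*) ;;  # looks like name=value, accept it
      *)
        echo "not a name=value pair: $pair" >&2
        return 1
        ;;
    esac
    # Add a comma only when something has already been collected.
    joined="${joined:+$joined,}$pair"
  done
  printf '%s\n' "$joined"
}

# Example call, producing the first form shown above:
# gcli --properties "$(join_properties n1=v1 n2=v2 n3=v3)"
```

This sketch does not escape commas inside values, so a value that itself contains a comma would need separate handling.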
Different options are needed to add a tag and set a property of a tag with gcli tag set. To add a tag, specify the tag (via --tag) and the entity to tag (via --name). To set the property of a tag (via --tag) you need to specify the property (via --property) and value (via --value) you want to set.
To remove a tag from an entity, again specify the tag and the entity. To remove a tag's property, specify the tag and the property.
Please set the metalake in the Gravitino configuration file or the environment variable before running any of these commands.
gcli metalake list
gcli metalake details
gcli metalake details --audit
gcli metalake create --metalake my_metalake --comment "This is my metalake"
gcli metalake delete
gcli metalake update --rename demo
gcli metalake update --comment "new comment"
gcli metalake properties
gcli metalake set --property test --value value
gcli metalake remove --property test
gcli catalog list
gcli catalog details --name catalog_postgres
gcli catalog details --name catalog_postgres --audit
The type of catalog to be created is specified by the --provider option. Different catalogs require different properties; for example, a Hive catalog requires a metastore.uris property.
gcli catalog create --name hive --provider hive --properties metastore.uris=thrift://hive-host:9083
gcli catalog create --name iceberg --provider iceberg --properties uri=thrift://hive-host:9083,catalog-backend=hive,warehouse=hdfs://hdfs-host:9000/user/iceberg/warehouse
gcli catalog create --name mysql --provider mysql --properties jdbc-url=jdbc:mysql://mysql-host:3306?useSSL=false,jdbc-user=user,jdbc-password=password,jdbc-driver=com.mysql.cj.jdbc.Driver
gcli catalog create --name postgres --provider postgres --properties jdbc-url=jdbc:postgresql://postgresql-host/mydb,jdbc-user=user,jdbc-password=password,jdbc-database=db,jdbc-driver=org.postgresql.Driver
gcli catalog create --name kafka --provider kafka --properties bootstrap.servers=127.0.0.1:9092,127.0.0.2:9092
gcli catalog create --name doris --provider doris --properties jdbc-url=jdbc:mysql://localhost:9030,jdbc-driver=com.mysql.jdbc.Driver,jdbc-user=admin,jdbc-password=password
gcli catalog create --name paimon --provider paimon --properties catalog-backend=jdbc,uri=jdbc:mysql://127.0.0.1:3306/metastore_db,authentication.type=simple
gcli catalog create --name hudi --provider hudi --properties catalog-backend=hms,uri=thrift://127.0.0.1:9083
gcli catalog create --name oceanbase --provider oceanbase --properties jdbc-url=jdbc:mysql://localhost:2881,jdbc-driver=com.mysql.jdbc.Driver,jdbc-user=admin,jdbc-password=password
gcli catalog delete --name hive
gcli catalog update --name catalog_mysql --rename mysql
gcli catalog update --name catalog_mysql --comment "new comment"
gcli catalog properties --name catalog_mysql
gcli catalog set --name catalog_mysql --property test --value value
gcli catalog remove --name catalog_mysql --property test
gcli schema list --name catalog_postgres
gcli schema details --name catalog_postgres.hr
gcli schema details --name catalog_postgres.hr --audit
gcli schema create --name catalog_postgres.new_db
gcli schema properties --name catalog_postgres.hr -i
Setting and removing schema properties is not currently supported by the Java API or the Gravitino CLI.
When creating a table, the columns are specified in a CSV file that gives, for each column, the name, the datatype, a comment, true or false for whether the column is nullable, true or false for whether the column is auto-incremented, a default value and a default type. Not all of these fields need to be specified; only the name and datatype are required. If not specified, the comment defaults to null, nullability to true and auto-increment to false. If only the default value is specified, the default type is the same as the column's datatype.
Example CSV file
Name,Datatype,Comment,Nullable,AutoIncrement,DefaultValue,DefaultType
name,String,person's name
ID,Integer,unique id,false,true
location,String,city they work in,false,false,Sydney,String
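Before passing a column file to the CLI, it can help to check that every row has at least a name and a datatype. The sketch below is one way to do that with awk. The helper name check_columnfile is hypothetical, and the assumptions that the first row is a header and that fields contain no embedded commas are mine, based on the example file rather than on the CLI's parser.

```shell
# check_columnfile is a hypothetical helper, not part of the Gravitino CLI.
# It skips the header row and reports rows that lack a name or datatype,
# or that have more than the seven fields described above.
# Assumption: fields do not contain embedded commas.
check_columnfile() {
  awk -F',' '
    NR == 1 { next }   # header row
    NF == 0 { next }   # blank line
    $1 == "" || $2 == "" || NF > 7 {
      printf "line %d: invalid column definition\n", NR
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Example call:
# check_columnfile ~/table.csv && echo "column file looks well formed"
```

The function exits with a non-zero status when at least one row is rejected, so it can guard a subsequent table create command in a script.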
gcli table list --name catalog_postgres.hr
gcli table details --name catalog_postgres.hr.departments
gcli table details --name catalog_postgres.hr.departments --audit
gcli table details --name catalog_postgres.hr.departments --distribution
gcli table details --name catalog_postgres.hr.departments --partition
gcli table details --name catalog_postgres.hr.departments --sortorder
gcli table details --name catalog_mysql.db.iceberg_namespace_properties --index
gcli table delete --name catalog_postgres.hr.salaries
gcli table properties --name catalog_postgres.hr.salaries
gcli table set --name catalog_postgres.hr.salaries --property test --value value
gcli table remove --name catalog_postgres.hr.salaries --property test
gcli table create --name catalog_postgres.hr.salaries --comment "comment" --columnfile ~/table.csv
gcli user create --user new_user
gcli user details --user new_user
gcli user list
gcli user delete --user new_user
gcli group create --group new_group
gcli group details --group new_group
gcli group list
gcli group delete --group new_group
gcli tag details --tag tagA
gcli tag create --tag tagA tagB
gcli tag list
gcli tag delete --tag tagA tagB
gcli tag set --name catalog_postgres.hr --tag tagA tagB
gcli tag remove --name catalog_postgres.hr --tag tagA tagB
gcli tag list --name catalog_postgres.hr
gcli tag properties --tag tagA
gcli tag set --tag tagA --property test --value value
gcli tag remove --tag tagA --property test
gcli tag update --tag tagA --rename newTag
gcli tag update --tag tagA --comment "new comment"
gcli catalog details --owner --name postgres
gcli catalog set --owner --user admin --name postgres
gcli catalog set --owner --group groupA --name postgres
gcli role details --role admin
gcli role list
gcli role create --role admin
gcli role delete --role admin
gcli user grant --user new_user --role admin
gcli user revoke --user new_user --role admin
gcli group grant --group groupA --role admin
gcli group revoke --group groupA --role admin
gcli topic details --name kafka.default.topic3
gcli topic create --name kafka.default.topic3
gcli topic list --name kafka.default
gcli topic delete --name kafka.default.topic3
gcli topic update --name kafka.default.topic3 --comment new_comment
gcli topic properties --name kafka.default.topic3
gcli topic set --name kafka.default.topic3 --property test --value value
gcli topic remove --name kafka.default.topic3 --property test
gcli fileset create --name hadoop.schema.fileset --properties managed=true,location=file:/tmp/root/schema/example
gcli fileset list --name hadoop.schema
gcli fileset details --name hadoop.schema.fileset
gcli fileset delete --name hadoop.schema.fileset
gcli fileset update --name hadoop.schema.fileset --comment new_comment
gcli fileset update --name hadoop.schema.fileset --rename new_name
gcli fileset properties --name hadoop.schema.fileset
gcli fileset set --name hadoop.schema.fileset --property test --value value
gcli fileset remove --name hadoop.schema.fileset --property test
Note that some commands may not be supported, depending on the capabilities of the underlying database.
When setting the datatype of a column the following basic types are currently supported: null, boolean, byte, ubyte, short, ushort, integer, uinteger, long, ulong, float, double, date, time, timestamp, tztimestamp, intervalyear, intervalday, uuid, string, binary
In addition, decimal(precision,scale) and varchar(length) are supported.
gcli column create --name catalog_postgres.hr.departments.value --datatype long
gcli column create --name catalog_postgres.hr.departments.money --datatype "decimal(10,2)"
gcli column create --name catalog_postgres.hr.departments.name --datatype "varchar(100)"
gcli column create --name catalog_postgres.hr.departments.fullname --datatype "varchar(250)" --default "Fred Smith" --null=false
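A script that generates column commands may want to check a datatype string against the list of supported types before calling the CLI. The following sketch does that in plain shell. The helper name is_supported_datatype is hypothetical, and treating type names as case-insensitive is an assumption made here, not documented CLI behaviour.

```shell
# is_supported_datatype is a hypothetical helper, not part of the Gravitino CLI.
# It succeeds if the argument is one of the basic types listed above, or has
# the form decimal(precision,scale) or varchar(length).
# Assumption: type names are compared case-insensitively.
is_supported_datatype() {
  datatype=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  basic_types="null boolean byte ubyte short ushort integer uinteger long ulong float double date time timestamp tztimestamp intervalyear intervalday uuid string binary"

  for basic in $basic_types; do
    if [ "$datatype" = "$basic" ]; then
      return 0
    fi
  done

  # Parameterised types: decimal(precision,scale) and varchar(length).
  printf '%s\n' "$datatype" |
    grep -Eq '^(decimal\([0-9]+,[0-9]+\)|varchar\([0-9]+\))$'
}

# Example call:
# is_supported_datatype "decimal(10,2)" && echo "datatype accepted"
```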
gcli column delete --name catalog_postgres.hr.departments.money
gcli column update --name catalog_postgres.hr.departments.value --rename values
gcli column update --name catalog_postgres.hr.departments.values --datatype "varchar(500)"
gcli column update --name catalog_postgres.hr.departments.values --position name
gcli column update --name catalog_postgres.hr.departments.name --null true
gcli <normal command> --simple
gcli <normal command> --simple --login userName