RFC-1: Entity Key Encoding design for KV store (removed)

Revision | Owner | Date
v0.1     | Qi Yu | 1/8/2023
v0.2     | Qi Yu | 5/9/2023

❌️ REMOVED

Removed KV store on: 2024-12

This document is no longer maintained and may contain outdated information.

Background

Currently, we have several kinds of data to store, for example:

  • User information, such as username, password, and user properties. This is structured data.
  • Metadata. Metadata is the key point of our product. It is heterogeneous and may be very large.
  • Other information, such as query history, query logs, and so on.

To store this information, there are several kinds of databases to choose from:

  • Relational database
  • KV store
  • GraphDB
  • Others, such as MongoDB and ES

After referring to the Snowflake schema and other systems (HashData and so on), we concluded that KV storage suits our needs better than the other kinds of storage, including relational databases.

According to our previous design, metadata in Gravitino is organized in the following structure:

[Figure: metadata structure in Gravitino]

According to the picture above, metadata in Gravitino can be divided into multiple layers, a little like a file directory structure. To implement this hierarchy, we can choose the following options to encode keys.

Design

Target

We need a key encoding method for the KV store that satisfies the following requirements:

  • Support fast point queries
  • Support efficient range queries
  • Not very complicated
  • Good expandability and compatibility

Design

First, we introduce a global auto-increment ID (or simply a UUID) that stands for the name of a namespace. This ID is unique across the whole system. For example, if there is a catalog named catalog2 under the metalake metalake1, and a schema named schema3 under catalog2, then we add the following key-value pairs to the KV store:

Key         | Value       | Description
metalake1   | 1           | name to id mapping
1/catalog2  | 2           | name to id mapping, 1 is the id of metalake1
1/2/schema3 | 3           | name to id mapping, 2 is the id of catalog2
1           | metalake1   | id to name mapping
2           | 1/catalog2  | id to name mapping
3           | 1/2/schema3 | id to name mapping

Note: for catalog and metalake, considering global uniqueness, we will use a UUID instead of an auto-increment ID. Schema, table, and other entities will use an auto-increment ID.
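The two-way mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the class `NameMappingSketch` and its methods are hypothetical names, an in-memory map stands in for the KV store, and every entity gets an auto-increment ID so that the output matches the example table (the note above says metalake and catalog would use a UUID instead).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a HashMap stands in for the KV store.
class NameMappingSketch {
    private final Map<String, String> kv = new HashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // Registers `name` under the parent identified by `parentIdPath`
    // ("" for a metalake, "1" for a catalog in metalake 1, "1/2" for a schema)
    // and returns the newly assigned ID.
    String register(String parentIdPath, String name) {
        String id = Long.toString(nextId.getAndIncrement());
        String nameKey = parentIdPath.isEmpty() ? name : parentIdPath + "/" + name;
        kv.put(nameKey, id);  // name to id mapping
        kv.put(id, nameKey);  // id to name mapping
        return id;
    }

    // Point lookup in either direction.
    String get(String key) {
        return kv.get(key);
    }
}
```

Registering metalake1, then catalog2 under it, then schema3 under catalog2 produces the six entries of the example table. As in the table, both directions share one keyspace here, so the sketch assumes a name never collides with an ID.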

Why do we introduce this global auto-increment ID? Because we want to support the following features:

  • If we want to rename a namespace, we only need to update the name to id mapping
  • If the name is too long, the ID serves as a short stand-in for it when encoding the key of an entity (see below)
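To illustrate the rename case, here is a minimal sketch. The class `RenameSketch` and its `rename` method are hypothetical, and a `HashMap` again stands in for the KV store. It is seeded with the example mappings listed earlier and shows that renaming catalog2 touches only the two mapping entries of that catalog, while the schema entry, which refers to the catalog by ID, stays as it is.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for the KV store.
class RenameSketch {
    final Map<String, String> kv = new HashMap<>();

    // Seeds the store with the example mappings from the table above.
    RenameSketch() {
        kv.put("metalake1", "1");
        kv.put("1", "metalake1");
        kv.put("1/catalog2", "2");
        kv.put("2", "1/catalog2");
        kv.put("1/2/schema3", "3");
        kv.put("3", "1/2/schema3");
    }

    private static String nameKey(String parentIdPath, String name) {
        return parentIdPath.isEmpty() ? name : parentIdPath + "/" + name;
    }

    // Renames an entity under the given parent ID path ("" for a metalake).
    void rename(String parentIdPath, String oldName, String newName) {
        String oldKey = nameKey(parentIdPath, oldName);
        String newKey = nameKey(parentIdPath, newName);
        String id = kv.remove(oldKey);
        if (id == null) {
            throw new IllegalArgumentException("unknown name: " + oldKey);
        }
        kv.put(newKey, id);  // new name to id mapping
        kv.put(id, newKey);  // id to name mapping now points at the new name
    }
}
```

After `rename("1", "catalog2", "catalog_new")`, the key `1/2/schema3` still maps to `3`, because child keys embed the catalog ID rather than its name.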

Then, the whole key of an entity can be encoded in the following format:

Key                                            | Value         | Description
ml/{ml_id}                                     | metalake info | ml is a short name for metalake
ca/{ml_id}/{ca_id}                             | catalog_info  | ca is a short name for catalog
sc/{ml_id}/{ca_id}/{sc_id}                     | schema_info   | sc is a short name for schema
br/{ml_id}/{ca_id}/{br_id}                     | broker_info   | br is a short name for broker
ta/{ml_id}/{catalog_id}/{schema_id}/{table_id} | table_info    | ta is a short name for table
to/{ml_id}/{catalog_id}/{br_id}/{topic_id}     | topic_info    | to is a short name for topic
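The key format above serves both query types named in the targets: a point query is a single lookup by full key, and a range query (for example, listing every schema in a catalog) is a prefix scan over sorted keys. The sketch below shows this with a `TreeMap` standing in for an ordered KV store. The class `EntityKeySketch` and its methods are hypothetical, and it uses readable strings rather than whatever binary layout BinaryEntityKeyEncoder uses.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: string keys in a TreeMap stand in for an ordered KV store.
class EntityKeySketch {
    // Builds a key such as "sc/1/2/3" from a type prefix and the chain of IDs.
    static String key(String typePrefix, long... ids) {
        StringBuilder sb = new StringBuilder(typePrefix);
        for (long id : ids) {
            sb.append('/').append(id);
        }
        return sb.toString();
    }

    // Returns every entry whose key starts with `prefix`.
    // All such keys sort inside the half-open range [prefix, prefix + '\uffff').
    static SortedMap<String, String> scan(TreeMap<String, String> kv, String prefix) {
        return kv.subMap(prefix, prefix + Character.MAX_VALUE);
    }
}
```

For example, with keys `sc/1/2/3`, `sc/1/2/4`, and `sc/1/20/5` in the map, `scan(kv, "sc/1/2/")` returns only the first two. The trailing slash in the prefix keeps catalog 20 out of the result for catalog 2. Decimal string IDs do not sort in numeric order, so this sketch makes no claim about the order of entries within a scan.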

Implementation

Please see the code in BinaryEntityKeyEncoder.