A Database is a collection of Table objects. You can create/delete databases or create/modify/delete tables under a database.
In Fluss, a Table is the fundamental unit of user data storage, organized into rows and columns. Tables are stored within specific databases, adhering to a hierarchical structure (database -> table).
Tables are classified into two types based on the presence of a primary key:
A Table becomes a Partitioned Table when a partition column is defined. Data with the same partition value is stored in the same partition. Partition columns can be applied to both Log Tables and Primary Key Tables, but with specific considerations:
This design ensures efficient data organization, flexibility in handling different use cases, and adherence to data integrity constraints.
A partition is a logical division of a table's data into smaller, more manageable subsets based on the values of one or more specified columns, known as partition columns. Each unique value (or combination of values) in the partition column(s) defines a distinct partition.
A bucket horizontally divides the data of a table/partition into N buckets according to the bucketing policy. The number of buckets N can be configured per table. A bucket is the smallest unit of data migration and backup. The data of a bucket consists of a LogTablet and a (optional) KvTablet.
A LogTablet needs to be generated for each bucket of Log and Primary Key Tables. For Log Tables, the LogTablet is both the primary table data and the log data. For Primary Key Tables, the LogTablet acts as the log data for the primary table data.
offset sparse index that maps message relative offsets to their corresponding physical byte addresses in the .log file.Each bucket of the Primary Key Table needs to generate a KvTablet. Underlying, each KvTablet corresponds to an embedded RocksDB instance. RocksDB is an LSM (log structured merge) engine which helps KvTablet support high-performance updates and lookup queries.