# Metastore
Metastore can store metadata information for tables, views, etc.
Such metadata can be used during query execution for better planning,
obtaining additional information, etc.
### Configuration
All configuration properties should reside in the `drill.metastore` namespace.
The default Metastore configuration is defined in the `drill-metastore-default.conf` file.
It can be overridden in `drill-metastore-override.conf`. Distribution configuration can be
specified in `drill-metastore-distrib.conf`. Configuration for Metastore implementations can be
specified in `drill-metastore-module.conf`.
### Initialization
`MetastoreRegistry` is initialized during Drillbit startup and is accessible through `DrillbitContext`.
It lazily initializes the Metastore implementation based
on the class named in the `drill.metastore.implementation.class` config property.
The Metastore implementation must implement the `Metastore` interface and
have a constructor that accepts `DrillConfig`.
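The lazy, config-driven initialization described above can be sketched roughly as follows. This is a minimal illustration, not Drill's actual `MetastoreRegistry` code: `SketchConfig`, `SketchMetastore`, `InMemorySketchMetastore` and `SketchMetastoreRegistry` are hypothetical stand-ins for `DrillConfig`, `Metastore`, a concrete implementation and the registry.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

// Hypothetical stand-in for DrillConfig: a thin wrapper over string properties.
final class SketchConfig {
  private final Map<String, String> values;

  SketchConfig(Map<String, String> values) {
    this.values = values;
  }

  String getString(String key) {
    String value = values.get(key);
    if (value == null) {
      throw new IllegalStateException("Missing config property: " + key);
    }
    return value;
  }
}

// Hypothetical stand-in for the Metastore interface.
interface SketchMetastore {
  String name();
}

// Example implementation; note the constructor that accepts the config.
final class InMemorySketchMetastore implements SketchMetastore {
  InMemorySketchMetastore(SketchConfig config) {
  }

  @Override
  public String name() {
    return "in-memory";
  }
}

// Creates the configured implementation on first access and then reuses it.
final class SketchMetastoreRegistry {
  static final String IMPLEMENTATION_CLASS = "drill.metastore.implementation.class";

  private final SketchConfig config;
  private volatile SketchMetastore metastore;

  SketchMetastoreRegistry(SketchConfig config) {
    this.config = config;
  }

  SketchMetastore get() {
    SketchMetastore result = metastore;
    if (result == null) {
      synchronized (this) {
        result = metastore;
        if (result == null) {
          result = metastore = instantiate();
        }
      }
    }
    return result;
  }

  private SketchMetastore instantiate() {
    String className = config.getString(IMPLEMENTATION_CLASS);
    try {
      // Look up the constructor that takes the config, as the text requires.
      Constructor<?> constructor = Class.forName(className).getDeclaredConstructor(SketchConfig.class);
      constructor.setAccessible(true);
      return (SketchMetastore) constructor.newInstance(config);
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException("Unable to create Metastore: " + className, e);
    }
  }
}
```

In this sketch nothing is instantiated until `get()` is first called, and repeated calls return the same instance.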
### Metastore Components
Metastore can store metadata for various components: tables, views, etc.
The current implementation provides fully functioning support for the tables component.
Views component support is not implemented, but it contains stub methods to show
how new Metastore components, such as UDFs or storage plugins, can be added in the future.
### Metastore Tables
The Metastore Tables component contains metadata about Drill tables, including general information as well as
information about table segments, files, row groups and partitions.
`TableMetadataUnit` is a generic representation of a Metastore Tables metadata unit,
suitable for any table metadata type (table, segment, file, row group, partition).
The `BaseTableMetadata`, `SegmentMetadata`, `FileMetadata`, `RowGroupMetadata` and `PartitionMetadata`
classes, which are located in the `org.apache.drill.metastore.metadata` package, have methods
to convert to / from `TableMetadataUnit`.
Full table metadata consists of two major concepts: general information and top-level segments metadata.
Table general information contains basic table information and corresponds to `BaseTableMetadata` class.
A table can be partitioned or non-partitioned. Non-partitioned tables have only one top-level segment,
which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned tables may have several top-level segments.
Each top-level segment can include metadata about inner segments, files, row groups and partitions.
The unique table identifier in Metastore Tables is the combination of storage plugin, workspace and table name.
Table metadata is grouped by top-level segments. The unique identifier of a top-level segment and its metadata
is the combination of storage plugin, workspace, table name and metadata key.
### Metadata
In order to provide Metastore component metadata functionality, the `Metadata` interface must be implemented.
A `Metadata` interface implementation can be specific to each Metastore component or shared.
For example, for Iceberg Metastore, each component will provide its own `Metadata` interface implementation
since each component is stored in separate Iceberg tables. For other Metastore implementations, if components reside
in the same storage (database), the `Metadata` interface implementation can be shared.
#### Versioning
A Metastore component may or may not support versioning, depending on the implementation.
The `Metadata#supportsVersioning` and `Metadata#version` methods
are used to indicate versioning support. If a Metastore component does not support versioning,
`Metadata#version` returns the undefined version (`Metadata#UNDEFINED`).
If a Metastore component supports versioning, it is assumed that the version changes each time
data in the Metastore component is modified and remains the same during read operations.
The Metastore component version is used to determine whether metadata has changed since the last access to the Metastore component.
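One way a caller might use the component version is to cache loaded metadata and reload it only when the version differs from the one seen last. The sketch below is an assumption-laden illustration: `VersionedCacheSketch` is a hypothetical helper, the version is modeled as a `long`, and the `-1` value chosen for `UNDEFINED` is a placeholder that may not match `Metadata#UNDEFINED`.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: caches a value loaded from a Metastore component and
// reloads it only when the component version changes.
final class VersionedCacheSketch<T> {
  // Assumption: placeholder playing the role of Metadata#UNDEFINED.
  static final long UNDEFINED = -1;

  private final LongSupplier versionSupplier;
  private final Supplier<T> loader;
  private long cachedVersion = UNDEFINED;
  private T cachedValue;

  VersionedCacheSketch(LongSupplier versionSupplier, Supplier<T> loader) {
    this.versionSupplier = versionSupplier;
    this.loader = loader;
  }

  synchronized T get() {
    long currentVersion = versionSupplier.getAsLong();
    // Without versioning, changes cannot be detected, so the value is always reloaded.
    if (currentVersion == UNDEFINED || currentVersion != cachedVersion) {
      cachedValue = loader.get();
      cachedVersion = currentVersion;
    }
    return cachedValue;
  }
}
```

With a versioned component, repeated `get()` calls reuse the cached value until a modify operation changes the version.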
#### Properties
A Metastore component may or may not support properties, depending on the implementation.
If properties are supported, a map of property names and values is returned,
otherwise an empty map is returned. `Metadata#properties` is used to obtain properties information.
### Data filtering
Metastore data can be read or deleted based on the filter expression and metadata types.
#### Metadata types
Each concrete Metastore component implementation supports specific metadata types
which identify metadata inside Metastore component units. For example, for `tables`
component, supported metadata types are `TABLE`, `SEGMENT`, `FILE`, `ROW_GROUP`, `PARTITION`.
Metadata types are based on `org.apache.drill.metastore.metadata.MetadataType` enum names.
Metadata types should be indicated during read or delete operations.
If all metadata types are needed, `MetadataType#ALL` metadata type can be indicated.
#### Filter expressions
All filter expressions implement `FilterExpression` interface.
List of supported filter operators is indicated in `FilterExpression.Operator` enum.
When filter expression is provided in read or delete operation, it's up to Metastore
implementation to convert it into suitable representation for storage.
For convenience, `FilterExpression.Visitor` can be implemented to traverse filter expression.
Filter expression can be simple and contain only one condition:
```
FilterExpression storagePlugin = FilterExpression.equal(MetastoreColumn.STORAGE_PLUGIN, "dfs");
FilterExpression workspaces = FilterExpression.in(MetastoreColumn.WORKSPACE, "root", "tmp");
```
Or it can be complex and contain several conditions combined with `AND` or `OR` operators.
```
FilterExpression filter = FilterExpression.and(
  FilterExpression.equal(MetastoreColumn.STORAGE_PLUGIN, "dfs"),
  FilterExpression.in(MetastoreColumn.WORKSPACE, "root", "tmp"));
metastore.tables().read()
  .metadataType(MetadataType.TABLE)
  .filter(filter)
  .execute();
```
SQL-like equivalent for the above operation is:
```
select * from METASTORE.TABLES
where storagePlugin = 'dfs'
and workspace in ('root', 'tmp')
and metadataType = 'TABLE'
```
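To illustrate how a visitor might convert a filter expression into a storage-specific representation, here is a simplified, self-contained sketch that renders a filter tree as a SQL-like string. `SketchFilter` and `SqlLikeVisitor` are hypothetical types written for this example. The real `FilterExpression` and `FilterExpression.Visitor` APIs have different method names and work with `MetastoreColumn` values rather than plain strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified filter model used only for this illustration.
interface SketchFilter {
  <T> T accept(Visitor<T> visitor);

  interface Visitor<T> {
    T visitEqual(String column, Object value);
    T visitIn(String column, List<Object> values);
    T visitAnd(SketchFilter left, SketchFilter right);
  }

  static SketchFilter equal(String column, Object value) {
    return new SketchFilter() {
      @Override
      public <T> T accept(Visitor<T> visitor) {
        return visitor.visitEqual(column, value);
      }
    };
  }

  static SketchFilter in(String column, Object... values) {
    List<Object> list = Arrays.asList(values);
    return new SketchFilter() {
      @Override
      public <T> T accept(Visitor<T> visitor) {
        return visitor.visitIn(column, list);
      }
    };
  }

  static SketchFilter and(SketchFilter left, SketchFilter right) {
    return new SketchFilter() {
      @Override
      public <T> T accept(Visitor<T> visitor) {
        return visitor.visitAnd(left, right);
      }
    };
  }
}

// Walks the filter tree and renders each node as a SQL-like condition.
final class SqlLikeVisitor implements SketchFilter.Visitor<String> {
  @Override
  public String visitEqual(String column, Object value) {
    return column + " = " + literal(value);
  }

  @Override
  public String visitIn(String column, List<Object> values) {
    return column + " in ("
        + values.stream().map(SqlLikeVisitor::literal).collect(Collectors.joining(", ")) + ")";
  }

  @Override
  public String visitAnd(SketchFilter left, SketchFilter right) {
    return "(" + left.accept(this) + " and " + right.accept(this) + ")";
  }

  // Numbers are emitted as-is; everything else is quoted with single quotes escaped.
  private static String literal(Object value) {
    return value instanceof Number
        ? value.toString()
        : "'" + value.toString().replace("'", "''") + "'";
  }
}
```

A storage-backed implementation would follow the same pattern but produce whatever filter representation its storage understands instead of a string.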
### Metastore Read
In order to provide read functionality, each component must implement `Read`.
During implementation, the component unit type must be indicated.
`Read#columns` specifies the list of columns to be retrieved from the Metastore component.
Columns are represented by `MetastoreColumn` enum values.
`Read#filter` specifies the filter expression by which data will be retrieved.
`Read#execute` executes the read operation and returns the results.
Data is returned as a list of component metadata units. It is the caller's responsibility to transform the received
data into a suitable representation. If no result is found, an empty list of units
is expected to be returned, not a null instance.
For `tables` component, metadata unit is represented by `TableMetadataUnit`. To retrieve `lastModifiedTime`
for all tables in the `dfs` storage plugin the following code can be used:
```
List<TableMetadataUnit> units = metastore.tables().read()
  .metadataType(MetadataType.TABLE)
  .columns(MetastoreColumn.TABLE_NAME, MetastoreColumn.LAST_MODIFIED_TIME)
  .filter(FilterExpression.equal(MetastoreColumn.STORAGE_PLUGIN, "dfs"))
  .execute();
```
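Since transforming the returned units is left to the caller, the following sketch shows one possible transformation: collecting the last modified time per table name. `UnitSketch` is a hypothetical, trimmed-down stand-in for `TableMetadataUnit` that carries only the two requested columns, and its accessor names are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TableMetadataUnit with only the requested columns.
final class UnitSketch {
  private final String tableName;
  private final Long lastModifiedTime;

  UnitSketch(String tableName, Long lastModifiedTime) {
    this.tableName = tableName;
    this.lastModifiedTime = lastModifiedTime;
  }

  String tableName() {
    return tableName;
  }

  Long lastModifiedTime() {
    return lastModifiedTime;
  }
}

final class ReadResultSketch {
  // Maps table name to last modified time, skipping units where the value is absent.
  // An empty input list simply yields an empty map, so no null check is needed.
  static Map<String, Long> lastModifiedByTable(List<UnitSketch> units) {
    Map<String, Long> result = new LinkedHashMap<>();
    for (UnitSketch unit : units) {
      if (unit.lastModifiedTime() != null) {
        result.put(unit.tableName(), unit.lastModifiedTime());
      }
    }
    return result;
  }
}
```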
### Metastore Tables Basic Requests
`Tables#basicRequests` provides a list of the most frequent requests to Metastore Tables, without the need
to write filters or transformers from the `TableMetadataUnit` class.
Assume the caller needs to obtain general metadata about the `dfs.tmp.nation` table:
```
TableInfo tableInfo = TableInfo.builder()
  .storagePlugin("dfs")
  .workspace("tmp")
  .name("nation")
  .build();
BaseTableMetadata metastoreTableInfo = metastore.tables().basicRequests()
  .tableMetadata(tableInfo);
```
### Metastore Modify
In order to provide modify functionality, each component must implement `Modify`.
During implementation, the component unit type must be indicated.
If the Metastore component implementation supports versioning, it is assumed that each modify operation will
change the Metastore component version.
#### Overwrite
`Modify#overwrite` writes data into the Metastore component or overwrites existing data by a unique combination
of partition keys.
For example, for the `tables` component, the partition keys are storage plugin, workspace, table name
and metadata key. The caller provides only the list of `TableMetadataUnit` to be written,
and the Metastore component implementation decides how the data will be stored or overwritten.
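The overwrite-by-partition-key behavior can be modeled with a small in-memory sketch. This is only an illustration of the semantics described above, not the storage logic of any real Metastore implementation: `OverwriteSketch` and its nested `Unit` class are hypothetical, and the metadata key strings passed in are placeholders for the `MetadataInfo` constants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of overwrite semantics for the tables component.
final class OverwriteSketch {

  // Trimmed-down stand-in for TableMetadataUnit.
  static final class Unit {
    final String storagePlugin;
    final String workspace;
    final String tableName;
    final String metadataKey;
    final String metadataType;

    Unit(String storagePlugin, String workspace, String tableName,
         String metadataKey, String metadataType) {
      this.storagePlugin = storagePlugin;
      this.workspace = workspace;
      this.tableName = tableName;
      this.metadataKey = metadataKey;
      this.metadataType = metadataType;
    }

    // Partition keys named in the text: storage plugin, workspace, table name, metadata key.
    List<String> partitionKey() {
      return Arrays.asList(storagePlugin, workspace, tableName, metadataKey);
    }
  }

  private final Map<List<String>, List<Unit>> unitsByKey = new LinkedHashMap<>();

  // Replaces everything stored under each partition key present in the input;
  // data under other partition keys is left untouched.
  void overwrite(Unit... units) {
    Map<List<String>, List<Unit>> incoming = new LinkedHashMap<>();
    for (Unit unit : units) {
      incoming.computeIfAbsent(unit.partitionKey(), key -> new ArrayList<>()).add(unit);
    }
    unitsByKey.putAll(incoming);
  }

  List<Unit> read(String storagePlugin, String workspace, String tableName, String metadataKey) {
    return unitsByKey.getOrDefault(
        Arrays.asList(storagePlugin, workspace, tableName, metadataKey),
        Collections.emptyList());
  }
}
```

In this model, overwriting with a single unit for a partition key discards any other units previously stored under that key, which is why all units that belong to a top-level segment need to be supplied together.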
##### Adding new table metadata
Assume there is a non-partitioned table whose metadata consists of general information and a default segment with one file.
1. Unit with table general information:
```
TableMetadataUnit tableUnit = TableMetadataUnit.builder()
  .storagePlugin("dfs")
  .workspace("tmp")
  .tableName("nation")
  .metadataKey(MetadataInfo.GENERAL_INFO_KEY)
  .metadataType(MetadataType.TABLE.name())
  ...
  .build();
```
2. Units with default segment information and its one file:
```
TableMetadataUnit segmentUnit = TableMetadataUnit.builder()
  .storagePlugin("dfs")
  .workspace("tmp")
  .tableName("nation")
  .metadataKey(MetadataInfo.DEFAULT_SEGMENT_KEY)
  .metadataType(MetadataType.SEGMENT.name())
  ...
  .build();
TableMetadataUnit fileUnit = TableMetadataUnit.builder()
  .storagePlugin("dfs")
  .workspace("tmp")
  .tableName("nation")
  .metadataKey(MetadataInfo.DEFAULT_SEGMENT_KEY)
  .metadataType(MetadataType.FILE.name())
  ...
  .build();
```
To add this table metadata in the Metastore, the following code can be executed:
```
metastore.tables().modify()
  .overwrite(tableUnit, segmentUnit, fileUnit)
  .execute();
```
##### Overwriting table metadata
Metastore allows metadata to be overwritten only by the unique combination of table identifier
and table metadata identifier (general info or top-level segment).
When only general table information has changed and the default segment metadata has not,
it is enough to overwrite only the general information.
```
TableMetadataUnit updatedTableUnit = TableMetadataUnit.builder()
  .storagePlugin("dfs")
  .workspace("tmp")
  .tableName("nation")
  .metadataKey(MetadataInfo.GENERAL_INFO_KEY)
  .metadataType(MetadataType.TABLE.name())
  ...
  .build();
metastore.tables().modify()
  .overwrite(updatedTableUnit)
  .execute();
```
If segment metadata has changed, for example because a new file was added to the default segment,
all segment information must be overwritten.
```
TableMetadataUnit segmentUnit = TableMetadataUnit.builder()
  .storagePlugin("dfs")
  .workspace("tmp")
  .tableName("nation")
  .metadataKey(MetadataInfo.DEFAULT_SEGMENT_KEY)
  .metadataType(MetadataType.SEGMENT.name())
  ...
  .build();
TableMetadataUnit initialFileUnit = TableMetadataUnit.builder()
  .storagePlugin("dfs")
  .workspace("tmp")
  .tableName("nation")
  .metadataKey(MetadataInfo.DEFAULT_SEGMENT_KEY)
  .metadataType(MetadataType.FILE.name())
  ...
  .build();
TableMetadataUnit newFileUnit = TableMetadataUnit.builder()
  .storagePlugin("dfs")
  .workspace("tmp")
  .tableName("nation")
  .metadataKey(MetadataInfo.DEFAULT_SEGMENT_KEY)
  .metadataType(MetadataType.FILE.name())
  ...
  .build();
metastore.tables().modify()
  .overwrite(segmentUnit, initialFileUnit, newFileUnit)
  .execute();
```
#### Delete
`Modify#delete` deletes data from the Metastore component based on the provided delete operation.
A delete operation consists of a delete filter and metadata types.
Assume metadata for the table `dfs.tmp.nation` already exists in the Metastore `tables` component
and the caller needs to delete the table and all its metadata. First, the deletion filter must be created:
```
FilterExpression deleteFilter = FilterExpression.and(
  FilterExpression.equal(MetastoreColumn.STORAGE_PLUGIN, "dfs"),
  FilterExpression.equal(MetastoreColumn.WORKSPACE, "tmp"),
  FilterExpression.equal(MetastoreColumn.TABLE_NAME, "nation"));
```
Such a filter can also be generated using the `TableInfo` class:
```
TableInfo tableInfo = TableInfo.builder()
  .storagePlugin("dfs")
  .workspace("tmp")
  .name("nation")
  .build();
FilterExpression deleteFilter = tableInfo.toFilter();
```
Then the metadata type should be indicated. Since all table metadata needs to be deleted,
the `ALL` metadata type should be specified.
```
Delete deleteOperation = Delete.builder()
  .metadataType(MetadataType.ALL)
  .filter(deleteFilter)
  .build();
```
Delete operation can be executed using the following code:
```
metastore.tables().modify()
  .delete(deleteOperation)
  .execute();
```
#### Purge
`Modify#purge` deletes all data from the Metastore component. Purge is a terminal operation
and cannot be used together with overwrite or delete.
```
metastore.tables().modify()
  .purge();
```
#### Transactions
A Metastore component implementation may or may not support transactions. If transactions are supported,
all operations in one `Modify` instance will be executed fully or not executed at all.
If the Metastore implementation does not support transactions, all operations will be executed consecutively.
Note: operations will be executed in the same order as they were added.
```
metastore.tables().modify()
  .overwrite(tableUnit1, segmentUnit1)
  .overwrite(tableUnit2, segmentUnit2)
  .delete(deleteOperation1)
  .delete(deleteOperation2)
  .execute();
```
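The difference between transactional and non-transactional execution can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not how any real Metastore implementation works: `ModifySketch` is hypothetical, plain string keys stand in for metadata units, and `failHere` exists only to simulate a storage failure in the middle of the operation list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of a Modify instance that queues operations in order.
final class ModifySketch {
  private final Set<String> store;
  private final boolean transactional;
  private final List<Consumer<Set<String>>> operations = new ArrayList<>();

  ModifySketch(Set<String> store, boolean transactional) {
    this.store = store;
    this.transactional = transactional;
  }

  ModifySketch overwrite(String... keys) {
    operations.add(target -> target.addAll(Arrays.asList(keys)));
    return this;
  }

  ModifySketch delete(String key) {
    operations.add(target -> target.remove(key));
    return this;
  }

  // Demonstration hook only: simulates a storage failure at this position.
  ModifySketch failHere() {
    operations.add(target -> {
      throw new IllegalStateException("Simulated storage failure");
    });
    return this;
  }

  void execute() {
    if (transactional) {
      // Apply everything to a working copy; a failure leaves the store untouched.
      Set<String> working = new LinkedHashSet<>(store);
      operations.forEach(operation -> operation.accept(working));
      store.clear();
      store.addAll(working);
    } else {
      // Apply in order; operations before a failure remain applied.
      operations.forEach(operation -> operation.accept(store));
    }
  }
}
```

In both modes the operations run in the order they were added. The modes differ only in whether a failure partway through leaves earlier changes visible.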