Schema Changes

Column IDs

Each column in a Schema has a unique identifier that is internal to the Schema and not exposed to the user. The identifiers are integers that are never reused, and they distinguish an old column from a new one when the two share a name.

For example:

> CREATE TABLE x (col_a int, col_b int);
> INSERT INTO x VALUES (1, 1);
> ALTER TABLE x DROP COLUMN col_b;
> ALTER TABLE x ADD COLUMN col_b int not null default 999;

In this case, although the Schema at the end of the sequence looks the same as the one at the beginning, the correct query result is:

> SELECT * FROM x;
 col_a | col_b
-------+-------
     1 |   999

In other words, we cannot re-materialize data from the old col_b into the new col_b.

If we were to dump the initial schema and the new schema, we would see that the two col_b columns have the same name but different column IDs.

Column IDs are internal to the server and not sent by the user on RPCs. Clients specify columns by name. This is because we expect a client to continue to make queries like “select sum(col_b) from x;” without any refresh of the schema, even if the column is dropped and re-added with new data.
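The drop-and-re-add sequence above can be sketched as follows. This is a minimal illustration, not Kudu's implementation: the class name TableSchema, its fields, and the choice to start IDs at 0 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TableSchema:
    """Hypothetical in-memory schema mapping column name -> internal column ID."""
    name_to_id: Dict[str, int] = field(default_factory=dict)
    next_id: int = 0  # only ever increases, so an ID is never handed out twice

    def add_column(self, name: str) -> int:
        if name in self.name_to_id:
            raise ValueError(f"column {name!r} already exists")
        col_id = self.next_id
        self.next_id += 1
        self.name_to_id[name] = col_id
        return col_id

    def drop_column(self, name: str) -> int:
        # The ID is retired together with the column; next_id is not rewound.
        return self.name_to_id.pop(name)


# Mirror the CREATE / DROP / ADD sequence from the example.
schema = TableSchema()
schema.add_column("col_a")
old_col_b = schema.add_column("col_b")
schema.drop_column("col_b")
new_col_b = schema.add_column("col_b")
```

Because the counter never goes backwards, the re-added col_b receives a fresh ID, so data written under the old ID cannot be mistaken for data belonging to the new column.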

Schemas specified in RPCs

When the user makes an RPC to read from or write to a tablet, the RPC specifies only the names, types, and nullability of the columns. Internal to the server, we map the names to the internal IDs.

If the user specifies a column name which does not exist in the latest schema, it is considered an error.

If the type or nullability does not match, we also currently consider it an error. In the future, we may be able to adapt the data to the requested type (e.g., promote smaller integers to larger ones on read, or serve non-null data to a read that requests a nullable column).
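The checks described in this section can be sketched as a single resolution step. The names ColumnSpec, ServerColumn, and resolve_columns, the string type names, and the exception types are hypothetical; the sketch only illustrates the rule that an unknown name or a type or nullability mismatch is an error.

```python
from typing import Dict, List, NamedTuple


class ColumnSpec(NamedTuple):
    """What the client sends in the RPC (hypothetical shape)."""
    name: str
    col_type: str
    nullable: bool


class ServerColumn(NamedTuple):
    """What the server stores for a column in the latest schema (hypothetical shape)."""
    col_id: int
    col_type: str
    nullable: bool


def resolve_columns(requested: List[ColumnSpec],
                    latest: Dict[str, ServerColumn]) -> List[int]:
    """Map requested column names to internal IDs, rejecting any mismatch."""
    ids = []
    for spec in requested:
        col = latest.get(spec.name)
        if col is None:
            raise KeyError(f"unknown column: {spec.name}")
        if spec.col_type != col.col_type or spec.nullable != col.nullable:
            raise TypeError(f"type or nullability mismatch for column: {spec.name}")
        ids.append(col.col_id)
    return ids
```

A future type adapter would replace the strict equality check with a compatibility check; this sketch does not attempt that.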

Handling varying schemas at read time

 + Tablet
 |---- MemRowSet
 |---- DiskRowSet N
 |-------- CFileSet
 |-------- Delta Tracker
 |------------ Delta Memstore
 |------------ Delta File N

Because the Schema of a table may change over time, different rowsets may have been written with different schemas. At read time, the server determines a Schema for the read based on the current metadata of the tablet. This Schema determines what to do as the read path encounters older data which was inserted prior to the schema change and thus may be missing some columns.

For each column in the read schema which is not present in the data, that column may be treated in one of two ways:

If the column has a “read default” in the metadata, that value is materialized for each cell.
If no “read default” is present, the column must be nullable, and a column of NULLs is materialized.
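The two cases can be sketched per row as follows. This is an illustrative sketch only: rows are modeled as dictionaries keyed by column ID, and ReadColumn and project_row are hypothetical names, not Kudu types.

```python
from typing import Any, Dict, List, NamedTuple


class ReadColumn(NamedTuple):
    """One column of the read schema (hypothetical shape)."""
    col_id: int
    nullable: bool
    has_read_default: bool = False
    read_default: Any = None


def project_row(stored: Dict[int, Any], read_schema: List[ReadColumn]) -> List[Any]:
    """Materialize a stored row (column ID -> value) in read-schema order."""
    out = []
    for col in read_schema:
        if col.col_id in stored:
            out.append(stored[col.col_id])   # column exists in the old data
        elif col.has_read_default:
            out.append(col.read_default)     # case 1: materialize the read default
        elif col.nullable:
            out.append(None)                 # case 2: materialize NULL
        else:
            raise ValueError(f"column {col.col_id} has no read default and is not nullable")
    return out


# Row written before col_b (ID 1) was dropped and re-added as ID 2 with default 999.
old_row = {0: 1, 1: 1}
read_schema = [
    ReadColumn(col_id=0, nullable=True),
    ReadColumn(col_id=2, nullable=False, has_read_default=True, read_default=999),
]
projected = project_row(old_row, read_schema)
```

Here the value stored under the old col_b ID is simply never looked up, which yields the (1, 999) row from the earlier example.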

Currently, Kudu does not handle type changes. In the future, we may also need to add type adapters to convert older data to the new type.

When reading delta files, updates to columns which have since been removed are ignored. Updates to new columns are applied on top of the materialized default column data.
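A minimal sketch of that rule, assuming updates are modeled as dictionaries keyed by column ID and applied in order (apply_updates and its arguments are hypothetical names for illustration):

```python
from typing import Any, Dict, List, Set


def apply_updates(row: Dict[int, Any],
                  updates: List[Dict[int, Any]],
                  read_col_ids: Set[int]) -> Dict[int, Any]:
    """Apply delta updates to a row that already has defaults materialized."""
    result = dict(row)
    for update in updates:
        for col_id, value in update.items():
            if col_id in read_col_ids:
                result[col_id] = value   # update to a column that still exists
            # otherwise the column has since been removed: ignore the update
    return result


# Base row after default materialization: col_a (ID 0) = 1, new col_b (ID 2) = 999.
base = {0: 1, 2: 999}
# One update targets the dropped col_b (ID 1), one targets the new col_b (ID 2).
deltas = [{1: 5}, {2: 7}]
final_row = apply_updates(base, deltas, read_col_ids={0, 2})
```

The update keyed by the retired ID is dropped, while the update to the new column overwrites the materialized default.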

Compaction

Each CFileSet and DeltaFile has an associated schema that describes the data in it. On compaction, CFileSets and DeltaFiles with different schemas may be merged into a new file. The new file uses the latest schema, so every row must be projected onto that schema.

In the case of CFiles, the projection affects only two kinds of columns: new columns, for which the read default value is written out as data, and columns subject to an “alter type”, for which the “encoding” changes.

In the case of DeltaFiles, the projection is essential because the RowChangeList is serialized with no hint of the schema used. This means that a RowChangeList can be read only by a reader that knows the exact schema it was serialized with.
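To see why a schema-less encoding forces this, consider a toy change-list format. This is not Kudu's actual RowChangeList layout; the encoding below (a one-byte column index followed by a fixed-width value whose width depends on the column type) and all names in it are assumptions chosen only to illustrate the dependency on the writer's schema.

```python
import struct
from typing import List, Tuple

# Hypothetical fixed-width encodings per column type.
FORMATS = {"int32": "<i", "int64": "<q"}


def encode_changes(changes: List[Tuple[int, int]], schema: List[str]) -> bytes:
    """Serialize (column index, value) pairs; no type information is written."""
    buf = bytearray()
    for col_idx, value in changes:
        buf.append(col_idx)
        buf += struct.pack(FORMATS[schema[col_idx]], value)
    return bytes(buf)


def decode_changes(data: bytes, schema: List[str]) -> List[Tuple[int, int]]:
    """Deserialize; the value width comes entirely from the supplied schema."""
    out = []
    pos = 0
    while pos < len(data):
        col_idx = data[pos]
        pos += 1
        fmt = FORMATS[schema[col_idx]]
        (value,) = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        out.append((col_idx, value))
    return out


write_schema = ["int32", "int64"]
changes = [(0, 7), (1, 999)]
data = encode_changes(changes, write_schema)
round_trip = decode_changes(data, write_schema)
```

Decoding the same bytes with a different schema, for example ["int64", "int64"], reads the wrong number of bytes per value and either fails or returns different changes. Rewriting the change lists under the latest schema during compaction means readers of the new file never need the older schemas.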