Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
============================================================
Schema Changes
============================================================
Column IDs
------------------------------
Each column in a Schema has a unique identifier that is internal to the Schema
and not exposed to the user. The identifiers are integers that are never reused,
so they distinguish an old column from a new one even when the two have the
same name.
For example:
> CREATE TABLE x (col_a int, col_b int);
> INSERT INTO x VALUES (1, 1);
> ALTER TABLE x DROP COLUMN col_b;
> ALTER TABLE x ADD COLUMN col_b int not null default 999;
In this case, although the Schema at the end of the sequence looks the same
as the one at the beginning, the correct data is:
> SELECT * from x;
col_a | col_b
------------------
1 | 999
In other words, we cannot re-materialize data from the old 'col_b' into the new
'col_b'.
If we were to dump the initial schema and the new schema, we would see that the
two 'col_b' columns, despite having the same name, have different column IDs.
Column IDs are internal to the server and are not sent by the client in RPCs;
clients specify columns by name. This is because we expect a client to keep
issuing queries like "select sum(col_b) from x;" without refreshing its schema,
even if the column is dropped and re-added with new data.
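The ID allocation rule can be sketched in C++ (C++17), the language Kudu's server
is written in. `ToySchema` and its methods are hypothetical names used only for
illustration, not Kudu's actual Schema API; the sketch assumes IDs come from a
per-table counter that only ever increases.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified schema: maps column names to internal column IDs.
// IDs are drawn from a monotonically increasing counter and are never reused.
class ToySchema {
 public:
  // Adds a column and returns its freshly allocated ID. (Adding an existing
  // name simply rebinds it to a new ID in this sketch.)
  int AddColumn(const std::string& name) {
    int id = next_column_id_++;
    name_to_id_[name] = id;
    return id;
  }

  // Removes the column by name. Its ID is retired, not recycled.
  bool DropColumn(const std::string& name) {
    return name_to_id_.erase(name) > 0;
  }

  // Returns the current ID bound to `name`, if such a column exists.
  std::optional<int> IdOf(const std::string& name) const {
    auto it = name_to_id_.find(name);
    if (it == name_to_id_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, int> name_to_id_;
  int next_column_id_ = 0;
};
```

Replaying the example above (add col_a, add col_b, drop col_b, re-add col_b)
yields IDs 0 and 1, and then 2 for the re-added col_b, so data stored under ID 1
can never be mistaken for the new column.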
Schemas specified in RPCs
------------------------------
When the user makes an RPC to read from or write to a tablet, the RPC specifies
only the names, types, and nullability of the columns. Internally, the server
maps the names to column IDs.
If the user specifies a column name which does not exist in the latest schema,
it is considered an error.
If the type or nullability does not match, we currently also consider it an error.
In the future, we may be able to adapt the data to the requested type (e.g.,
promoting smaller integers to larger ones on read, or reading non-null data
through a nullable projection).
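A minimal sketch of the name-to-ID resolution described above, assuming a
simplified type system with only three types. `DataType`, `ClientColumn`,
`ServerColumn`, `Resolution`, and `ResolveProjection` are hypothetical names;
the server's real checks may differ in detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical column types for the sketch.
enum class DataType { kInt32, kInt64, kString };

// What a client sends: name, type, and nullability only (no column ID).
struct ClientColumn {
  std::string name;
  DataType type;
  bool nullable;
};

// Server-side column, which additionally carries the internal column ID.
struct ServerColumn {
  int id;
  std::string name;
  DataType type;
  bool nullable;
};

// Either the resolved column IDs or an error message.
struct Resolution {
  std::vector<int> column_ids;
  std::string error;  // Empty on success.
  bool ok() const { return error.empty(); }
};

// Maps each requested column to an ID in the latest schema. Unknown names and
// type/nullability mismatches are both rejected, as the text describes.
Resolution ResolveProjection(const std::vector<ClientColumn>& requested,
                             const std::vector<ServerColumn>& latest) {
  Resolution result;
  for (const ClientColumn& col : requested) {
    const ServerColumn* match = nullptr;
    for (const ServerColumn& s : latest) {
      if (s.name == col.name) {
        match = &s;
        break;
      }
    }
    if (match == nullptr) {
      result.error = "column not found: " + col.name;
      return result;
    }
    if (match->type != col.type || match->nullable != col.nullable) {
      result.error = "type or nullability mismatch: " + col.name;
      return result;
    }
    result.column_ids.push_back(match->id);
  }
  return result;
}
```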
Handling varying schemas at read time
------------------------------
+ Tablet
|---- MemRowSet
|---- DiskRowSet N
|-------- CFileSet
|-------- Delta Tracker
|------------ Delta Memstore
|------------ Delta File N
Because the Schema of a table may change over time, different rowsets may have
been written with different schemas. At read time, the server determines a Schema
for the read based on the current metadata of the tablet. This Schema determines
what to do as the read path encounters older data which was inserted prior to
the schema change and thus may be missing some columns.
For each column in the read schema which is not present in the data, that column
may be treated in one of two ways:
1) In the case that the new column has a "read default" in the metadata, that
value is materialized for each cell.
2) If no "read default" is present, then the column must be nullable. In that
case, a column of NULLs is materialized.
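The two rules above can be sketched as follows. For brevity this assumes every
cell is a nullable 64-bit integer and that a rowset stores its data keyed by the
column IDs it was written with; `Cell`, `ReadColumn`, `StoredRowSet`, and
`MaterializeColumn` are hypothetical names, not Kudu APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <stdexcept>
#include <vector>

// A cell is either a value or NULL (std::nullopt). Only int64 is modeled.
using Cell = std::optional<int64_t>;

// Hypothetical read-schema column: ID, nullability, optional read default.
struct ReadColumn {
  int id;
  bool nullable;
  std::optional<int64_t> read_default;
};

// Hypothetical stored rowset: column data keyed by column ID, plus row count.
struct StoredRowSet {
  std::map<int, std::vector<Cell>> columns;
  size_t num_rows = 0;
};

// Returns the data for `col`. If the rowset predates the column, the read
// default is materialized for every row (rule 1); otherwise the column must be
// nullable and a column of NULLs is produced (rule 2).
std::vector<Cell> MaterializeColumn(const StoredRowSet& rowset,
                                    const ReadColumn& col) {
  auto it = rowset.columns.find(col.id);
  if (it != rowset.columns.end()) {
    return it->second;  // Column present in the data: read it as-is.
  }
  if (col.read_default.has_value()) {
    return std::vector<Cell>(rowset.num_rows, Cell(*col.read_default));
  }
  if (!col.nullable) {
    throw std::logic_error("non-nullable column without a read default");
  }
  return std::vector<Cell>(rowset.num_rows, std::nullopt);
}
```

With the example table from the Column IDs section, a rowset written before the
ALTER has data only for IDs 0 and 1, so reading the new col_b (ID 2, read default
999) yields 999, matching the SELECT output shown there.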
Currently, Kudu does not handle type changes. In the future, we may also need to
add type adapters to convert older data to the new type.
When reading delta files, updates to columns which have since been removed are
ignored. Updates to new columns are applied on top of the materialized default
column data.
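A sketch of the delta rules above, assuming deltas have already been decoded into
(row, column ID, value) triples and the read's columns are already materialized
and keyed by column ID. `Update` and `ApplyUpdates` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

using Cell = std::optional<int64_t>;

// Hypothetical decoded delta: set column `column_id` of row `row` to `value`.
struct Update {
  size_t row;
  int column_id;
  Cell value;
};

// Applies updates to materialized columns keyed by column ID. Updates to IDs
// not in the read (e.g. dropped columns) are skipped. Columns filled from a
// read default are overwritten just like any other column.
void ApplyUpdates(const std::vector<Update>& updates,
                  std::map<int, std::vector<Cell>>* columns) {
  for (const Update& u : updates) {
    auto it = columns->find(u.column_id);
    if (it == columns->end()) continue;  // Column no longer exists: ignore.
    it->second.at(u.row) = u.value;
  }
}
```

Because lookups go by ID, an old update to the dropped col_b (ID 1) can never
land on the re-added col_b (ID 2), even though both are named col_b.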
Compaction
------------------------------
Each CFileSet and DeltaFile has an associated schema that describes the data it contains.
On compaction, CFileSets/DeltaFiles with different schemas may be merged into a new file.
This new file has the latest schema, so all of the rows must be projected onto it.
For CFiles, the projection affects only new columns, for which the read default value
is written out as data, and columns whose type was altered ("alter type"), for which
the encoding changes.
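A sketch of the CFile side of compaction, assuming each input row is available
as a map from column ID to cell. `Cell`, `StoredRow`, `LatestColumn`, and
`CompactCFileSets` are hypothetical names, and "alter type" re-encoding is not
modeled. Columns missing from an input get the read default written as real
data (NULL when there is no default, mirroring the read-time rules); columns no
longer in the schema are simply not copied.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

using Cell = std::optional<int64_t>;

// Hypothetical input row, keyed by the column IDs its file was written with.
using StoredRow = std::map<int, Cell>;

// Hypothetical column of the latest schema.
struct LatestColumn {
  int id;
  std::optional<int64_t> read_default;
};

// Projects every row of every input onto the latest schema, in column order.
std::vector<std::vector<Cell>> CompactCFileSets(
    const std::vector<std::vector<StoredRow>>& inputs,
    const std::vector<LatestColumn>& latest) {
  std::vector<std::vector<Cell>> output;
  for (const auto& file : inputs) {
    for (const StoredRow& row : file) {
      std::vector<Cell> projected;
      for (const LatestColumn& col : latest) {
        auto it = row.find(col.id);
        // Present: copy. Missing: persist the read default (or NULL).
        projected.push_back(it != row.end() ? it->second : col.read_default);
      }
      output.push_back(std::move(projected));
    }
  }
  return output;
}
```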
For DeltaFiles, the projection is essential, since a RowChangeList is serialized
without any indication of the schema used to write it. This means that a RowChangeList
can be read only if you know the exact schema it was serialized with.
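The following sketch illustrates why a RowChangeList must be re-projected. It
assumes a hypothetical encoding in which each change is stored as (column index
in the writer's schema, value), with nothing identifying that schema; the real
RowChangeList format differs. `EncodedChangeList`, `SchemaIds`, `IndexOf`, and
`ProjectChangeList` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical encoding: (column index in writer's schema, new value) pairs.
// Nothing in the data records which schema the indices refer to.
using EncodedChangeList = std::vector<std::pair<size_t, int64_t>>;

// A schema modeled only as the ordered list of its column IDs.
using SchemaIds = std::vector<int>;

// Returns the position of `column_id` in `schema`, if present.
std::optional<size_t> IndexOf(const SchemaIds& schema, int column_id) {
  for (size_t i = 0; i < schema.size(); ++i) {
    if (schema[i] == column_id) return i;
  }
  return std::nullopt;
}

// Re-encodes a change list written with `old_schema` for `new_schema`.
// Decoding indices back to column IDs requires the exact old schema; changes
// to columns absent from `new_schema` are dropped.
EncodedChangeList ProjectChangeList(const EncodedChangeList& in,
                                    const SchemaIds& old_schema,
                                    const SchemaIds& new_schema) {
  EncodedChangeList out;
  for (const auto& [old_index, value] : in) {
    int column_id = old_schema.at(old_index);
    std::optional<size_t> new_index = IndexOf(new_schema, column_id);
    if (new_index) out.emplace_back(*new_index, value);
  }
  return out;
}
```

Reading such a list directly with the new schema {0, 2} would attribute index 1
to the new col_b (ID 2) instead of the dropped one (ID 1), which is exactly the
misinterpretation the projection avoids.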