.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. Borrowed the file from Apache Paimon:
.. https://github.com/apache/paimon/blob/master/docs/content/concepts/spec/snapshot.md

Snapshot
========

Each commit generates a snapshot file. Snapshot ids start from 1 and must be contiguous.
``EARLIEST`` and ``LATEST`` are hint files that point to the first and last snapshots; because they are only hints,
they may be inaccurate. When a hint file is inaccurate, the reader scans all snapshot files to determine the first
and last snapshots.

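The hint-then-scan rule can be sketched in Python as follows. This is a minimal illustration, not the reader's actual
implementation: it assumes a local file system, hint files whose content is a single decimal snapshot id, and a
hypothetical helper named ``find_snapshot_bound``. Because ids are contiguous, the sketch trusts a hint when its
snapshot file exists and the id just beyond it does not; otherwise it falls back to scanning every ``snapshot-<id>`` file.

.. code-block:: python

   import os
   import re
   from typing import List, Optional

   # Matches file names such as "snapshot-3" and captures the numeric id.
   _SNAPSHOT_FILE = re.compile(r"^snapshot-(\d+)$")


   def _snapshot_path(snapshot_dir: str, snapshot_id: int) -> str:
       return os.path.join(snapshot_dir, f"snapshot-{snapshot_id}")


   def _read_hint(snapshot_dir: str, hint_name: str) -> Optional[int]:
       """Return the id stored in a hint file, or None if it is missing or malformed."""
       try:
           with open(os.path.join(snapshot_dir, hint_name), encoding="utf-8") as f:
               return int(f.read().strip())
       except (OSError, ValueError):
           return None


   def _scan_snapshot_ids(snapshot_dir: str) -> List[int]:
       """Fallback: collect the id of every snapshot-<id> file in the directory."""
       ids = []
       for name in os.listdir(snapshot_dir):
           match = _SNAPSHOT_FILE.match(name)
           if match:
               ids.append(int(match.group(1)))
       return ids


   def find_snapshot_bound(snapshot_dir: str, latest: bool) -> Optional[int]:
       """Return the latest (or earliest) snapshot id, or None if there are no snapshots."""
       hint = _read_hint(snapshot_dir, "LATEST" if latest else "EARLIEST")
       if hint is not None and os.path.exists(_snapshot_path(snapshot_dir, hint)):
           # Ids are contiguous, so the hint is the true bound when the id just
           # beyond it (hint + 1 for LATEST, hint - 1 for EARLIEST) does not exist.
           beyond = hint + 1 if latest else hint - 1
           if not os.path.exists(_snapshot_path(snapshot_dir, beyond)):
               return hint
       # The hint is missing or inaccurate: scan all snapshot files instead.
       ids = _scan_snapshot_ids(snapshot_dir)
       if not ids:
           return None
       return max(ids) if latest else min(ids)

With ``snapshot-1`` to ``snapshot-3`` present and a stale ``LATEST`` containing ``2``, the sketch sees that
``snapshot-3`` exists, ignores the hint, and returns ``3`` from the scan.
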
Directory Layout
----------------

.. code-block:: shell

   warehouse
   └── default.db
       └── my_table
           └── snapshot
               ├── EARLIEST
               ├── LATEST
               ├── snapshot-1
               ├── snapshot-2
               └── snapshot-3

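The layout above implies a simple naming scheme. The helpers below are a hypothetical sketch that assumes every
database directory is named ``<database>.db`` (as ``default.db`` suggests) and that paths use forward slashes; they
only build paths and do not touch the file system.

.. code-block:: python

   from pathlib import PurePosixPath


   def snapshot_dir(warehouse: str, database: str, table: str) -> PurePosixPath:
       # Assumption: each database is a "<name>.db" directory, like "default.db" above.
       return PurePosixPath(warehouse) / f"{database}.db" / table / "snapshot"


   def snapshot_file(warehouse: str, database: str, table: str, snapshot_id: int) -> PurePosixPath:
       # Every commit gets its own "snapshot-<id>" file; ids start from 1.
       if snapshot_id < 1:
           raise ValueError("snapshot ids start from 1")
       return snapshot_dir(warehouse, database, table) / f"snapshot-{snapshot_id}"


   def hint_file(warehouse: str, database: str, table: str, which: str) -> PurePosixPath:
       # The two hint files sit next to the snapshot files.
       if which not in ("EARLIEST", "LATEST"):
           raise ValueError(f"unknown hint file: {which}")
       return snapshot_dir(warehouse, database, table) / which

For example, ``snapshot_file("warehouse", "default", "my_table", 3)`` yields
``warehouse/default.db/my_table/snapshot/snapshot-3``.
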
A commit first claims (preempts) the next snapshot id. The commit becomes visible only once its snapshot file has been
written successfully.

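One way to realize "claim the next id, then become visible" on a local POSIX-style file system is to write the
complete JSON to a temporary file and then hard-link it to ``snapshot-<id>``. ``os.link`` refuses to overwrite an
existing file, so at most one concurrent committer wins a given id, and readers never observe a half-written snapshot.
This is an assumed strategy for illustration only; ``try_commit`` is a hypothetical name, and real committers
(especially on object stores) may rely on different atomic primitives and conflict checks.

.. code-block:: python

   import json
   import os
   import tempfile
   from typing import Optional


   def try_commit(snapshot_dir: str, snapshot: dict) -> Optional[int]:
       """Publish `snapshot` under the next id; return that id, or None if another writer took it."""
       prefix = "snapshot-"
       ids = [int(name[len(prefix):]) for name in os.listdir(snapshot_dir)
              if name.startswith(prefix) and name[len(prefix):].isdigit()]
       next_id = max(ids, default=0) + 1
       snapshot = dict(snapshot, id=next_id)  # the id must match the file name
       # Write the full JSON to a private temp file first, so no reader sees a partial file.
       fd, tmp_path = tempfile.mkstemp(dir=snapshot_dir, prefix=".tmp-")
       try:
           with os.fdopen(fd, "w", encoding="utf-8") as f:
               json.dump(snapshot, f)
           # os.link raises FileExistsError if the target already exists,
           # so only one concurrent committer can claim snapshot-<next_id>.
           os.link(tmp_path, os.path.join(snapshot_dir, f"{prefix}{next_id}"))
           return next_id
       except FileExistsError:
           return None
       finally:
           os.remove(tmp_path)

In this sketch, a caller that receives ``None`` lost the race for that id; it can re-read the directory and retry with
a fresh id.
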
Snapshot File
-------------

A snapshot file is a JSON document containing the following fields:

1. ``version``: Snapshot file format version; currently 3.
2. ``id``: Snapshot id; the same as the ``<id>`` in the file name ``snapshot-<id>``.
3. ``schemaId``: The schema version corresponding to this commit.
4. ``baseManifestList``: A manifest list recording all changes from the previous snapshots.
5. ``deltaManifestList``: A manifest list recording all new changes that occurred in this snapshot.
6. ``changelogManifestList``: A manifest list recording all changelog produced in this snapshot; ``null`` if no changelog is produced.
7. ``indexManifest``: A manifest recording all index files of this table; ``null`` if the table has no index files.
8. ``commitUser``: Usually a generated UUID; used to recover streaming writes. Each streaming write job uses a single commit user.
9. ``commitIdentifier``: The transaction id of a streaming write; one transaction may result in multiple commits with different ``commitKind`` values.
10. ``commitKind``: The type of changes in this snapshot, including ``append``, ``compact``, ``overwrite`` and ``analyze``.
11. ``timeMillis``: Commit time in milliseconds.
12. ``logOffsets``: Commit log offsets.
13. ``totalRecordCount``: Record count of all changes that occurred in this snapshot.
14. ``deltaRecordCount``: Record count of all new changes that occurred in this snapshot.
15. ``changelogRecordCount``: Record count of all changelog produced in this snapshot.
16. ``watermark``: Watermark of the input records, taken from Flink's watermark mechanism; ``Long.MIN_VALUE`` if there is no watermark.
17. ``statistics``: Name of the statistics file for this table.

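
For readers consuming these files, the fields map naturally onto a small record type. The Python sketch below is
illustrative only: the ``Snapshot`` class and its snake_case attribute names are hypothetical, optional fields are read
with ``dict.get`` so that a missing key becomes ``None`` (an assumption about tolerant parsing, not a statement about
the format), and ``logOffsets`` is kept as raw JSON because its structure is not specified here.

.. code-block:: python

   import json
   from dataclasses import dataclass
   from typing import Any, Optional

   # Java's Long.MIN_VALUE, which the spec uses to mean "no watermark".
   LONG_MIN_VALUE = -(2 ** 63)


   @dataclass
   class Snapshot:
       version: int
       id: int
       schema_id: int
       base_manifest_list: str
       delta_manifest_list: str
       changelog_manifest_list: Optional[str]  # null if no changelog was produced
       index_manifest: Optional[str]           # null if the table has no index files
       commit_user: str
       commit_identifier: int
       commit_kind: str
       time_millis: int
       log_offsets: Any                        # kept as raw JSON; shape not specified here
       total_record_count: Optional[int]
       delta_record_count: Optional[int]
       changelog_record_count: Optional[int]
       watermark: Optional[int]
       statistics: Optional[str]

       @classmethod
       def from_json(cls, text: str) -> "Snapshot":
           d = json.loads(text)
           return cls(
               version=d["version"],
               id=d["id"],
               schema_id=d["schemaId"],
               base_manifest_list=d["baseManifestList"],
               delta_manifest_list=d["deltaManifestList"],
               # Assumed-optional fields use .get(), so a missing key becomes None.
               changelog_manifest_list=d.get("changelogManifestList"),
               index_manifest=d.get("indexManifest"),
               commit_user=d["commitUser"],
               commit_identifier=d["commitIdentifier"],
               commit_kind=d["commitKind"],
               time_millis=d["timeMillis"],
               log_offsets=d.get("logOffsets"),
               total_record_count=d.get("totalRecordCount"),
               delta_record_count=d.get("deltaRecordCount"),
               changelog_record_count=d.get("changelogRecordCount"),
               watermark=d.get("watermark"),
               statistics=d.get("statistics"),
           )

       def has_watermark(self) -> bool:
           # Long.MIN_VALUE (or an absent field) means the input carried no watermark.
           return self.watermark is not None and self.watermark != LONG_MIN_VALUE

       def matches_file_name(self, file_name: str) -> bool:
           # The id must equal the number in the snapshot file name.
           return file_name == f"snapshot-{self.id}"

For example, parsing ``snapshot-3`` should give an object for which ``matches_file_name("snapshot-3")`` is true, and
``has_watermark()`` returns ``False`` whenever the file stores ``Long.MIN_VALUE``.
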