---
title: "Basic Concepts"
weight: 2
type: docs
aliases:
- /concepts/basic-concepts.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Basic Concepts
## File Layouts
All files of a table are stored under one base directory. Paimon files are organized in a layered style. The following image illustrates the file layout. Starting from a snapshot file, Paimon readers can recursively access all records from the table.
{{< img src="/img/file-layout.png">}}
## Snapshot
All snapshot files are stored in the `snapshot` directory.
A snapshot file is a JSON file containing information about this snapshot, including:
* the schema file in use
* the manifest list containing all changes of this snapshot
A snapshot captures the state of a table at some point in time. Users can access the latest data of a table through the
latest snapshot. By time traveling, users can also access the previous state of a table through an earlier snapshot.
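As a rough illustration of time travel, the sketch below picks the snapshot to read for a given point in time. It is a simplified model, not Paimon's reader code: the `Snapshot` class and its `snapshot_id` and `commit_millis` fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int    # hypothetical field: increases with every new snapshot
    commit_millis: int  # hypothetical field: commit time in epoch milliseconds


def latest_snapshot(snapshots: List[Snapshot]) -> Optional[Snapshot]:
    """Return the snapshot with the highest id, or None if there are none."""
    return max(snapshots, key=lambda s: s.snapshot_id, default=None)


def snapshot_as_of(snapshots: List[Snapshot], timestamp_millis: int) -> Optional[Snapshot]:
    """Return the newest snapshot committed at or before the given time."""
    candidates = [s for s in snapshots if s.commit_millis <= timestamp_millis]
    return max(candidates, key=lambda s: s.snapshot_id, default=None)


history = [Snapshot(1, 1_000), Snapshot(2, 2_000), Snapshot(3, 3_000)]
```

Reading the latest data corresponds to `latest_snapshot(history)`, while reading an earlier state corresponds to a call such as `snapshot_as_of(history, 2_500)`, which selects snapshot 2 in this toy history.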
## Manifest Files
All manifest lists and manifest files are stored in the `manifest` directory.
A manifest list is a list of manifest file names.
A manifest file records changes to LSM data files and changelog files, for example which LSM data files were created and which were deleted in the corresponding snapshot.
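One way to picture how these change records are used is to replay them in order and keep track of which data files are still in use. The sketch below assumes a simplified model in which each manifest file is a list of `(kind, file_name)` pairs; the `"ADD"` and `"DELETE"` labels and the file names are illustrative, not Paimon's on-disk format.

```python
from typing import Iterable, List, Set, Tuple

# Each entry is (kind, file_name); the kind labels are illustrative.
ManifestEntry = Tuple[str, str]


def live_data_files(manifest_files: Iterable[List[ManifestEntry]]) -> Set[str]:
    """Replay manifest files in order and return the data files still in use."""
    live: Set[str] = set()
    for entries in manifest_files:
        for kind, file_name in entries:
            if kind == "ADD":
                live.add(file_name)
            elif kind == "DELETE":
                live.discard(file_name)
            else:
                raise ValueError(f"unknown entry kind: {kind}")
    return live


# A manifest list modeled as an ordered list of manifest files.
manifest_list = [
    [("ADD", "data-1.parquet"), ("ADD", "data-2.parquet")],
    # A later change replaces the two files with a single merged file.
    [("DELETE", "data-1.parquet"), ("DELETE", "data-2.parquet"), ("ADD", "data-3.parquet")],
]
```

In this example, `live_data_files(manifest_list)` leaves only `data-3.parquet`, because the second manifest file deletes the two files that the first one added.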
## Data Files
Data files are grouped by partition. Currently, Paimon supports Parquet (default), ORC, and Avro as data file formats.
## Partition
Paimon adopts the same partitioning concept as Apache Hive to separate data.
Partitioning is an optional way of dividing a table into related parts based on the values of particular columns like date, city, and department. Each table can have one or more partition keys to identify a particular partition.
By partitioning, users can efficiently operate on a slice of records in the table.
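To make the idea concrete, the sketch below groups records by their partition key values, assuming Hive-style `key=value` path segments. The helper names, the sample columns, and the exact path format are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def partition_path(record: Dict[str, str], partition_keys: Sequence[str]) -> str:
    """Build a Hive-style relative path such as 'dt=2024-01-01/city=Paris'."""
    return "/".join(f"{key}={record[key]}" for key in partition_keys)


def group_by_partition(
    records: List[Dict[str, str]], partition_keys: Sequence[str]
) -> Dict[str, List[Dict[str, str]]]:
    """Group records by the partition their key values identify."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        groups[partition_path(record, partition_keys)].append(record)
    return dict(groups)


records = [
    {"dt": "2024-01-01", "city": "Paris", "amount": "10"},
    {"dt": "2024-01-01", "city": "Berlin", "amount": "7"},
    {"dt": "2024-01-02", "city": "Paris", "amount": "3"},
]
groups = group_by_partition(records, ["dt", "city"])
```

An operation restricted to `dt=2024-01-01/city=Paris` then only needs the records in `groups["dt=2024-01-01/city=Paris"]` rather than the whole table.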
## Consistency Guarantees
Paimon writers use a two-phase commit protocol to atomically commit a batch of records to the table. Each commit produces
at most two [snapshots]({{< ref "concepts/basic-concepts#snapshot" >}}) at commit time, depending on the incremental write and compaction strategy. If only incremental writes are performed and no compaction is triggered, only an incremental snapshot is created. If a compaction is triggered, both an incremental snapshot and a compacted snapshot are created.
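The rule for how many snapshots a commit produces can be sketched as a small function. This is only a model of the behavior described above: the `commit` helper, the `"incremental"` and `"compacted"` labels, and the consecutive snapshot ids starting at 1 are assumptions for illustration.

```python
from typing import List, Tuple


def commit(next_snapshot_id: int, compaction_triggered: bool) -> List[Tuple[int, str]]:
    """Return the (snapshot id, kind) pairs that one commit would add."""
    snapshots = [(next_snapshot_id, "incremental")]
    if compaction_triggered:
        # A triggered compaction adds a second snapshot in the same commit.
        snapshots.append((next_snapshot_id + 1, "compacted"))
    return snapshots


# Three commits in sequence; only the second one triggers a compaction.
table_snapshots: List[Tuple[int, str]] = []
for compaction_triggered in [False, True, False]:
    next_id = len(table_snapshots) + 1
    table_snapshots.extend(commit(next_id, compaction_triggered))
```

After the three commits, the table holds four snapshots: the second commit contributes both an incremental and a compacted snapshot, and the other two contribute one incremental snapshot each.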
For any two writers modifying a table at the same time, as long as they do not modify the same partition, their commits
can occur in parallel. If they modify the same partition, only snapshot isolation is guaranteed. That is, the final table
state may be a mix of the two commits, but no changes are lost.
See [dedicated compaction job]({{< ref "maintenance/dedicated-compaction#dedicated-compaction-job" >}}) for more info.