 ---
 title: "Basic Concepts"
 weight: 2
 type: docs
 aliases:
 - /concepts/basic-concepts.html
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->

 # Basic Concepts

 ## File Layouts

 All files of a table are stored under one base directory. Paimon files are organized in a layered style. The following image illustrates the file layout. Starting from a snapshot file, Paimon readers can recursively access all records from the table.

 {{< img src="/img/file-layout.png">}}

 ## Snapshot

 All snapshot files are stored in the `snapshot` directory.

 A snapshot file is a JSON file containing information about the snapshot, including:

 * the schema file in use
 * the manifest list containing all changes of this snapshot

 A snapshot captures the state of a table at some point in time. Users can access the latest data of a table through the
 latest snapshot. By time traveling, users can also access the previous state of a table through an earlier snapshot.
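To make the idea of the latest snapshot and time travel concrete, here is a minimal sketch that writes a few toy snapshot files and then picks one. The file naming (`snapshot-<id>`) and the JSON field names (`id`, `schemaId`, `timeMillis`, `manifestList`) are assumptions chosen for illustration and are not defined on this page; the real snapshot file contains more information.

```python
import json
import tempfile
from pathlib import Path


def load_snapshots(snapshot_dir):
    """Parse every snapshot-<id> JSON file, ordered by snapshot id."""
    files = Path(snapshot_dir).glob("snapshot-*")
    return sorted((json.loads(f.read_text()) for f in files), key=lambda s: s["id"])


def snapshot_as_of(snapshots, time_millis):
    """Time travel: the newest snapshot committed at or before time_millis."""
    visible = [s for s in snapshots if s["timeMillis"] <= time_millis]
    return max(visible, key=lambda s: s["timeMillis"]) if visible else None


def demo():
    """Write three toy snapshot files, then read the latest and an earlier one."""
    with tempfile.TemporaryDirectory() as base:
        snapshot_dir = Path(base) / "snapshot"
        snapshot_dir.mkdir()
        for i in (1, 2, 3):
            # Hypothetical field names, for illustration only.
            content = {"id": i, "schemaId": 0, "timeMillis": 1000 * i,
                       "manifestList": f"manifest-list-{i}"}
            (snapshot_dir / f"snapshot-{i}").write_text(json.dumps(content))
        snapshots = load_snapshots(snapshot_dir)
        latest = snapshots[-1]
        earlier = snapshot_as_of(snapshots, 2500)
        return latest["id"], earlier["id"]
```

Calling `demo()` returns the id of the latest snapshot together with the id of the snapshot that was current at the requested time.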

 ## Manifest Files

 All manifest lists and manifest files are stored in the `manifest` directory.

 A manifest list is a list of manifest file names.

 A manifest file records changes to LSM data files and changelog files, for example which LSM data files were created and which were deleted in the corresponding snapshot.
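The relationship between a manifest list, its manifest files, and the resulting set of data files can be sketched as a simple replay of changes. This is a toy in-memory model: the entry kinds `"ADD"` and `"DELETE"` and all file names are hypothetical labels, not Paimon's on-disk format.

```python
def live_data_files(manifest_list, manifests):
    """Replay the changes of every manifest named in the list, in order."""
    live = set()
    for manifest_name in manifest_list:
        for kind, data_file in manifests[manifest_name]:
            if kind == "ADD":
                live.add(data_file)
            elif kind == "DELETE":
                live.discard(data_file)
            else:
                raise ValueError(f"unknown manifest entry kind: {kind}")
    return live


# Hypothetical manifests: each one is a list of (kind, data file) changes.
MANIFESTS = {
    "manifest-1": [("ADD", "data-1.parquet"), ("ADD", "data-2.parquet")],
    "manifest-2": [("DELETE", "data-1.parquet"), ("ADD", "data-3.parquet")],
}
# A manifest list is just a list of manifest file names.
MANIFEST_LIST = ["manifest-1", "manifest-2"]
```

Here `live_data_files(MANIFEST_LIST, MANIFESTS)` yields the two files that remain after the deletion recorded in the second manifest is applied.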

 ## Data Files

 Data files are grouped by partition. Paimon currently supports Parquet (default), ORC, and Avro as data file formats.

 ## Partition

 Paimon adopts the same partitioning concept as Apache Hive to separate data.

 Partitioning is an optional way of dividing a table into related parts based on the values of particular columns like date, city, and department. Each table can have one or more partition keys to identify a particular partition.

 By partitioning, users can efficiently operate on a slice of records in the table.
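As a rough illustration of why partitioning lets a query touch only a slice of the table, the sketch below groups records by their partition key values and then reads only the matching groups. The Hive-style `key=value` path rendering, the column names, and the helper functions are assumptions made for this example.

```python
from collections import defaultdict


def partition_path(partition_keys, values):
    """Render a Hive-style partition path such as dt=2024-01-01/city=Berlin."""
    return "/".join(f"{k}={v}" for k, v in zip(partition_keys, values))


def group_by_partition(records, partition_keys):
    """Group records by the tuple of their partition key values."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in partition_keys)].append(record)
    return dict(groups)


def scan(groups, partition_keys, wanted):
    """Read only the partitions whose key values match every wanted value."""
    rows = []
    for values, records in groups.items():
        spec = dict(zip(partition_keys, values))
        if all(spec[k] == v for k, v in wanted.items()):
            rows.extend(records)
    return rows


KEYS = ("dt", "city")
RECORDS = [
    {"dt": "2024-01-01", "city": "Berlin", "amount": 10},
    {"dt": "2024-01-01", "city": "Paris", "amount": 20},
    {"dt": "2024-01-02", "city": "Berlin", "amount": 30},
]
GROUPS = group_by_partition(RECORDS, KEYS)
```

A call such as `scan(GROUPS, KEYS, {"dt": "2024-01-01"})` only visits the two partitions for that date and skips the rest.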

 ## Consistency Guarantees

 Paimon writers use a two-phase commit protocol to atomically commit a batch of records to the table. Each commit produces
 at most two [snapshots]({{< ref "concepts/basic-concepts#snapshot" >}}), depending on whether the write triggers compaction. If only incremental writes are performed and no compaction is triggered, only an incremental snapshot is created. If compaction is triggered, both an incremental snapshot and a compacted snapshot are created.
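The rule of one or two snapshots per commit can be sketched as follows. This is a toy model only: the `commit` function, the `"incremental"` and `"compacted"` labels, and the dictionary layout are hypothetical and do not reflect Paimon's internal API.

```python
def commit(snapshots, written_files, compaction_triggered):
    """Append the snapshots produced by one commit and return them.

    Every commit creates an incremental snapshot; a compacted snapshot
    follows only when the commit also triggered compaction.
    """
    next_id = len(snapshots) + 1
    created = [{"id": next_id, "kind": "incremental", "files": list(written_files)}]
    if compaction_triggered:
        created.append({"id": next_id + 1, "kind": "compacted"})
    snapshots.extend(created)
    return created


def demo_commits():
    """Run one plain commit and one commit with compaction on an empty table."""
    snapshots = []
    first = commit(snapshots, ["data-1.parquet"], compaction_triggered=False)
    second = commit(snapshots, ["data-2.parquet"], compaction_triggered=True)
    return len(first), len(second), [s["kind"] for s in snapshots]
```

In `demo_commits()`, the first commit adds a single snapshot and the second adds two, so the table ends up with three snapshots in total.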

 For any two writers modifying a table at the same time, as long as they do not modify the same partition, their commits
 can occur in parallel. If they modify the same partition, only snapshot isolation is guaranteed. That is, the final table
 state may be a mix of the two commits, but no changes are lost.
 See [dedicated compaction job]({{< ref "maintenance/dedicated-compaction#dedicated-compaction-job" >}}) for more information.
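The statement that concurrent commits to the same partition may interleave but never lose changes can be illustrated with a small threaded model. This is a sketch under stated assumptions, not Paimon's commit mechanism: `ToyTable` is hypothetical, and an in-process lock stands in for whatever atomic publish step the storage layer provides.

```python
import threading


class ToyTable:
    """Toy model: each commit is published atomically and never overwrites another."""

    def __init__(self):
        self._lock = threading.Lock()
        self.snapshots = []   # one entry per committed snapshot
        self.partitions = {}  # partition -> list of committed records

    def commit(self, writer, partition, records):
        with self._lock:  # stand-in for an atomic publish step (assumption)
            self.partitions.setdefault(partition, []).extend(records)
            snapshot_id = len(self.snapshots) + 1
            self.snapshots.append({"id": snapshot_id, "writer": writer})
            return snapshot_id


def run_two_writers():
    """Two writers commit to the same partition at the same time."""
    table = ToyTable()
    writers = [
        threading.Thread(target=table.commit, args=("w1", "dt=2024-01-01", ["a", "b"])),
        threading.Thread(target=table.commit, args=("w2", "dt=2024-01-01", ["c"])),
    ]
    for thread in writers:
        thread.start()
    for thread in writers:
        thread.join()
    return table
```

The order in which the two commits land is not deterministic, so the final partition content is a mix of both writers' records, yet every record from both commits is present.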