 ---
 title: "Overview"
 weight: 1
 type: docs
 aliases:
 - /concepts/overview.html
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership.  The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->

 # Overview

 Apache Paimon's Architecture:

 {{< img src="/img/architecture.png">}}

 As shown in the architecture above:

 **Read/Write:** Paimon supports versatile ways to read and write data and to perform OLAP queries.
 - For reads, it supports consuming data
   - from historical snapshots (in batch mode),
   - from the latest offset (in streaming mode), or
   - from incremental snapshots in a hybrid way.
 - For writes, it supports
   - streaming synchronization from the changelog of databases (CDC), and
   - batch insert/overwrite from offline data.
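
The read modes listed above can be illustrated with a toy model. This is a minimal sketch, not Paimon's API: `ToyTable` and all of its methods are hypothetical names, and it assumes that every commit produces one append-only snapshot identified by a 1-based id.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a snapshot-based table (not Paimon's API)."""
    # snapshots[i] holds the rows added by commit i + 1
    snapshots: list = field(default_factory=list)

    def commit(self, rows):
        """Write path: each commit produces a new snapshot and returns its id."""
        self.snapshots.append(list(rows))
        return len(self.snapshots)

    def latest_id(self):
        """Id of the most recent snapshot (0 if the table is empty)."""
        return len(self.snapshots)

    def read_snapshot(self, snapshot_id):
        """Batch read: all rows visible as of a historical snapshot."""
        return [row for delta in self.snapshots[:snapshot_id] for row in delta]

    def read_incremental(self, start_id, end_id):
        """Incremental read: rows added after start_id, up to and including end_id."""
        return [row for delta in self.snapshots[start_id:end_id] for row in delta]


table = ToyTable()
table.commit(["a", "b"])      # snapshot 1, e.g. a batch insert
table.commit(["c"])           # snapshot 2, e.g. a CDC micro-batch
offset = table.latest_id()    # a reader starting "from the latest offset" remembers this
table.commit(["d", "e"])      # snapshot 3 arrives later

batch_rows = table.read_snapshot(2)                                # historical snapshot
incremental_rows = table.read_incremental(1, 3)                    # changes between snapshots
streamed_rows = table.read_incremental(offset, table.latest_id())  # only new data
```

In this sketch the streaming reader is just an incremental read that starts at the remembered offset, which is why the two modes can share one storage layout.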

 **Ecosystem:** In addition to Apache Flink, Paimon can also be read by other compute
 engines such as Apache Spark, StarRocks, Apache Doris, Apache Hive, and Trino.

 **Internal:**
 - Under the hood, Paimon stores columnar files on the filesystem or object store.
 - File metadata is saved in manifest files, which enables large-scale storage and data skipping.
 - For primary key tables, Paimon uses an LSM tree structure to support a large volume of data updates and high-performance queries.
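
To make the last two points concrete, here is a minimal sketch of data skipping and an LSM-style merge. It is an illustration under stated assumptions, not Paimon's implementation: `DataFile`, `files_for_range`, and `merge_read` are hypothetical names, and the sketch assumes the manifest records a minimum and maximum key per file and that newer files carry larger sequence numbers.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical data file: a sorted run of (key, value) pairs plus a sequence number."""
    sequence: int   # larger means written later
    rows: list      # (key, value) pairs sorted by key

    @property
    def min_key(self):
        return self.rows[0][0]

    @property
    def max_key(self):
        return self.rows[-1][0]


def files_for_range(manifest, low, high):
    """Data skipping: keep only files whose [min_key, max_key] overlaps [low, high]."""
    return [f for f in manifest if f.max_key >= low and f.min_key <= high]


def merge_read(files):
    """LSM-style merge: for each key, the value from the newest file wins."""
    merged = {}
    for data_file in sorted(files, key=lambda f: f.sequence):
        for key, value in data_file.rows:
            merged[key] = value   # later sequences overwrite earlier ones
    return sorted(merged.items())


# The "manifest" here is just a list of file descriptors (an assumption of this sketch).
manifest = [
    DataFile(sequence=1, rows=[(1, "a"), (5, "b")]),
    DataFile(sequence=2, rows=[(5, "b2"), (7, "c")]),   # updates key 5
    DataFile(sequence=3, rows=[(20, "x"), (30, "y")]),  # outside the queried range
]

candidates = files_for_range(manifest, low=4, high=10)
result = merge_read(candidates)
```

The third file is never opened because its key range cannot match the query. `merge_read` returns every row from the candidate files, so a real reader would still filter individual rows against the predicate.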

 ## Unified Storage

 For streaming engines like Apache Flink, there are typically three types of connectors:
 - Message queues, such as Apache Kafka, are used in both the source and
   intermediate stages of the pipeline to keep latency
   within seconds.
 - OLAP systems, such as ClickHouse, receive processed data in a
   streaming fashion and serve users' ad-hoc queries.
 - Batch storage, such as Apache Hive, supports the various operations
   of traditional batch processing, including `INSERT OVERWRITE`.

 Paimon provides a table abstraction. It is used in a way that
 does not differ from a traditional database:
 - In `batch` execution mode, it acts like a Hive table and
   supports the various operations of batch SQL. Querying it returns the
   latest snapshot.
 - In `streaming` execution mode, it acts like a message queue.
   Querying it is like reading a changelog stream from a message queue
   where historical data never expires.
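
The two views of one table can be sketched as follows. This is a toy illustration, not Paimon or Flink code: the functions `batch_query` and `streaming_query` are hypothetical, and the row kinds `+I`, `+U`, and `-D` are a simplified, assumed changelog encoding.

```python
# Hypothetical changelog for a table keyed by user id: (kind, key, value).
# "+I" = insert, "+U" = update, "-D" = delete (simplified, assumed encoding).
changelog = [
    ("+I", 1, "alice"),
    ("+I", 2, "bob"),
    ("+U", 1, "alicia"),
    ("-D", 2, None),
]


def batch_query(log):
    """Batch mode: fold the whole log into the latest state, like scanning a Hive table."""
    state = {}
    for kind, key, value in log:
        if kind == "-D":
            state.pop(key, None)
        else:
            state[key] = value
    return state


def streaming_query(log, start=0):
    """Streaming mode: replay change records in order, like consuming a message queue."""
    for record in log[start:]:
        yield record


latest = batch_query(changelog)               # one consistent view of the data
replayed = list(streaming_query(changelog))   # every change since the beginning
```

Because the history is retained, a streaming reader in this sketch can start from the first record at any time, which mirrors the "historical data never expires" behavior described above.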