---
title: "Overview"
weight: 1
type: docs
aliases:
- /concepts/overview.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Overview
Apache Paimon's architecture:

{{< img src="/img/architecture.png">}}

As shown in the architecture above:

**Read/Write:** Paimon supports versatile ways to read and write data and to perform OLAP queries.
- For reads, it supports consuming data
  - from historical snapshots (in batch mode),
  - from the latest offset (in streaming mode), or
  - by reading incremental snapshots in a hybrid way.
- For writes, it supports
  - streaming synchronization from database changelogs (CDC), and
  - batch insert/overwrite from offline data.

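
The read and write paths above both revolve around snapshots. The following Python sketch is a toy, in-memory model written only to illustrate that idea; `ToyTable`, `commit`, `batch_read`, `incremental_read`, and `latest_offset` are hypothetical names, not Paimon's API. Each commit (for example a CDC batch or an `INSERT OVERWRITE`) produces a new snapshot. A batch read materializes any snapshot, and a streaming reader remembers the latest offset and then reads the increments committed after it.

```python
from dataclasses import dataclass, field


# Hypothetical toy model, NOT Paimon's API: each commit becomes a snapshot
# that holds its change records inline (kept inline here only for brevity).
@dataclass
class Snapshot:
    snapshot_id: int
    changes: list            # (op, key, value); op is "+" for upsert, "-" for delete
    overwrite: bool = False  # True if this commit replaced all earlier data


@dataclass
class ToyTable:
    snapshots: list = field(default_factory=list)

    def commit(self, changes, overwrite=False):
        """Write path: a CDC batch or an INSERT / INSERT OVERWRITE adds one snapshot."""
        snap = Snapshot(len(self.snapshots) + 1, list(changes), overwrite)
        self.snapshots.append(snap)
        return snap.snapshot_id

    def latest_offset(self):
        """The id of the newest snapshot (0 if the table is empty)."""
        return len(self.snapshots)

    def batch_read(self, snapshot_id=None):
        """Batch mode: materialize the table as of a given (historical) snapshot."""
        if snapshot_id is None:
            snapshot_id = self.latest_offset()
        state = {}
        for snap in self.snapshots[:snapshot_id]:
            if snap.overwrite:
                state = {}  # an overwrite discards everything committed before it
            for op, key, value in snap.changes:
                if op == "+":
                    state[key] = value
                else:
                    state.pop(key, None)
        return state

    def incremental_read(self, start_id, end_id):
        """Change records committed after snapshot start_id, up to and including end_id.

        Simplification: an overwrite commit is returned as its new records only;
        this toy does not emit deletes for the rows it replaced.
        """
        out = []
        for snap in self.snapshots[start_id:end_id]:
            out.extend(snap.changes)
        return out


table = ToyTable()
table.commit([("+", 1, "a"), ("+", 2, "b")])   # snapshot 1, e.g. from CDC
table.commit([("-", 1, None), ("+", 3, "c")])  # snapshot 2
print(table.batch_read(1))  # {1: 'a', 2: 'b'}  (historical snapshot)
print(table.batch_read())   # {2: 'b', 3: 'c'}  (latest snapshot)

offset = table.latest_offset()  # a streaming reader starting at the latest offset
table.commit([("+", 4, "d")])   # snapshot 3 arrives later
print(table.incremental_read(offset, table.latest_offset()))  # [('+', 4, 'd')]

table.commit([("+", 9, "z")], overwrite=True)  # batch INSERT OVERWRITE
print(table.batch_read())   # {9: 'z'}
```

In this model a streaming job is just a loop of incremental reads from a remembered offset, which is why batch and streaming readers can share the same stored snapshots.
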
**Ecosystem:** In addition to Apache Flink, Paimon also supports reading by other computation
engines such as Apache Spark, StarRocks, Apache Doris, Apache Hive and Trino.

**Internal:**
- Under the hood, Paimon stores columnar files on the filesystem or object store.
- File metadata is saved in manifest files, which enables large-scale storage and data skipping.
- For primary key tables, Paimon uses an LSM tree structure to support a large volume of data updates and high-performance queries.

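
The two internal mechanisms above can be illustrated with a small Python sketch. It is a simplified toy, not Paimon's manifest or file format: `ManifestEntry`, `files_to_scan`, and `lsm_merge` are hypothetical names. Per-file min/max key statistics let a reader skip files that cannot match a key range, and merging key-sorted runs so that the newest version of each key wins is the basic read-side step of an LSM tree; here `None` stands in for a delete marker.

```python
import heapq
from dataclasses import dataclass


# Hypothetical sketch, NOT Paimon's manifest format: per-file key statistics.
@dataclass
class ManifestEntry:
    file_name: str
    min_key: int
    max_key: int


def files_to_scan(manifest, lo, hi):
    """Data skipping: keep only files whose [min_key, max_key] overlaps [lo, hi]."""
    return [e.file_name for e in manifest if e.max_key >= lo and e.min_key <= hi]


def lsm_merge(runs):
    """Merge key-sorted runs of (key, value); runs[0] is the newest run.

    For each key the newest version wins; a value of None is treated as a
    delete marker (tombstone) and hides all older versions of that key.
    """
    tagged = [[(key, age, value) for key, value in run] for age, run in enumerate(runs)]
    result = []
    last_key = object()  # sentinel that equals no real key
    # heapq.merge streams the already-sorted runs in (key, age) order,
    # so the newest version of each key is seen first.
    for key, age, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # an older version of a key that is already resolved
        last_key = key
        if value is not None:
            result.append((key, value))
    return result


manifest = [
    ManifestEntry("f0", 0, 99),
    ManifestEntry("f1", 100, 199),
    ManifestEntry("f2", 200, 299),
]
print(files_to_scan(manifest, 150, 210))  # ['f1', 'f2']

newest = [(1, "a2"), (3, None)]            # key 1 updated, key 3 deleted
older = [(1, "a1"), (2, "b1"), (3, "c1")]
print(lsm_merge([newest, older]))         # [(1, 'a2'), (2, 'b1')]
```

Because each run is already sorted, `heapq.merge` combines them lazily in one pass instead of re-sorting every record. LSM trees generally also run compaction, which rewrites runs in the background so that reads touch fewer files.
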
## Unified Storage
For streaming engines like Apache Flink, there are typically three types of connectors:
- Message queues, such as Apache Kafka, used in both the source and
  intermediate stages of a streaming pipeline to keep latency
  within seconds.
- OLAP systems, such as ClickHouse, which receive processed data in a
  streaming fashion and serve users' ad-hoc queries.
- Batch storage, such as Apache Hive, which supports the various operations
  of traditional batch processing, including `INSERT OVERWRITE`.

Paimon provides a table abstraction that is used in much the same way
as a table in a traditional database:
- In `batch` execution mode, it acts like a Hive table and
  supports the various operations of batch SQL. Querying it returns the
  latest snapshot.
- In `streaming` execution mode, it acts like a message queue.
  Querying it is like querying a stream changelog from a message queue
  in which historical data never expires.
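
The batch/streaming duality above can be sketched by treating the table as a changelog. The Python below assumes Flink-style row kinds (`+I` insert, `-U` update-before, `+U` update-after, `-D` delete); `apply_changelog` is a hypothetical helper, not part of Paimon. Folding the whole changelog yields the state a batch query would see, while a streaming query consumes the individual records.

```python
# Hypothetical helper, not part of Paimon: records are (row_kind, key, value)
# using Flink-style row kinds.
def apply_changelog(changelog):
    """Fold a changelog into the table state that a batch query would see."""
    state = {}
    for kind, key, value in changelog:
        if kind in ("+I", "+U"):    # insert / update-after: set the new row
            state[key] = value
        elif kind in ("-U", "-D"):  # update-before / delete: retract the old row
            state.pop(key, None)
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return state


changelog = [
    ("+I", "alice", 10),
    ("+I", "bob", 20),
    ("-U", "alice", 10),
    ("+U", "alice", 15),
    ("-D", "bob", 20),
]

# A streaming reader consumes the records themselves, from the first one onward;
# a batch reader sees only the folded result.
print(apply_changelog(changelog))      # {'alice': 15}
# A prefix of the changelog gives the state at an earlier point in time.
print(apply_changelog(changelog[:2]))  # {'alice': 10, 'bob': 20}
```

Emitting `-U` before `+U` lets downstream consumers, such as an aggregation, retract the old row's contribution before adding the new one.
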