<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# DEV README
## Overview
Before reading this document, make sure you have read `README.md`.
This document introduces the architecture, module design, and implementation details of PAX, along with considerations for PAX developers.
## Architecture
The directory structure:
```
/
├── src/
│ ├── api/ # API for the other modules
│ ├── cpp/ # Core source
│ │ ├── access/ # Table access layer
│ │ ├── catalog/ # Catalog layer
│ │ ├── clustering/ # Clustering
│ │ ├── cmake/ # CMake files for subdirectories
│ │ ├── comm/ # Common utilities and global definitions
│ │ ├── contrib/ # Third-party libraries
│ │ ├── manifest/ # The manifest catalog implementation
│ │ ├── storage/ # Main storage implementation
│ │ │ ├── columns/ # Column layer
│ │ │ ├── filter/ # Column projection && sparse filter && row filter
│ │ │ ├── oper/ # Operator implementation
│ │ │ ├── orc/ # Storage format implementation
│ │ │ ├── proto/ # Protobuf definitions
│ │ │ ├── toast/ # TOAST implementation
│ │ │ ├── tools/ # Utility functions
│ │ │ ├── vec/ # Interface to the vectorized executor
│ │ │ ├── wal/ # WAL interface implementation
│ │ │ └── ... # Interface of the micro-partition layer
│ │ └── ... # Interfaces of the other layers
│ └── test/ # Regression tests
├── sql/ # SQL tests
├── expected/ # The expected files for the SQL tests
├── hd-ci # CI-related files
├── tools/ # Tools for building and code inspection
└── ... # Build scripts, test files, and READMEs
```
### Layers
![pax-struct](res/pax-struct.png)
- Access Handler Layer: The top layer, which implements the table access method.
- Table Layer: Handles table-level read/write/update/delete.
- Micro-Partition Layer: Handles read/write/update/delete for a single file or group. Statistics and filtering are also implemented in this layer.
- Column Layer: In-memory abstraction of columns.
- File Layer: In-memory abstraction of file systems.
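
To make the layering more concrete, the following C++ sketch models a write and a range scan passing through simplified table, micro-partition, column, and file layers. All class and method names (`FileSystem`, `Column`, `MicroPartition`, `Table`) are hypothetical and do not correspond to the PAX sources. The file system is an in-memory map, and the min/max check stands in for the statistics-based filtering done at the micro-partition layer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// File layer (hypothetical): an in-memory stand-in for a file system.
class FileSystem {
 public:
  void Write(const std::string &path, std::vector<int64_t> data) {
    files_[path] = std::move(data);
  }
  const std::vector<int64_t> &Read(const std::string &path) const {
    return files_.at(path);
  }

 private:
  std::map<std::string, std::vector<int64_t>> files_;
};

// Column layer (hypothetical): in-memory buffer for one int64 column.
struct Column {
  std::vector<int64_t> values;
};

// Micro-partition layer (hypothetical): one file plus min/max statistics
// that let a scan skip the file without reading it.
struct MicroPartition {
  std::string path;
  int64_t min;
  int64_t max;
  bool MayContain(int64_t lo, int64_t hi) const { return max >= lo && min <= hi; }
};

// Table layer (hypothetical): buffers rows and cuts them into micro-partitions.
class Table {
 public:
  Table(FileSystem &fs, std::size_t rows_per_partition)
      : fs_(fs), rows_per_partition_(rows_per_partition) {
    assert(rows_per_partition_ > 0);
  }

  void Insert(int64_t v) {
    buffer_.values.push_back(v);
    if (buffer_.values.size() == rows_per_partition_) Flush();
  }

  // Writes the buffered column to a new file and records its statistics.
  void Flush() {
    if (buffer_.values.empty()) return;
    auto [mn, mx] = std::minmax_element(buffer_.values.begin(), buffer_.values.end());
    MicroPartition mp{"part_" + std::to_string(parts_.size()), *mn, *mx};
    fs_.Write(mp.path, std::move(buffer_.values));
    buffer_.values.clear();
    parts_.push_back(mp);
  }

  // Range scan: skips micro-partitions whose statistics exclude [lo, hi],
  // then applies a row filter to the files that remain.
  std::vector<int64_t> Scan(int64_t lo, int64_t hi, std::size_t *files_read) const {
    std::vector<int64_t> out;
    *files_read = 0;
    for (const auto &mp : parts_) {
      if (!mp.MayContain(lo, hi)) continue;
      ++*files_read;
      for (int64_t v : fs_.Read(mp.path))
        if (v >= lo && v <= hi) out.push_back(v);
    }
    return out;
  }

 private:
  FileSystem &fs_;
  std::size_t rows_per_partition_;
  Column buffer_;
  std::vector<MicroPartition> parts_;
};
```

For example, inserting 1, 2, 3, 10, 11, 12, 20, 21 with three rows per micro-partition produces three files. A scan for `[10, 12]` then reads only the second file, because the min/max statistics of the other two exclude the range.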
## Metadata
PAX has two implementations for the catalog:
- Auxiliary table: Implements the auxiliary tables as HEAP tables, which provide MVCC.
- Manifest: Stores metadata as JSON with its own MVCC implementation. It is more readable, but DDL operations often perform poorly.

The catalog type must be specified at compile time.
For more details, see `README.catalog.md`.
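
A compile-time choice between the two catalogs can be pictured with the C++ sketch below. It rests on several assumptions: `PAX_USE_MANIFEST` is a hypothetical macro name (the real build option is described in `README.catalog.md` and the build files), the auxiliary-table backend uses a vector in place of a HEAP table, and the manifest backend emits JSON without the string escaping and MVCC that a real manifest needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical metadata record for one micro-partition file.
struct PartitionMeta {
  std::string file;
  long num_rows;
};

// Auxiliary-table backend (sketch): in PAX the rows live in a HEAP table,
// which supplies MVCC; a plain vector stands in for that table here.
class AuxTableCatalog {
 public:
  void Add(const PartitionMeta &m) {
    assert(m.num_rows >= 0);
    rows_.push_back(m);
  }
  std::size_t Count() const { return rows_.size(); }

 private:
  std::vector<PartitionMeta> rows_;
};

// Manifest backend (sketch): metadata is kept as JSON text. Escaping of
// file names and the manifest's own MVCC are omitted.
class ManifestCatalog {
 public:
  void Add(const PartitionMeta &m) {
    assert(m.num_rows >= 0);
    entries_.push_back("{\"file\":\"" + m.file + "\",\"rows\":" +
                       std::to_string(m.num_rows) + "}");
  }
  std::size_t Count() const { return entries_.size(); }

  // Joins all entries into a single JSON array.
  std::string ToJson() const {
    std::string out = "[";
    for (std::size_t i = 0; i < entries_.size(); ++i) {
      if (i > 0) out += ",";
      out += entries_[i];
    }
    return out + "]";
  }

 private:
  std::vector<std::string> entries_;
};

// Compile-time selection. PAX_USE_MANIFEST is a hypothetical macro name;
// building with -DPAX_USE_MANIFEST would switch every use of Catalog.
#ifdef PAX_USE_MANIFEST
using Catalog = ManifestCatalog;
#else
using Catalog = AuxTableCatalog;
#endif
```

Selecting the backend with a type alias rather than a virtual interface means only one implementation is compiled in. This matches a build-time choice rather than a run-time one.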
## Data
PAX has two storage formats:
- `PORC`: Suited to the Cloudberry executor, which writes and reads it without format conversion. Reads by the vectorized executor require a format conversion.
- `PORC_VEC`: Suited to the vectorized executor, which reads it without format conversion. Writes require a format conversion, and so do reads by the Cloudberry executor.

For more details, see `README.format.md`.
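
The conversion rules for the two formats can be encoded in a small C++ sketch, together with a hypothetical helper that picks a format for a workload. The enum and function names are illustrative only. Writes are assumed to come from the Cloudberry executor, and the cost model (one unit per converted row) is an assumption made to show the trade-off, not a description of how PAX chooses a format.

```cpp
#include <cassert>
#include <cstddef>

enum class Format { kPorc, kPorcVec };
enum class Executor { kCloudberry, kVectorized };

// The executor that reads each format without conversion.
Executor NativeReader(Format f) {
  return f == Format::kPorc ? Executor::kCloudberry : Executor::kVectorized;
}

// Assumes rows are written by the Cloudberry executor: PORC stores them
// without conversion, while PORC_VEC converts their format on write.
bool NeedsConversionOnWrite(Format f) { return f == Format::kPorcVec; }

// A read needs conversion whenever the reader is not the format's native executor.
bool NeedsConversionOnRead(Format f, Executor reader) {
  return reader != NativeReader(f);
}

// Hypothetical cost model: every converted row costs one unit.
// Ties are resolved in favor of PORC.
Format ChooseFormat(std::size_t rows_written,
                    std::size_t rows_read_by_cloudberry,
                    std::size_t rows_read_by_vectorized) {
  std::size_t porc_cost = rows_read_by_vectorized;
  std::size_t porc_vec_cost = rows_written + rows_read_by_cloudberry;
  return porc_vec_cost < porc_cost ? Format::kPorcVec : Format::kPorc;
}
```

Under this model, a table written once and scanned repeatedly by the vectorized executor favors `PORC_VEC`, while a table read mostly by the Cloudberry executor favors `PORC`.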