---
title: Python Serialization Guide
sidebar_position: 0
id: serialization_index
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---
**Apache Fory™** is a blazing-fast multi-language serialization framework powered by **JIT compilation** and **zero-copy** techniques, delivering **ultra-fast performance** while maintaining ease of use and safety.
`pyfory` provides the Python implementation of Apache Fory™, offering both high-performance object serialization and advanced row-format capabilities for data processing tasks.
## Key Features
### Flexible Serialization Modes
- **Python Native Mode**: Full Python compatibility, drop-in replacement for pickle/cloudpickle
- **Cross-Language Mode**: Optimized for multi-language data exchange
- **Row Format**: Zero-copy row format for analytics workloads
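The idea behind a zero-copy row format can be sketched with the standard library alone. The fixed-width layout below (an `int32` id followed by a `float64` score) and the helper `read_score` are hypothetical and are not Fory's actual row encoding; the sketch only shows how a single field can be read from a binary buffer at a computed offset without deserializing whole rows.

```python
import struct

# Hypothetical fixed-width row layout (NOT Fory's real encoding):
# little-endian int32 id followed by float64 score, no padding.
ROW = struct.Struct("<id")
SCORE_OFFSET = 4  # the score starts right after the 4-byte id

rows = [(1, 0.5), (2, 1.5), (3, 2.5)]
buf = bytearray(ROW.size * len(rows))
for i, row in enumerate(rows):
    ROW.pack_into(buf, i * ROW.size, *row)


def read_score(buffer, index):
    """Decode only the score field of row `index`, straight from the buffer."""
    return struct.unpack_from("<d", buffer, index * ROW.size + SCORE_OFFSET)[0]


view = memoryview(buf)  # a view over the same memory, no copy is made
second_score = read_score(view, 1)
```

Because the offset of every field is known up front, a reader can touch one column of one row and leave the rest of the buffer untouched, which is the property analytics workloads tend to care about.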
### Versatile Serialization Features
- **Shared/circular reference support** for complex object graphs in both Python-native and cross-language modes
- **Polymorphism support** for customized types with automatic type dispatching
- **Schema evolution** support for backward/forward compatibility when using dataclasses in cross-language mode
- **Out-of-band buffer support** for zero-copy serialization of large data structures like NumPy arrays and Pandas DataFrames, compatible with pickle protocol 5
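To make "out-of-band" concrete, here is a minimal sketch of the pickle protocol 5 mechanism that the last bullet refers to. It uses only the standard library `pickle` module, not pyfory, and the one-megabyte `payload` is a stand-in for a large array buffer.

```python
import pickle

payload = bytearray(b"\x00" * 1_000_000)  # stand-in for a large array buffer

buffers = []
# With buffer_callback set, protocol 5 hands the PickleBuffer to the callback
# instead of copying its bytes into the pickle stream.
meta = pickle.dumps(
    pickle.PickleBuffer(payload),
    protocol=5,
    buffer_callback=buffers.append,
)

# The receiver supplies the same buffers, in the same order, when loading.
restored = pickle.loads(meta, buffers=buffers)
restored_view = memoryview(restored)
```

The serialized stream (`meta`) stays tiny because the large buffer travels separately, so a transport layer is free to send it without an extra copy.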
### Blazing Fast Performance
- **Extremely fast performance** compared to other serialization frameworks
- **Runtime code generation** and **Cython-accelerated** core implementation for optimal performance
### Compact Data Size
- **Compact object graph protocol** with minimal space overhead: up to 3× size reduction compared to pickle/cloudpickle
- **Meta packing and sharing** to minimize the space overhead of the type metadata needed for forward/backward compatibility
### Security & Safety
- **Strict mode** prevents deserialization of untrusted types through type registration and checks
- **Reference tracking** for handling circular references safely
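Registration-based checking can be pictured as an allowlist that is consulted while deserializing. The sketch below uses the standard library's documented `pickle.Unpickler.find_class` hook as a stand-in; `AllowlistUnpickler` and `safe_loads` are hypothetical names, and this is not how pyfory implements strict mode.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction


class AllowlistUnpickler(pickle.Unpickler):
    """Hypothetical helper: only resolve classes that were registered."""

    def __init__(self, file, registered):
        super().__init__(file)
        self._registered = {(c.__module__, c.__qualname__): c for c in registered}

    def find_class(self, module, name):
        try:
            return self._registered[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(
                f"{module}.{name} is not registered"
            ) from None


def safe_loads(data, registered):
    return AllowlistUnpickler(io.BytesIO(data), registered).load()


# A registered type round-trips normally.
ok = safe_loads(pickle.dumps(OrderedDict(a=1)), [OrderedDict])

# An unregistered type is rejected before any of its code runs.
try:
    safe_loads(pickle.dumps(Fraction(1, 2)), [OrderedDict])
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

The design point is that the set of constructible types is fixed by the application up front, rather than being chosen by whoever produced the bytes.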
## Installation
### Basic Installation
```bash
pip install pyfory
```
### Optional Dependencies
```bash
# Install with row format support (requires Apache Arrow)
pip install "pyfory[format]"
# Install from source for development
git clone https://github.com/apache/fory.git
cd fory/python
pip install -e ".[dev,format]"
```
### Requirements
- **Python**: 3.8 or higher
- **OS**: Linux, macOS, Windows
## Thread Safety
`pyfory` provides `ThreadSafeFory` for thread-safe serialization using thread-local storage:
```python
import pyfory
import threading
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# Create thread-safe Fory instance
fory = pyfory.ThreadSafeFory(xlang=False, ref=True)
fory.register(Person)

# Use in multiple threads safely
def serialize_in_thread(thread_id):
    person = Person(name=f"User{thread_id}", age=25 + thread_id)
    data = fory.serialize(person)
    result = fory.deserialize(data)
    print(f"Thread {thread_id}: {result}")

threads = [threading.Thread(target=serialize_in_thread, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
**Key Features:**
- **Instance Pool**: Maintains a pool of `Fory` instances protected by a lock for thread safety
- **Shared Configuration**: All registrations must be done upfront and are applied to all instances
- **Same API**: Drop-in replacement for `Fory` class with identical methods
- **Registration Safety**: Prevents registration after first use to ensure consistency
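The pool-plus-lock pattern and the "no registration after first use" rule described in the list above can be sketched generically. Everything here is hypothetical: `PooledSerializer` and `JsonWorker` are illustrative names, `json` stands in for the real serializer, and the code is not taken from pyfory's implementation of `ThreadSafeFory`.

```python
import json
import threading


class JsonWorker:
    """Stand-in for a serializer instance that is not thread-safe on its own."""

    def __init__(self, registered):
        self.registered = registered

    def serialize(self, obj):
        return json.dumps(obj).encode("utf-8")

    def deserialize(self, data):
        return json.loads(data.decode("utf-8"))


class PooledSerializer:
    """Hypothetical wrapper: lock-protected pool plus a registration freeze."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._idle = []
        self._registered = []
        self._used = False

    def register(self, cls):
        with self._lock:
            if self._used:
                raise RuntimeError("register() is not allowed after first use")
            self._registered.append(cls)

    def _borrow(self):
        with self._lock:
            self._used = True
            if self._idle:
                return self._idle.pop()
            registered = tuple(self._registered)
        # Build a new worker outside the lock, with the shared registrations.
        return self._factory(registered)

    def _give_back(self, worker):
        with self._lock:
            self._idle.append(worker)

    def serialize(self, obj):
        worker = self._borrow()
        try:
            return worker.serialize(obj)
        finally:
            self._give_back(worker)

    def deserialize(self, data):
        worker = self._borrow()
        try:
            return worker.deserialize(data)
        finally:
            self._give_back(worker)


pool = PooledSerializer(JsonWorker)
pool.register(dict)
roundtrip = pool.deserialize(pool.serialize({"name": "Alice", "age": 30}))
```

Each call borrows a worker exclusively, so no two threads ever share one at the same time, and freezing registration after first use keeps every worker in the pool configured identically.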
**When to Use:**
- **Multi-threaded Applications**: Web servers, concurrent workers, parallel processing
- **Shared Fory Instances**: When multiple threads need to serialize/deserialize data
- **Thread Pools**: Applications using thread pools or `concurrent.futures`
## Quick Start
```python
import pyfory
from dataclasses import dataclass
@dataclass
class Person:
    name: str
    age: int
# Create Fory instance
fory = pyfory.Fory(xlang=False, ref=True)
fory.register(Person)
person = Person("Alice", 30)
data = fory.serialize(person)
result = fory.deserialize(data)
print(result) # Person(name='Alice', age=30)
```
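The Quick Start passes `ref=True`, which is assumed here to turn on the shared/circular reference support listed under Key Features. What reference tracking preserves can be shown with the standard library's `pickle` as a stand-in (this block does not call pyfory): shared objects stay shared, and cycles survive the round trip.

```python
import pickle

shared = ["payload"]
graph = {"left": shared, "right": shared}  # two references to one list
graph["self"] = graph                      # a cycle back to the root

restored = pickle.loads(pickle.dumps(graph))

# Identity is preserved: both keys point at the same restored list.
same_object = restored["left"] is restored["right"]
# The cycle is preserved: the root contains itself.
cycle_kept = restored["self"] is restored
```

Without reference tracking, a serializer would either duplicate the shared list or recurse forever on the cycle, which is why the option matters for complex object graphs.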
## Next Steps
- [Configuration](configuration.md) - Fory parameters and modes
- [Basic Serialization](basic-serialization.md) - Basic usage patterns
- [Python Native Mode](python-native.md) - Functions, lambdas, classes
- [Cross-Language](cross-language.md) - XLANG mode
- [Row Format](row-format.md) - Zero-copy row format
- [Security](security.md) - Security best practices
## Links
- **Documentation**: https://fory.apache.org/docs/latest/python_guide/
- **GitHub**: https://github.com/apache/fory
- **PyPI**: https://pypi.org/project/pyfory/
- **Issues**: https://github.com/apache/fory/issues
- **Slack**: https://join.slack.com/t/fory-project/shared_invite/zt-36g0qouzm-kcQSvV_dtfbtBKHRwT5gsw