Apache Arrow .NET
An implementation of Arrow targeting .NET.
See the feature matrix for currently available features.
Implementation
- Arrow specification 1.0.0. (Support for reading 0.11+.)
- C# 11
- .NET Standard 2.0, .NET 6.0, .NET 8.0 and .NET Framework 4.6.2
- Asynchronous I/O
- Uses modern .NET runtime features such as Span<T>, Memory<T>, MemoryManager<T>, and System.Buffers primitives for memory allocation, memory storage, and fast serialization.
- Uses the Acyclic Visitor pattern for array types and arrays to facilitate serialization, record batch traversal, and format growth.
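As an illustration of the visitor and Span<T> points above, the following is a minimal sketch rather than a reference implementation. The class name `DescribeVisitor` and its `Description` property are hypothetical; the sketch assumes the `IArrowArrayVisitor<T>` interfaces, `Accept`, `IsValid`, and the `Values` span exposed by primitive arrays.

```csharp
using System;
using Apache.Arrow;

// Hypothetical visitor: handles two array types explicitly and
// falls back to a generic message for everything else.
public sealed class DescribeVisitor :
    IArrowArrayVisitor<Int32Array>,
    IArrowArrayVisitor<StringArray>
{
    public string Description { get; private set; } = "";

    public void Visit(Int32Array array)
    {
        // Values exposes the value buffer as a ReadOnlySpan<int> without copying.
        ReadOnlySpan<int> values = array.Values;
        long sum = 0;
        for (int i = 0; i < array.Length; i++)
        {
            // Skip null slots; their backing value is not meaningful.
            if (array.IsValid(i))
            {
                sum += values[i];
            }
        }
        Description = $"Int32Array length={array.Length} sum={sum}";
    }

    public void Visit(StringArray array)
    {
        Description = $"StringArray length={array.Length}";
    }

    // Called for any array type this visitor does not handle explicitly.
    public void Visit(IArrowArray array)
    {
        Description = $"unhandled {array.Data.DataType.Name}";
    }
}

public static class VisitorExample
{
    public static string Describe(IArrowArray array)
    {
        var visitor = new DescribeVisitor();
        array.Accept(visitor);
        return visitor.Description;
    }
}
```

Because each array type is handled through its own interface, a visitor like this can ignore array types it does not know about, which is what allows the format to grow without breaking existing visitors.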
Known Issues
- Cannot read Arrow files containing tensors.
- Cannot easily modify the allocation strategy without implementing a custom memory pool. All allocations are currently 64-byte aligned and padded to 8 bytes.
- The default memory allocator over-allocates and then fixes up the pointer to satisfy alignment, which results in significant memory overhead for small buffers. A buffer that requires a single byte of storage may be backed by an allocation of up to 64 bytes.
- There are currently few builder APIs available for specific array types. Arrays must be built manually with an arrow buffer builder abstraction.
- FlatBuffer code generation is not included in the build process.
- Serialization implementation does not perform exhaustive validation checks during deserialization in every scenario.
- Throws exceptions with vague, inconsistent, or non-localized messages in many situations.
- Throws exceptions that are not specific to the Arrow implementation in some circumstances where it probably should throw Arrow-specific ones (e.g. it does not throw ArrowException).
- Lack of code documentation
- Lack of usage examples
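To make the builder limitation above more concrete, here is a minimal sketch of two ways to build an Int32Array: with the typed `Int32Array.Builder`, and manually from a raw value buffer built with `ArrowBuffer.Builder<T>`. The class and method names (`BuilderSketch`, `BuildWithTypedBuilder`, `BuildFromRawBuffer`) are hypothetical, and the manual path assumes an array with no nulls so that an empty validity bitmap is acceptable.

```csharp
using System.Linq;
using Apache.Arrow;

public static class BuilderSketch
{
    // Typed builder: tracks length and the validity bitmap for you.
    public static Int32Array BuildWithTypedBuilder()
    {
        return new Int32Array.Builder()
            .AppendRange(Enumerable.Range(0, 5))
            .Build();
    }

    // Manual construction: build the value buffer, then supply the
    // length, null count, and offset yourself.
    public static Int32Array BuildFromRawBuffer()
    {
        ArrowBuffer values = new ArrowBuffer.Builder<int>()
            .AppendRange(Enumerable.Range(0, 5))
            .Build();

        // Arguments: value buffer, validity bitmap (empty because there are
        // no nulls), length, null count, offset.
        return new Int32Array(values, ArrowBuffer.Empty, 5, 0, 0);
    }
}
```

For array types that have no typed builder, the manual path is the one to expect; the caller is then responsible for keeping the length and null count consistent with the buffers.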
Usage
Example demonstrating reading RecordBatches from an Arrow IPC file using an ArrowFileReader:
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;

public static async Task<RecordBatch> ReadArrowAsync(string filename)
{
    using (var stream = File.OpenRead(filename))
    using (var reader = new ArrowFileReader(stream))
    {
        // Reads only the first record batch in the file.
        var recordBatch = await reader.ReadNextRecordBatchAsync();
        Debug.WriteLine("Read record batch with {0} column(s)", recordBatch.ColumnCount);
        return recordBatch;
    }
}
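The reader example above returns only the first batch. As a complement, the following sketch writes a small batch to an in-memory Arrow IPC file with ArrowFileWriter and then reads back every batch in a loop. The names `RoundTripSketch` and `WriteThenCountRowsAsync` and the column name `id` are hypothetical, and the loop assumes that ReadNextRecordBatchAsync returns null once no batches remain.

```csharp
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using Apache.Arrow.Types;

public static class RoundTripSketch
{
    public static async Task<int> WriteThenCountRowsAsync()
    {
        // One non-nullable Int32 column with three rows.
        Int32Array ids = new Int32Array.Builder().Append(1).Append(2).Append(3).Build();
        Schema schema = new Schema.Builder()
            .Field(f => f.Name("id").DataType(Int32Type.Default).Nullable(false))
            .Build();
        var batch = new RecordBatch(schema, new IArrowArray[] { ids }, ids.Length);

        using (var stream = new MemoryStream())
        {
            // leaveOpen keeps the stream usable after the writer is disposed.
            using (var writer = new ArrowFileWriter(stream, schema, leaveOpen: true))
            {
                await writer.WriteRecordBatchAsync(batch);
                await writer.WriteEndAsync(); // writes the file footer
            }

            stream.Position = 0;

            int rows = 0;
            using (var reader = new ArrowFileReader(stream))
            {
                RecordBatch next;
                // Assumption: null signals the end of the file.
                while ((next = await reader.ReadNextRecordBatchAsync()) != null)
                {
                    rows += next.Length;
                }
            }
            return rows;
        }
    }
}
```

The same loop applies to a file on disk by replacing the MemoryStream with File.OpenRead, as in the example above.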
Status
Memory Management
- Allocations are 64-byte aligned and padded to 8 bytes.
- Allocations are automatically garbage collected.
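The arithmetic behind "64-byte aligned and padded to 8 bytes" can be sketched as follows. This is only an illustration of the rounding involved, not the library's allocator; the class `AlignmentSketch` and its members are hypothetical names.

```csharp
using System;

public static class AlignmentSketch
{
    public const int Alignment = 64; // buffer start addresses are multiples of 64
    public const int Padding = 8;    // buffer lengths are rounded up to multiples of 8

    // Rounds a requested length up to the next multiple of Padding.
    public static long PadLength(long length)
    {
        return (length + Padding - 1) / Padding * Padding;
    }

    // Number of bytes to skip so that the address becomes a multiple of
    // Alignment. Returns 0 when the address is already aligned.
    public static long AlignmentOffset(long address)
    {
        return (Alignment - (address % Alignment)) % Alignment;
    }
}
```

For example, a 1-byte request is padded to 8 bytes, and an allocation that happens to start at address 100 needs 28 bytes skipped to reach the next 64-byte boundary at 128. This skipping is why small buffers carry proportionally large overhead.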
Arrays
Primitive Types
Parametric Types
Type Metadata
Serialization
IPC Format
Compression
Not Implemented
- Serialization
    - Exhaustive validation
    - Run End Encoding
- Types
- Arrays
    - Large Arrays. Large array types are provided for interoperability with other libraries, but they do not support buffers larger than 2 GiB; importing an array that is too large raises an exception.
    - Views
- Array Operations
    - Equality / Comparison
    - Casting
- Compute
    - There is currently no API available for a compute / kernel abstraction.