commit | 2503375bedb7750d80c8df9e9b99dfe02fbaeba7 | |
---|---|---|
author | Jan Jahoda <aik.jahoda@post.cz> | Wed Sep 24 14:37:59 2025 +0200 |
committer | GitHub <noreply@github.com> | Wed Sep 24 05:37:59 2025 -0700 |
tree | a0b31e5b6b62d9b54b013d78855aad5a382be64e | |
parent | fbbe78e79bd0b231bd3fc2fd085a9d803f0e45a7 | |
Remove unnecessary allocation in ArrowStreamWriter (#73)

## What's Changed

ArrowStreamWriter allocates 8k of memory by creating a new array pool. This change introduces a shared buffer instead of pool to reduce the allocations.

The array pool is used to rent small arrays (8 bytes) but the pool allocates much bigger arrays (8kb in total). The Array pool has access time overhead for small arrays compared to direct allocation.

Results from benchmarks:

Old implementation:

| Method | BatchLength | ColumnSetCount | Mean | Error | StdDev | Allocated |
|----------- |------------ |--------------- |-----------:|----------:|----------:|----------:|
| WriteBatch | 10000 | 10 | 6.118 ms | 0.1215 ms | 0.3345 ms | 248.53 KB |
| WriteBatch | 10000 | 14 | 9.788 ms | 0.1910 ms | 0.3396 ms | 324.12 KB |
| WriteBatch | 300000 | 10 | 119.351 ms | 3.1713 ms | 9.3008 ms | 248.53 KB |
| WriteBatch | 300000 | 14 | 136.697 ms | 2.9229 ms | 8.4799 ms | 324.12 KB |

New implementation:

| Method | BatchLength | ColumnSetCount | Mean | Error | StdDev | Median | Allocated |
|----------- |------------ |--------------- |-----------:|----------:|-----------:|-----------:|----------:|
| WriteBatch | 10000 | 10 | 5.925 ms | 0.2057 ms | 0.6001 ms | 5.843 ms | 240.64 KB |
| WriteBatch | 10000 | 14 | 8.908 ms | 0.2743 ms | 0.8002 ms | 8.778 ms | 316.23 KB |
| WriteBatch | 300000 | 10 | 94.835 ms | 1.7872 ms | 3.7699 ms | 93.892 ms | 240.64 KB |
| WriteBatch | 300000 | 14 | 147.995 ms | 3.6873 ms | 10.6975 ms | 144.591 ms | 316.23 KB |

Closes #41.
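The idea behind the change can be illustrated with a minimal sketch. This is not the actual ArrowStreamWriter code: the class `ScratchBufferWriter` and its method are hypothetical, and the sketch assumes that "shared buffer" means a single small array owned by the writer and reused for every small write, in place of renting 8-byte arrays from a newly created array pool.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical illustration only; not the real ArrowStreamWriter.
public sealed class ScratchBufferWriter
{
    private readonly Stream _stream;

    // One 8-byte buffer allocated once per writer and reused for each
    // small write, so no array pool has to be created.
    private readonly byte[] _scratch = new byte[8];

    public ScratchBufferWriter(Stream stream)
    {
        _stream = stream ?? throw new ArgumentNullException(nameof(stream));
    }

    // Writes a 32-bit little-endian integer using the reused buffer.
    public void WriteInt32(int value)
    {
        BinaryPrimitives.WriteInt32LittleEndian(_scratch, value);
        _stream.Write(_scratch, 0, sizeof(int));
    }
}
```

A buffer reused this way is not safe for concurrent writes on the same instance, so the sketch assumes writes on a given writer are sequential.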
An implementation of Arrow targeting .NET Standard.
See our feature matrix for currently available features.
    using System.Diagnostics;
    using System.IO;
    using System.Threading.Tasks;
    using Apache.Arrow;
    using Apache.Arrow.Ipc;

    public static async Task<RecordBatch> ReadArrowAsync(string filename)
    {
        using (var stream = File.OpenRead(filename))
        using (var reader = new ArrowFileReader(stream))
        {
            var recordBatch = await reader.ReadNextRecordBatchAsync();
            Debug.WriteLine("Read record batch with {0} column(s)", recordBatch.ColumnCount);
            return recordBatch;
        }
    }
Reading and writing compressed data requires the Apache.Arrow.Compression package. When reading compressed data, you must pass an Apache.Arrow.Compression.CompressionCodecFactory instance to the ArrowFileReader or ArrowStreamReader constructor, and when writing compressed data a CompressionCodecFactory must be set in the IpcOptions. Alternatively, a custom implementation of ICompressionCodecFactory can be used.

Install the latest .NET Core SDK from https://dotnet.microsoft.com/download.
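As an illustration of the compression note above, reading a file that may contain compressed buffers could look roughly like the following. This is a sketch, not a reference implementation: the class and method names are hypothetical, and it assumes that CompressionCodecFactory has a public parameterless constructor and that the Apache.Arrow.Compression package is installed alongside Apache.Arrow.

```csharp
using System.IO;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Compression;
using Apache.Arrow.Ipc;

public static class CompressedArrowExample
{
    // Hypothetical helper: reads the first record batch from an IPC file
    // whose buffers may be compressed.
    public static async Task<RecordBatch> ReadCompressedArrowAsync(string filename)
    {
        // Assumption: the factory can be created without arguments.
        var codecFactory = new CompressionCodecFactory();

        using (var stream = File.OpenRead(filename))
        using (var reader = new ArrowFileReader(stream, codecFactory))
        {
            return await reader.ReadNextRecordBatchAsync();
        }
    }
}
```

The same factory instance would be passed to an ArrowStreamReader when reading a stream instead of a file.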
dotnet build
To build a debug-flavor preview NuGet package into the artifacts folder, run:
dotnet pack
When building the officially released version (see the NOTE below about the current git repository), run:
dotnet pack -c Release
This builds the final/stable package.
NOTE: When building the officially released version, ensure that your git repository has the origin remote set to https://github.com/apache/arrow.git, which will ensure Source Link is set correctly. See https://github.com/dotnet/sourcelink/blob/main/docs/README.md for more information.
There are two output artifacts:

- Apache.Arrow.<version>.nupkg - this contains the executable assemblies
- Apache.Arrow.<version>.snupkg - this contains the debug symbol files

Both of these artifacts can then be uploaded to https://www.nuget.org/packages/manage/upload.
Build from the Apache Arrow project root.
docker build -f csharp/build/docker/Dockerfile .
dotnet test
All build artifacts are placed in the artifacts folder in the project root.
This project follows the coding style specified in Coding Style.
See https://google.github.io/flatbuffers/flatbuffers_guide_use_java_c-sharp.html for how to get the flatc executable.

Run flatc --csharp on each .fbs file in the format folder, and replace the checked-in .cs files under FlatBuf with the generated files.

Update the non-generated FlatBuffers .cs files with the files from the google/flatbuffers repo.