Official Julia implementation of Apache Arrow

Arrow

This is a pure Julia implementation of the Apache Arrow data standard. The package provides Julia AbstractVector objects that reference data conforming to the Arrow standard, so Arrow-formatted data can be used directly with a great deal of existing Julia code.
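
As a rough illustration of what this means in practice, the sketch below writes a small table to an in-memory buffer and reads it back. The table contents and the column names `id` and `label` are made up for the example; `Arrow.write` and `Arrow.Table` are the package's entry points for writing and reading.

```julia
using Arrow

# Hypothetical example data: a NamedTuple of vectors is a valid Tables.jl source.
tbl = (id = [1, 2, 3], label = ["a", "b", "c"])

# Write the table to an in-memory buffer, then rewind it for reading.
io = IOBuffer()
Arrow.write(io, tbl)
seekstart(io)

# Read it back; columns are exposed as properties of the Arrow.Table.
arrow_tbl = Arrow.Table(io)

# Each column is an AbstractVector, so generic Julia code works on it.
ids = arrow_tbl.id
total = sum(ids)
labels = collect(arrow_tbl.label)
```

Because the columns are ordinary AbstractVector values, functions such as `sum`, `collect`, or indexing work without any Arrow-specific handling.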

Please see the Apache Arrow columnar format specification for a description of the Arrow memory layout.

Installation

The package can be installed by entering the following at the Julia REPL:

julia> using Pkg; Pkg.add("Arrow")

Local Development

When developing on Arrow.jl, it is recommended that you run the following so that any changes to ArrowTypes.jl are immediately available to Arrow.jl without requiring a release:

julia --project -e 'using Pkg; Pkg.develop(path="src/ArrowTypes")'

Format Support

This implementation supports the 1.0 version of the specification, including support for:

  • All primitive data types
  • All nested data types
  • Dictionary encodings and messages
  • Extension types
  • Streaming and file formats, record batch messages, and replacement and isdelta dictionary messages
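
Two of the features listed above can be sketched from the writing side, assuming the `dictencode` and `file` keyword arguments of `Arrow.write` behave as described in the package documentation. The table and its column names `city` and `temp` are hypothetical.

```julia
using Arrow

# Hypothetical example data with a repetitive string column.
tbl = (city = ["Oslo", "Lima", "Oslo", "Oslo"], temp = [1.5, 22.0, 0.5, 2.0])

# Writing to an IO uses the streaming format by default;
# dictencode = true requests dictionary encoding for the columns.
stream_io = IOBuffer()
Arrow.write(stream_io, tbl; dictencode = true)
seekstart(stream_io)
from_stream = Arrow.Table(stream_io)

# file = true requests the Arrow file format instead of the streaming format.
file_io = IOBuffer()
Arrow.write(file_io, tbl; file = true)
seekstart(file_io)
from_file = Arrow.Table(file_io)

# Both round trips should yield the original values.
cities = collect(from_stream.city)
temps = collect(from_file.temp)
```

`Arrow.Table` is used to read both buffers here; the choice of format and encoding is made only when writing.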

It currently doesn't include support for:

  • Tensors or sparse tensors
  • Flight RPC
  • C data interface

Third-party data formats:

See the full documentation for details on reading and writing Arrow data.