commit	62266b5919f6bc5b543f9cfed64d7c1bf4818558
author	Jacob Quinn <quinn.jacobd@gmail.com>	Sat Jan 30 23:18:44 2021 -0700
committer	GitHub <noreply@github.com>	Sat Jan 30 23:18:44 2021 -0700
tree	e459e430527fffbb38fd3aa2d7cb7a29c5c63a4d
parent	da73d8201a8915325800dfda9a099539be096e7d

README.md

Arrow

This is a pure Julia implementation of the Apache Arrow data standard. This package provides Julia AbstractVector objects for referencing data that conforms to the Arrow standard. This allows users to seamlessly interface Arrow-formatted data with a great deal of existing Julia code.

Please see this document for a description of the Arrow memory layout.

Installation

The package can be installed by typing the following in a Julia REPL:

julia> using Pkg; Pkg.add("Arrow")

Alternatively, to use the official Apache code, which follows the official Apache release process, run:

julia> using Pkg; Pkg.add(url="https://github.com/apache/arrow", subdir="julia/Arrow.jl")

Difference between this code and the apache/arrow/julia/Arrow repository

The code in the apache/arrow repository is officially part of the apache/arrow project and as such follows the regulated release cadence of the entire project, including standard community voting protocols. The JuliaData/Arrow.jl repository can be viewed as a sort of “dev” or “latest” branch of this code that may release more frequently, but without following official Apache release guidelines. The two repositories are kept in sync, however, so any bugfix patches in JuliaData are upstreamed to apache/arrow for each release.

Format Support

This implementation supports the 1.0 version of the specification, including support for:

All primitive data types
All nested data types
Dictionary encodings and messages
Extension types
Streaming and file formats, record batch messages, and replacement and isdelta dictionary messages
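
As a rough illustration of the streaming format and record batches listed above, the sketch below writes a partitioned table to an in-memory buffer and reads the batches back one at a time. It assumes that Tables.jl is installed alongside Arrow.jl and that each partition is written as its own record batch; the column name `x` and the sample values are made up for the example.

```julia
using Arrow, Tables

# Two small tables treated as partitions; each is assumed to become one record batch.
parts = Tables.partitioner([(x = [1, 2],), (x = [3, 4],)])

io = IOBuffer()
Arrow.write(io, parts; file = false)  # file = false requests the streaming format
seekstart(io)

# Arrow.Stream iterates the record batches, yielding one Arrow.Table per batch.
batches = [collect(batch.x) for batch in Arrow.Stream(io)]
println(batches)
```

Passing `file = true` instead should produce the file format, which `Arrow.Table` can read back as a single table.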

It currently doesn't include support for:

Tensors or sparse tensors
Flight RPC
C data interface

Third-party data formats:

CSV and Parquet support via the existing CSV.jl and Parquet.jl packages
Other Tables.jl-compatible packages are automatically supported (DataFrames.jl, JSONTables.jl, JuliaDB.jl, SQLite.jl, MySQL.jl, JDBC.jl, ODBC.jl, XLSX.jl, etc.)
No current Julia packages support ORC or Avro data formats
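
Because the third-party formats above are reached through Tables.jl, converting between them usually amounts to passing one package's table to another package's writer. The sketch below converts a small CSV file to Arrow; it assumes CSV.jl is installed, and the file names and contents are hypothetical.

```julia
using Arrow, CSV

# Create a tiny CSV file in a temporary directory (hypothetical sample data).
dir = mktempdir()
csv_path = joinpath(dir, "example.csv")
write(csv_path, "id,name\n1,a\n2,b\n")

# CSV.File is a Tables.jl-compatible source, so it can be handed to Arrow.write directly.
arrow_path = joinpath(dir, "example.arrow")
Arrow.write(arrow_path, CSV.File(csv_path))

# Read the Arrow file back to check the conversion.
tbl = Arrow.Table(arrow_path)
println(collect(tbl.id))
```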

See the full documentation for details on reading and writing Arrow data.
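
For orientation, a minimal round trip might look like the following sketch. It uses `Arrow.write` and `Arrow.Table`; the table contents and the temporary file path are made up for the example, and the full documentation remains the authority on the available options.

```julia
using Arrow

# A NamedTuple of vectors is a valid Tables.jl table (hypothetical sample data).
tbl = (id = [1, 2, 3], name = ["a", "b", "c"])

path = joinpath(mktempdir(), "example.arrow")
Arrow.write(path, tbl)       # write the table in the Arrow file format

t = Arrow.Table(path)        # read it back; columns reference the Arrow data
println(t.id isa AbstractVector)
println(sum(t.id))
```

The columns returned by `Arrow.Table` are AbstractVector objects, so they can be passed to existing Julia code that expects ordinary vectors.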