commit	125921d582fa03b60e1e8ca42dc81a0a75f80e35
author	Jacob Quinn <quinn.jacobd@gmail.com>	Fri Oct 02 16:41:29 2020 -0600
committer	Jacob Quinn <quinn.jacobd@gmail.com>	Fri Oct 02 16:41:29 2020 -0600
tree	cfbcea90931d89a74bbc8ff4e78e4f22d46fd72d
parent	ed66476434888bd9b49ed29d73ef5d2ea9cc7114

Various fixes and mechanics to test round-tripping w/ pyarrow

This doesn't actually hook up pyarrow round-trip testing, but you can run the pyarrowrountrip.jl test file if you have python3 and pyarrow installed locally (along with PyCall.jl on the Julia side). It then tests most of the tables in testtables.jl by writing them in Julia, passing the written bytes to pyarrow, reading them via pyarrow, writing them back out, then reading them in again on the Julia side. The fixes were pretty minor, but it feels much better knowing all these examples work well (and will be easy to test in the future).

README.md

Arrow

This is a pure Julia implementation of the Apache Arrow data standard. This package provides Julia AbstractVector objects for referencing data that conforms to the Arrow standard. This allows users to seamlessly interface Arrow-formatted data with a great deal of existing Julia code.

Please see this document for a description of the Arrow memory layout.

Basic usage:

Installation

] add Tables#master
] add https://github.com/JuliaData/Arrow.jl#master

Reading

Read directly from an IO, a file, or a byte vector. Arrow data can be in file or streaming format; Arrow.Table detects which one automatically.

using Arrow

# read arrow table from file format
tbl = Arrow.Table(file)

# read arrow table from IO
tbl = Arrow.Table(io)

# read arrow table directly from bytes, like from an HTTP request
# (requires the HTTP.jl package; `url` is a placeholder)
using HTTP
resp = HTTP.get(url)
tbl = Arrow.Table(resp.body)

Writing

Write any Tables.jl source as Arrow-formatted data, either directly to an IO or to a provided file name.

# write directly to any IO in streaming format
Arrow.write(io, tbl)

# write to a file in file format
Arrow.write("data.arrow", tbl)
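
The reading and writing calls above can be combined into a small in-memory round trip. The following is a minimal sketch, not part of the original README: the sample table and its column names (`id`, `name`) are made up for illustration, and it assumes Tables.jl is installed so that `Tables.columntable` can be used to inspect the result.

```julia
using Arrow, Tables

# Hypothetical sample data: a NamedTuple of vectors is a valid Tables.jl source.
tbl = (id = [1, 2, 3], name = ["a", "b", "c"])

# Write to an in-memory IO (streaming format), then grab the raw bytes.
io = IOBuffer()
Arrow.write(io, tbl)
bytes = take!(io)

# Read the bytes back; Arrow.Table detects the format automatically.
roundtrip = Arrow.Table(bytes)

# Convert to a NamedTuple of columns for easy inspection.
cols = Tables.columntable(roundtrip)
println(collect(cols.id))
println(collect(cols.name))
```

The columns returned by `Arrow.Table` are Arrow-backed `AbstractVector`s rather than plain `Vector`s, so `collect` is used here only to print them in a familiar form.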