tag	2d84a044fd97a222f7d5ee9630565f54c3cf3132
tagger	github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	Mon Jun 05 21:29:55 2023 +0000
object	771db0a31685e6b12e8b576685b7d8d5c573b855

[Diff since v2.5.2](https://github.com/apache/arrow-julia/compare/v2.5.2...v2.6.0)

**Closed issues:**

- Support Tables.partitions when reading "arrow file" format in addition to "stream" format (#293)
- Make recursive iteration to get dictionaries more defensive for interop (#375)
- Error/Segfault when writing many partitions (#396)
- `Vector{UInt8}` mis-represented when writing to disk (#411)
- CI doesn't test with multiple threads (#426)
- Malformed file by `Arrow.write` on a `IOStream` created with `open(filename, "w")` (#432)
- Unhandled sentinel value for len in compression causes invalid Array dimensions (#435)

**Merged pull requests:**

- Get dictionaries of children only when field.children not nothing (#382) (@okartal)
- fix Base.eltype methods and functions that take Type parameters (#404) (@baumgold)
- enable incremental reads of arrow-formatted files (#408) (@baumgold)
- Base.isdone for Stream (#428) (@baumgold)
- Run with 1 and 2 threads during tests (#431) (@quinnj)
- Add handling of len = -1 in uncompress (#436) (@DrChainsaw)
- Don't treat Vector{UInt8} as Arrow Binary type (#439) (@quinnj)
- Bump BitIntegers compat (#441) (@quinnj)
- Handle len of -1 in "compressed" buffers from other languages (#442) (@quinnj)
- Add Tables.partitions definition for Arrow.Table (#443) (@quinnj)
- Remove scopedenum for EnumX (#444) (@quinnj)
- Refactor compressors/decompressors for laziness + safety (#445) (@quinnj)
- Return SubArrays when possible for arrow list types (#446) (@quinnj)
- bump version of Arrow and ArrowTypes to prepare for new release (#447) (@baumgold)

commit	771db0a31685e6b12e8b576685b7d8d5c573b855
author	Ben Baumgold <4933671+baumgold@users.noreply.github.com>	Fri Jun 02 16:35:32 2023 -0400
committer	GitHub <noreply@github.com>	Fri Jun 02 14:35:32 2023 -0600
tree	340e01a7e117d31c618bcc9465b44255c4e1c5ba
parent	784b27bc81ea227f5556106d948db42d79b84fcb

README.md

Arrow

This is a pure Julia implementation of the Apache Arrow data standard. The package provides Julia AbstractVector types that reference data conforming to the Arrow standard, so Arrow-formatted data can be passed directly to the large body of existing Julia code that accepts AbstractVector inputs.
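To make the AbstractVector idea concrete, here is a hypothetical, minimal sketch of how a Julia array type can sit on top of Arrow-style buffers. `NullablePrimitive` is an invented name used only for illustration; it is not one of Arrow.jl's actual types, and it ignores details such as buffer alignment, shared buffers, and memory mapping.

```julia
# Hypothetical type for illustration only; not Arrow.jl's implementation.
struct NullablePrimitive{T} <: AbstractVector{Union{T,Missing}}
    validity::Vector{UInt8}  # packed validity bits, least-significant bit first; 1 = valid
    values::Vector{T}        # one slot per element, including slots that are null
end

# size and getindex are all the AbstractArray interface needs for read access.
Base.size(v::NullablePrimitive) = (length(v.values),)
Base.IndexStyle(::Type{<:NullablePrimitive}) = IndexLinear()

function Base.getindex(v::NullablePrimitive, i::Int)
    @boundscheck checkbounds(v, i)
    byte = v.validity[((i - 1) >> 3) + 1]            # byte holding the bit for slot i
    isvalid = ((byte >> ((i - 1) & 7)) & 0x01) == 0x01
    return isvalid ? v.values[i] : missing
end

# Slots 1, 2 and 4 are valid and slot 3 is null, so the bitmap is 0b00001011;
# the placeholder stored in the null slot (0) is never read.
v = NullablePrimitive{Int64}(UInt8[0b00001011], Int64[10, 20, 0, 40])

# Generic Julia functions work unchanged because v is an AbstractVector.
println(collect(v))
println(sum(skipmissing(v)))   # 70
```

Only `size` and `getindex` are defined, yet `collect`, `sum`, and `skipmissing` all work without any Arrow-specific code, which is the property that lets Arrow data plug into existing Julia code.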

Please see the Arrow columnar format specification for a description of the Arrow memory layout.
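As a small worked example of that layout, the sketch below decodes a nullable UTF-8 string column by hand from its three buffers: a validity bitmap (least-significant bit first, 1 meaning valid), an `Int32` offsets buffer with one more entry than there are values, and a single data buffer holding the concatenated bytes. The buffer contents and the helper `utf8_at` are hypothetical, chosen for illustration; in practice Arrow.jl does this decoding for you.

```julia
# Hypothetical buffers for the nullable string column ["arrow", missing, "julia"].
validity = UInt8[0b00000101]            # bits 0 and 2 set: slots 1 and 3 valid, slot 2 null
offsets  = Int32[0, 5, 5, 10]           # n + 1 zero-based offsets; the null slot has length 0
data     = Vector{UInt8}("arrowjulia")  # UTF-8 bytes of all values, back to back

# Hypothetical helper: decode slot i (1-based in Julia; the Arrow spec counts from 0).
function utf8_at(validity::Vector{UInt8}, offsets::Vector{Int32}, data::Vector{UInt8}, i::Int)
    # The validity bit for slot i lives in byte (i-1) ÷ 8, at bit position (i-1) % 8.
    isvalid = ((validity[((i - 1) >> 3) + 1] >> ((i - 1) & 7)) & 0x01) == 0x01
    isvalid || return missing
    lo = offsets[i] + 1   # zero-based start offset -> 1-based index
    hi = offsets[i + 1]   # zero-based exclusive end == 1-based inclusive end
    return String(data[lo:hi])
end

n = length(offsets) - 1
col = [utf8_at(validity, offsets, data, i) for i in 1:n]
println(col)
```

Storing all string bytes in one contiguous buffer and locating each value through offsets is what allows a whole column to be shared or memory-mapped without per-element allocation.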

Installation

The package can be installed by typing the following in a Julia REPL:

julia> using Pkg; Pkg.add("Arrow")

Or, to install the code hosted in the official Apache repository, which follows the official Apache release process, run:

julia> using Pkg; Pkg.add(url="https://github.com/apache/arrow", subdir="julia/Arrow.jl")

Local Development

When developing Arrow.jl, it is recommended that you run the following so that any changes to ArrowTypes.jl are immediately available to Arrow.jl without requiring a release:

julia --project -e 'using Pkg; Pkg.develop(path="src/ArrowTypes")'

Format Support

This implementation supports version 1.0 of the specification, including support for:

All primitive data types
All nested data types
Dictionary encodings and messages
Extension types
Streaming and file formats, record batch messages, and replacement and isDelta dictionary messages
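A minimal sketch of two of these features through Arrow.jl, assuming Arrow.jl and Tables.jl are both installed in the active environment: wrapping a column in `Arrow.DictEncode` requests dictionary encoding, and writing a `Tables.partitioner` produces one record batch per partition, which `Arrow.Stream` then yields one at a time. The column names and values are made up for illustration.

```julia
using Arrow, Tables  # assumes both packages are installed in the active environment

# Dictionary encoding: the column is written as integer indices plus a
# dictionary of unique values, and decodes back to the original strings.
buf = Arrow.tobuffer((color = Arrow.DictEncode(["red", "blue", "red", "red"]),))
tbl = Arrow.Table(buf)
colors = collect(tbl.color)

# Record batches: each partition is written as its own record batch.
# Writing to an IO uses the IPC streaming format unless file=true is passed.
parts = Tables.partitioner([(x = [1, 2, 3],), (x = [4, 5],)])
io = IOBuffer()
Arrow.write(io, parts)
seekstart(io)

# Arrow.Stream yields one Arrow.Table per record batch.
batch_lengths = [length(batch.x) for batch in Arrow.Stream(io)]
println(colors)
println(batch_lengths)
```

Reading the same bytes with `Arrow.Table` instead of `Arrow.Stream` would present both batches as a single table; `Arrow.Stream` is the choice when batches should be processed incrementally.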

It currently doesn't include support for:

Tensors or sparse tensors
Flight RPC
C data interface

Third-party data formats:

CSV, Parquet, and Avro support via the existing CSV.jl, Parquet.jl, and Avro.jl packages
Other Tables.jl-compatible packages automatically supported (DataFrames.jl, JSONTables.jl, JuliaDB.jl, SQLite.jl, MySQL.jl, JDBC.jl, ODBC.jl, XLSX.jl, etc.)
No current Julia packages support ORC
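For example, a `NamedTuple` of vectors is itself a Tables.jl-compatible source, so it can be written to an Arrow file and read back without any conversion step. This is a minimal sketch assuming Arrow.jl is installed; the file name comes from `tempname()` and the column names are illustrative.

```julia
using Arrow  # assumes Arrow.jl is installed

# A NamedTuple of equal-length vectors is a valid Tables.jl column table;
# a DataFrame or any other Tables.jl source could be passed the same way.
src = (id = [1, 2, 3], label = ["a", "b", missing])

# Writing to a file path produces the Arrow file format.
path = tempname() * ".arrow"
Arrow.write(path, src)

# Reading returns an Arrow.Table whose columns are AbstractVectors,
# accessible by property name.
tbl = Arrow.Table(path)
ids = tbl.id
labels = tbl.label

println(ids isa AbstractVector)   # true
println(collect(labels))
```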

See the full documentation for details on reading and writing arrow data.