| commit | 62266b5919f6bc5b543f9cfed64d7c1bf4818558 | |
|---|---|---|
| author | Jacob Quinn <quinn.jacobd@gmail.com> | Sat Jan 30 23:18:44 2021 -0700 | 
| committer | GitHub <noreply@github.com> | Sat Jan 30 23:18:44 2021 -0700 | 
| tree | e459e430527fffbb38fd3aa2d7cb7a29c5c63a4d | |
| parent | da73d8201a8915325800dfda9a099539be096e7d | |
Rework dict encoding of PooledArray/CategoricalArray to fix outstandi… (#119)

* Rework dict encoding of PooledArray/CategoricalArray to fix outstanding issues

Fixes #117, #116, and #113.

For #116, we just need to special case if user happens to pass in a DictEncoded themselves. We need to pass it through to the `toarrowvector` method that no-ops.

For #113, we require the new functionality in PooledArrays that allows passing the `signed` and `compress` keyword arguments to ensure we get signed refs for our dict encoding.

For #117, we add CategoricalArrays as a test dependency and ensure that if it contains any `missing` value, we *don't* recode the indices values down by 1, since the `missing` ref is 0, so other refs can already be considered "offsets". If there are no `missing`, then we still need to recode down since refs should always start from 0 in arrow format.

* PooledArrays 1.0 compat

* Update src/arraytypes/dictencoding.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Check refpool

* Fix test

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
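The recoding rule described for #117 can be illustrated with a minimal, package-free sketch. It is not the code in `src/arraytypes/dictencoding.jl`. The helper names `to_arrow_indices` and `decode` are hypothetical, and the sketch assumes CategoricalArray-style refs where 0 marks `missing` and 1..n index into the levels, with `missing` occupying slot 0 of the dictionary pool.

```julia
# Hypothetical helper, not part of Arrow.jl: turn CategoricalArray-style refs
# (assumed: 0 = missing, 1..n = position in `levels`) into 0-based indices
# plus the dictionary pool those indices point into.
function to_arrow_indices(refs::Vector{<:Signed}, levels::Vector)
    if any(==(0), refs)
        # A 0 ref marks `missing`. Treat the pool as [missing; levels...], so
        # the refs are already valid 0-based offsets and are left unchanged.
        return copy(refs), vcat(missing, levels)
    else
        # No missing values: shift the 1-based refs down to 0-based offsets.
        return refs .- one(eltype(refs)), copy(levels)
    end
end

# Decode by looking up each 0-based index in the pool (Julia arrays are 1-based).
decode(indices, pool) = [pool[i + 1] for i in indices]

levels = ["a", "b", "c"]

# Without missing values the refs are shifted down by one.
idx, pool = to_arrow_indices(Int8[1, 3, 2, 1], levels)

# With a missing value (ref 0) the refs are kept as they are.
idx_m, pool_m = to_arrow_indices(Int8[2, 0, 1], levels)
```

In both cases `decode` recovers the original values, which is the property the recoding has to preserve.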
This is a pure Julia implementation of the Apache Arrow data standard. This package provides Julia AbstractVector objects for referencing data that conforms to the Arrow standard. This allows users to seamlessly interface Arrow-formatted data with a great deal of existing Julia code.
Please see this document for a description of the Arrow memory layout.
The package can be installed by typing the following in a Julia REPL:
julia> using Pkg; Pkg.add("Arrow")
Alternatively, to use the official Apache code, which follows the official Apache release process, run:
julia> using Pkg; Pkg.add(url="https://github.com/apache/arrow", subdir="julia/Arrow.jl")
The code in the apache/arrow repository is officially part of the apache/arrow project and as such follows the regulated release cadence of the entire project, following standard community voting protocols. The JuliaData/Arrow.jl repository can be viewed as a sort of “dev” or “latest” branch of this code that may release more frequently, but without following official Apache release guidelines. The two repositories are synced, however, so any bugfix patches in JuliaData will be upstreamed to apache/arrow for each release.
This implementation supports the 1.0 version of the specification, including support for:
It currently doesn't include support for:
Third-party data formats:
See the full documentation for details on reading and writing arrow data.
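As a quick orientation before turning to the full documentation, here is a minimal round-trip sketch. It assumes the Arrow package is installed as described above and uses `Arrow.write` and `Arrow.Table` with a temporary file path; the column names `id` and `label` are made up for illustration.

```julia
using Arrow

# Any Tables.jl-compatible source works; a NamedTuple of vectors is the simplest.
data = (id = [1, 2, 3], label = ["a", missing, "c"])

# Write the table to a temporary file in the Arrow format.
path = tempname()
Arrow.write(path, data)

# Read it back; columns are exposed as AbstractVector objects.
tbl = Arrow.Table(path)
ids = tbl.id
labels = tbl.label
```

Because the columns behave as ordinary `AbstractVector`s, they can be passed to existing Julia code, or copied into plain `Vector`s with `collect` when needed.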