tag	d46a95078d069c1e676dfdd72d18e8be4511957a
tagger	github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	Mon Mar 29 13:37:39 2021 +0000
object	b610770a75dcd03849fb3bcd4254df17e9a76153

## Arrow v1.3.0

[Diff since v1.2.4](https://github.com/JuliaData/Arrow.jl/compare/v1.2.4...v1.3.0)

**Closed issues:**

- Attempting to serialize `DataType`s induces segfault (#74)
- tables containing `Set` values are serializable but corresponding deserialized `Arrow.Table`s are inaccessible (#75)
- support for heterogeneously typed tuples (#85)
- Difficult to read the code (#91)
- Arrow.write hangs on Tables.partitioner (#108)
- Unsafe conversion to signed integer types (#121)
- Arrow.write in v1.4.2 can create an invalid arrow file (#126)
- Arrow dataset imported as DataFrames are not pure DataFrames? (#127)
- Arrow.jl issue with struct types (#128)
- `unsupported ARROW:extension:name type: "JuliaLang.Nothing"` (#132)
- Loss of parametric type information for custom types (#134)
- Avoid assuming field values can be used in constructors (#135)
- Help (#137)
- Losing type in unnamed column (#138)
- How to handle parametric Unitful types (#139)
- Can't serialize structs that contain `::Type{T}` fields (#140)
- Cannot iterate Arrow.Stream (#141)
- Arrow.write("my.arrow", CategoricalArray([1,2,3])) hangs (#143)
- Arrow Table conversion to DataFrame throws DimensionMismatch Error (#144)
- copying Arrow.Table does not always copy columns (#146)
- Hang with multithreaded reading (#155)

**Merged pull requests:**

- Add ntasks keyword to limit # of tasks allowed to write at a time (#106) (@quinnj)
- Fix typo (#130) (@Sov-trotter)
- implement Base.IteratorSize for Stream, fixes #141 (#142) (@damiendr)
- Introduce new `maxdepth` keyword argument for setting a limit on nesting (#147) (@quinnj)
- Ensure dict encoded index types match from record batch to record batch (#148) (@quinnj)
- Ensure serializing Arrow.DictEncoded writes dictionary messages (#149) (@quinnj)
- revert setting Arrow.write debug message threshold to -1 (#152) (@jrevels)
- add unexported `tobuffer` utility for interactive testing/development (#153) (@jrevels)
- Better handle errors when something goes wrong writing partitions (#154) (@quinnj)
- Overhaul type serialization/deserialization machinery (#156) (@quinnj)

commit	b610770a75dcd03849fb3bcd4254df17e9a76153
author	Jacob Quinn <quinn.jacobd@gmail.com>	Mon Mar 29 07:23:06 2021 -0600
committer	GitHub <noreply@github.com>	Mon Mar 29 07:23:06 2021 -0600
tree	7e1ffc44d7debd1fe645473fc05bedb1817d324b
parent	ff53d1359c01ae3e98fa3723f9f994c9ba420050

README.md

Arrow

This is a pure Julia implementation of the Apache Arrow data standard. The package provides Julia AbstractVector objects that reference data conforming to the Arrow standard, so Arrow-formatted data can be used directly with a great deal of existing Julia code.
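
As a rough illustration of the paragraph above, the following sketch writes a small table to an in-memory buffer and reads it back. The column names `id` and `label` are made up for the example, and the sketch assumes the `Arrow.write(io, table)` and `Arrow.Table(bytes)` entry points behave as described in the package documentation.

```julia
using Arrow

# Any Tables.jl-compatible source should work; a NamedTuple of vectors is the simplest.
# The column names here are hypothetical.
source = (id = [1, 2, 3], label = ["a", "b", "c"])

# Write the table to an in-memory buffer (the streaming format is used for IO targets).
io = IOBuffer()
Arrow.write(io, source)

# Read the bytes back. Each column is an AbstractVector that references the Arrow data.
table = Arrow.Table(take!(io))
ids = table.id

println(ids isa AbstractVector)
println(sum(ids))
```

Because the columns are ordinary AbstractVector objects, generic functions such as `sum`, `collect`, or indexing should work on them without conversion.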

Please see this document for a description of the Arrow memory layout.

Installation

The package can be installed by typing the following in the Julia REPL:

julia> using Pkg; Pkg.add("Arrow")

Or, to use the official Apache code, which follows the official Apache release process:

julia> using Pkg; Pkg.add(url="https://github.com/apache/arrow", subdir="julia/Arrow.jl")

Difference between this code and the apache/arrow/julia/Arrow repository

The code in the apache/arrow repository is officially part of the apache/arrow project and therefore follows the regulated release cadence of the entire project, including standard community voting protocols. The JuliaData/Arrow.jl repository can be viewed as a sort of “dev” or “latest” branch of this code that may release more frequently, but without following official Apache release guidelines. The two repositories are kept in sync, however, so any bugfix patches in JuliaData will be upstreamed to apache/arrow for each release.

Format Support

This implementation supports the 1.0 version of the specification, including support for:

All primitive data types
All nested data types
Dictionary encodings and messages
Extension types
Streaming and file formats, record batches, and replacement and isdelta dictionary messages
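
To make the streaming and record batch support listed above more concrete, here is a minimal sketch that writes two partitions and reads them back one batch at a time. It assumes that each partition from `Tables.partitioner` is written as its own record batch and that `Arrow.Stream` yields one `Arrow.Table` per batch. The column name `a` is only illustrative.

```julia
using Arrow, Tables

# Two small tables with the same schema. Each one is expected to become a record batch.
parts = Tables.partitioner([(a = [1, 2],), (a = [3, 4],)])

# Writing to an IO target produces the Arrow streaming format.
io = IOBuffer()
Arrow.write(io, parts)
bytes = take!(io)

# Iterate over the stream one record batch at a time instead of loading everything.
batch_lengths = Int[]
for batch in Arrow.Stream(bytes)
    push!(batch_lengths, length(batch.a))
end

println(batch_lengths)
```

Reading the same bytes with `Arrow.Table(bytes)` instead should present all batches as a single table with one chained column per field.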

It currently doesn't include support for:

Tensors or sparse tensors
Flight RPC
C data interface

Third-party data formats:

CSV and Parquet support via the existing CSV.jl and Parquet.jl packages
Other Tables.jl-compatible packages automatically supported (DataFrames.jl, JSONTables.jl, JuliaDB.jl, SQLite.jl, MySQL.jl, JDBC.jl, ODBC.jl, XLSX.jl, etc.)
No current Julia packages support ORC or Avro data formats
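
The Tables.jl interoperability described above can be sketched as a CSV-to-Arrow conversion. The helper name `csv_to_arrow` and the file names are hypothetical, and the example assumes CSV.jl is installed alongside Arrow.

```julia
using Arrow, CSV

# Hypothetical helper: read a CSV file and write its contents as an Arrow file.
# CSV.File is a Tables.jl source, so it can be passed to Arrow.write directly.
function csv_to_arrow(csv_path::AbstractString, arrow_path::AbstractString)
    Arrow.write(arrow_path, CSV.File(csv_path))
    return arrow_path
end

# Usage: create a small CSV file in a temporary directory, convert it, and read it back.
dir = mktempdir()
csv_path = joinpath(dir, "example.csv")
CSV.write(csv_path, (id = [1, 2, 3], score = [0.5, 1.5, 2.5]))

arrow_path = csv_to_arrow(csv_path, joinpath(dir, "example.arrow"))
converted = Arrow.Table(arrow_path)

println(collect(converted.id))
```

The same pattern should apply to other Tables.jl sources in the list, for example passing a DataFrame as the second argument to `Arrow.write`.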

See the full documentation for details on reading and writing arrow data.