layout: default title: Apache Arrow 0.6.0 Release permalink: /release/0.6.0.html

Apache Arrow 0.6.0 (14 August 2017)

This is a major release. Read more in the release blog post.

Download

Contributors

$ git shortlog -sn apache-arrow-0.5.0..apache-arrow-0.6.0
    48  Wes McKinney
     7  siddharth
     5  Matt Darwin
     5  Max Risuhin
     5  Philipp Moritz
     4  Kouhei Sutou
     3  Bryan Cutler
     2  Emilio Lahr-Vivaz
     2  Li Jin
     2  Robert Nishihara
     1  Antony Mayi
     1  Marco Neumann
     1  Stepan Kadlec
     1  Steven Phillips
     1  Yeolar
     1  fjetter
     1  rendel

Changelog

New Features and Improvements

  • ARROW-1076 - [Python] Handle nanosecond timestamps more gracefully when writing to Parquet format
  • ARROW-1093 - [Python] Fail Python builds if flake8 yields warnings
  • ARROW-1104 - Integrate in-memory object store from Ray
  • ARROW-1121 - [C++] Improve error message when opening OS file fails
  • ARROW-1140 - [C++] Allow optional build of plasma
  • ARROW-1149 - [Plasma] Create Cython client library for Plasma
  • ARROW-1173 - [Plasma] Blog post for Plasma
  • ARROW-1211 - [C++] Consider making default_memory_pool() the default for builder classes
  • ARROW-1213 - [Python] Enable s3fs to be used with ParquetDataset and reader/writer functions
  • ARROW-1219 - [C++] Use more vanilla Google C++ formatting
  • ARROW-1224 - [Format] Clarify language around buffer padding and alignment in IPC
  • ARROW-1230 - [Plasma] Install libraries and headers
  • ARROW-1241 - [C++] Visual Studio 2017 Appveyor build job
  • ARROW-1243 - [Java] security: upgrade all libraries to latest stable versions
  • ARROW-1246 - [Format] Add Map logical type to metadata
  • ARROW-1251 - [Python/C++] Revise build documentation to account for latest build toolchain
  • ARROW-1253 - [C++] Use pre-built toolchain libraries where prudent to speed up CI builds
  • ARROW-1255 - [Plasma] Check plasma flatbuffer messages with the flatbuffer verifier
  • ARROW-1257 - [Plasma] Plasma documentation
  • ARROW-1258 - [C++] Suppress dlmalloc warnings on Clang
  • ARROW-1259 - [Plasma] Speed up Plasma tests
  • ARROW-1260 - [Plasma] Use factory method to create Python PlasmaClient
  • ARROW-1264 - [Plasma] Don‘t exit the Python interpreter if the plasma client can’t connect to the store
  • ARROW-1268 - [Website] Blog post on Arrow integration with Spark
  • ARROW-1270 - [Packaging] Add Python wheel build scripts for macOS to arrow-dist
  • ARROW-1272 - [Python] Add script to arrow-dist to generate and upload manylinux1 Python wheels
  • ARROW-1273 - [Python] Add convenience functions for reading only Parquet metadata or effective Arrow schema from a particular Parquet file
  • ARROW-1274 - [C++] add_compiler_export_flags() throws warning with CMake >= 3.3
  • ARROW-1281 - [C++/Python] Add Docker setup for running HDFS tests and other tests we may not run in Travis CI
  • ARROW-1288 - Clean up many ASF license headers
  • ARROW-1289 - [Python] Add PYARROW_BUILD_PLASMA option like Parquet
  • ARROW-1297 - 0.6.0 Release
  • ARROW-1301 - [C++/Python] Add remaining supported libhdfs UNIX-like filesystem APIs
  • ARROW-1303 - [C++] Support downloading Boost
  • ARROW-1304 - [Java] Fix checkstyle checks warning
  • ARROW-1305 - [GLib] Add GArrowIntArrayBuilder
  • ARROW-1315 - [GLib] Status check of arrow::ArrayBuilder::Finish() is missing
  • ARROW-1323 - [GLib] Add garrow_boolean_array_get_values()
  • ARROW-1333 - [Plasma] Sorting example for DataFrames in plasma
  • ARROW-1334 - [C++] Instantiate arrow::Table from vector of Array objects (instead of Columns)
  • ARROW-1336 - [C++] Add arrow::schema factory function
  • ARROW-439 - [Python] Add option in “to_pandas” conversions to yield Categorical from String/Binary arrays
  • ARROW-622 - [Python] Investigate alternatives to timestamps_to_ms argument in pandas conversion

Bug Fixes

  • ARROW-1192 - [JAVA] Improve splitAndTransfer performance for List and Union vectors
  • ARROW-1195 - [C++] CpuInfo doesn't get cache size on Windows
  • ARROW-1204 - [C++] lz4 ExternalProject fails in Visual Studio 2015
  • ARROW-1225 - [Python] pyarrow.array does not attempt to convert bytes to UTF8 when passed a StringType
  • ARROW-1237 - [JAVA] Expose the ability to set lastSet
  • ARROW-1239 - issue with current version of git-commit-id-plugin
  • ARROW-1240 - security: upgrade logback to address CVE-2017-5929
  • ARROW-1242 - [Java] security - upgrade Jackson to mitigate 3 CVE vulnerabilities
  • ARROW-1245 - [Integration] Java Integration Tests Disabled
  • ARROW-1248 - [Python] C linkage warnings in Clang with public Cython API
  • ARROW-1249 - [JAVA] Expose the fillEmpties function from NullableVector.mutator
  • ARROW-1263 - [C++] CpuInfo should be able to get CPU features on Windows
  • ARROW-1265 - [Plasma] Plasma store memory leak warnings in Python test suite
  • ARROW-1267 - [Java] Handle zero length case in BitVector.splitAndTransfer
  • ARROW-1269 - [Packaging] Add Windows wheel build scripts from ARROW-1068 to arrow-dist
  • ARROW-1275 - [C++] Default static library prefix for Snappy should be “_static”
  • ARROW-1276 - Cannot serializer empty DataFrame to parquet
  • ARROW-1283 - [Java] VectorSchemaRoot should be able to be closed() more than once
  • ARROW-1285 - PYTHON: NotImplemented exception creates empty parquet file
  • ARROW-1287 - [Python] Emulate “whence” argument of seek in NativeFile
  • ARROW-1290 - [C++] Use array capacity doubling in arrow::BufferBuilder
  • ARROW-1291 - [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
  • ARROW-1294 - [C++] New Appveyor build failures
  • ARROW-1296 - [Java] templates/FixValueVectors reset() method doesn't set allocationSizeInBytes correctly
  • ARROW-1300 - [JAVA] Fix ListVector Tests
  • ARROW-1306 - [Python] Encoding? issue with error reporting for parquet.read_table
  • ARROW-1308 - [C++] ld tries to link ‘arrow_static’ even when -DARROW_BUILD_STATIC=off
  • ARROW-1309 - [Python] Error inferring List type in Array.from_pandas when inner values are all None
  • ARROW-1310 - [JAVA] Revert ARROW-886
  • ARROW-1312 - [C++] Set default value to ARROW_JEMALLOC to OFF until ARROW-1282 is resolved
  • ARROW-1326 - [Python] Fix Sphinx build in Travis CI
  • ARROW-1327 - [Python] Failing to release GIL in MemoryMappedFile._open causes deadlock
  • ARROW-1328 - [Python] pyarrow.Table.from_pandas option timestamps_to_ms changes column values
  • ARROW-1330 - [Plasma] Turn on plasma tests on manylinux1
  • ARROW-1335 - [C++] PrimitiveArray::raw_values has inconsistent semantics re: offsets compared with subclasses
  • ARROW-1338 - [Python] Investigate non-deterministic core dump on Python 2.7, Travis CI builds
  • ARROW-1340 - [Java] NullableMapVector field doesn't maintain metadata
  • ARROW-1342 - [Python] Support strided array of lists
  • ARROW-1343 - [Format/Java/C++] Ensuring encapsulated stream / IPC message sizes are always a multiple of 8
  • ARROW-1350 - [C++] Include Plasma source tree in source distribution
  • ARROW-187 - [C++] Decide on how pedantic we want to be about exceptions
  • ARROW-276 - [JAVA] Nullable Value Vectors should extend BaseValueVector instead of BaseDataValueVector
  • ARROW-573 - [Python/C++] Support ordered dictionaries data, pandas Categorical
  • ARROW-884 - [C++] Exclude internal classes from documentation
  • ARROW-932 - [Python] Fix compiler warnings on MSVC
  • ARROW-968 - [Python] RecordBatch [i:j] syntax is incomplete