layout: default title: Apache Arrow 0.7.0 Release permalink: /release/0.7.0.html

Apache Arrow 0.7.0 (17 September 2017)

This is a major release. Read more in the release blog post.

Download

Contributors

$ git shortlog -sn apache-arrow-0.6.0..apache-arrow-0.7.0
    58  Wes McKinney
    14  Kouhei Sutou
    11  Philipp Moritz
     7  Phillip Cloud
     6  siddharth
     5  Uwe L. Korn
     2  Bryan Cutler
     2  HorimotoYasuhiro
     2  Laurent Goujon
     2  Li Jin
     2  Max Risuhin
     2  fjetter
     1  Antony Mayi
     1  Brecht Machiels
     1  Fritz Obermeyer
     1  Gonzalo Ortiz
     1  Jeff Reback
     1  Kyle Kelley
     1  Paul Taylor
     1  Robert Nishihara
     1  Wataru Shimizu
     1  hatappi
     1  mnc
     1  rvernica
     1  tetsutaroendo

Changelog

New Features and Improvements

  • ARROW-1156 - [Python] pyarrow.Array.from_pandas should take a type parameter
  • ARROW-1238 - [Java] Add JSON read/write support for decimals for integration tests
  • ARROW-1307 - [Python] Add pandas serialization section + Feather API to Sphinx docs
  • ARROW-1317 - [Python] Add function to set Hadoop CLASSPATH
  • ARROW-1331 - [Java] Refactor tests
  • ARROW-1339 - [C++] Use boost::filesystem for handling of platform-specific file path encodings
  • ARROW-1344 - [C++] Calling BufferOutputStream::Write after calling Finish crashes
  • ARROW-1348 - [C++/Python] Add release verification script for Windows
  • ARROW-1351 - Automate updating CHANGELOG.md as part of release scripts
  • ARROW-1352 - [Integration] Improve print formatting for producer, consumer line
  • ARROW-1355 - Make arrow buildable with java9
  • ARROW-1356 - [Website] Add new committers
  • ARROW-1358 - Update source release scripts to account for new SHA checksum policy
  • ARROW-1359 - [Python] Add Parquet writer option to normalize field names for use in Spark
  • ARROW-1364 - [C++] IPC reader and writer specialized for GPU device memory
  • ARROW-1366 - [Python] Add instructions for starting the Plasma store when installing pyarrow from wheels
  • ARROW-1372 - [Plasma] Support for storing data in huge pages
  • ARROW-1376 - [C++] RecordBatchStreamReader::Open API is inconsistent with writer
  • ARROW-1377 - [Python] Add function to assist with benchmarking Parquet scan performance
  • ARROW-1381 - [Python] Improve performance of SerializedPyObject.to_buffer
  • ARROW-1383 - [C++] Support std::vector in builder vector appends
  • ARROW-1384 - [C++] Add convenience function for serializing a record batch to an IPC message
  • ARROW-1386 - [C++] Unpin CMake version in MSVC build toolchain
  • ARROW-1387 - [C++] Set up GPU leaf library build toolchain
  • ARROW-1390 - [Python] Extend tests for python serialization
  • ARROW-1392 - [C++] Implement reader and writer IO interfaces for GPU buffers
  • ARROW-1395 - [C++] Remove APIs deprecated as of 0.5.0 and later versions
  • ARROW-1396 - [C++] Add PrettyPrint function for Schemas, which also outputs any dictionaries
  • ARROW-1397 - [Packaging] Use Docker instead of Vagrant
  • ARROW-1399 - [C++] Add CUDA build version in a public header to help prevent ABI conflicts
  • ARROW-1400 - [Python] Ability to create partitions when writing to Parquet
  • ARROW-1401 - [C++] Add extra debugging context to failures in RETURN_NOT_OK in debug builds
  • ARROW-1402 - [C++] Possibly deprecate public APIs that use MutableBuffer
  • ARROW-1404 - [Packaging] Build .deb and .rpm on Travis CI
  • ARROW-1405 - [Python] Add logging option for verbose memory allocations
  • ARROW-1406 - [Python] Harden user API for generating serialized schema and record batch messages as memoryview-compatible objects
  • ARROW-1408 - [C++] Refactor and make IPC read / write APIs more consistent, add appropriate deprecations
  • ARROW-1410 - Plasma object store occasionally pauses for a long time
  • ARROW-1412 - [Plasma] Add higher level API for putting and getting Python objects
  • ARROW-1413 - [C++] Add include-what-you-use configuration
  • ARROW-1415 - [GLib] Support date32 and date64
  • ARROW-1416 - [Format] Clarify example array in memory layout documentation
  • ARROW-1417 - [Python] Allow more generic filesystem objects to be passed to ParquetDataset
  • ARROW-1418 - [Python] Introduce SerializationContext to register custom serialization callbacks
  • ARROW-1419 - [GLib] Suppress sign-conversion warning on Clang
  • ARROW-1427 - [GLib] Add a link to readme of Arrow GLib
  • ARROW-1428 - [C++] Append steps to clone source code to README.mb
  • ARROW-1432 - [C++] Build bundled jemalloc functions with private prefix
  • ARROW-1433 - [C++] Simplify implementation of Array::Slice
  • ARROW-1438 - [Plasma] Pull SerializationContext through PlasmaClient put and get
  • ARROW-1441 - [Site] Add Ruby to Flexible section
  • ARROW-1442 - [Website] Add pointer to nightly conda packages on /install
  • ARROW-1447 - [C++] Round of include-what-you-use include cleanups
  • ARROW-1448 - [Packaging] Support uploading built .deb and .rpm to Bintray
  • ARROW-1449 - Implement Decimal using only Int128
  • ARROW-1451 - [C++] Create arrow/io/api.h
  • ARROW-1460 - [C++] Upgrade clang-format used to LLVM 4.0
  • ARROW-1462 - [GLib] Support time array
  • ARROW-1466 - [C++] Support DecimalArray in arrow::PrettyPrint
  • ARROW-1468 - [C++] Append to PrimitiveBuilder from std::vector
  • ARROW-1479 - [JS] Expand JavaScript implementation
  • ARROW-1480 - [Python] Improve performance of serializing sets
  • ARROW-1481 - [C++] Expose type casts as generic callable object that can write into pre-allocated memory
  • ARROW-1494 - [C++] Document that shared_ptr returned by RecordBatch::column needs to be retained
  • ARROW-1499 - [Python] Consider adding option to parquet.write_table that sets options for maximum Spark compatibility
  • ARROW-1504 - [GLib] Support timestamp
  • ARROW-1505 - [GLib] Simplify arguments check
  • ARROW-1506 - [C++] Support pkg-config for compute modules
  • ARROW-1508 - C++: Add support for FixedSizeBinaryType in DictionaryBuilder
  • ARROW-1510 - [C++] Support cast
  • ARROW-1511 - [C++] Deprecate arrow::MakePrimitiveArray
  • ARROW-1513 - C++: Add cast from Dictionary to plain arrays
  • ARROW-1515 - [GLib] Detect version directly
  • ARROW-1516 - [GLib] Update document
  • ARROW-1517 - Remove unnecessary temporary in DecimalUtil::ToString function
  • ARROW-1519 - [C++] Move DecimalUtil functions to methods on the Int128 class
  • ARROW-1528 - [GLib] Resolve include dependency
  • ARROW-1530 - [C++] Install arrow/util/parallel.h
  • ARROW-1551 - [Website] Updates for 0.7.0 release
  • ARROW-229 - [C++] Implement safe casts for primitive types
  • ARROW-592 - [C++] Provide .deb and .rpm packages
  • ARROW-594 - [Python] Provide interface to write pyarrow.Table to a stream
  • ARROW-695 - Integration tests for Decimal types
  • ARROW-696 - [C++] Add JSON read/write support for decimals for integration tests
  • ARROW-759 - [Python] Implement a transient list serialization function that can handle a mix of scalars, lists, ndarrays, dicts
  • ARROW-786 - [Format] In-memory format for 128-bit Decimals, handling of sign bit
  • ARROW-837 - [Python] Expose buffer allocation, FixedSizeBufferWriter
  • ARROW-941 - [Docs] Improve “cold start” integration testing instructions
  • ARROW-989 - [Python] Write pyarrow.Table to FileWriter or StreamWriter

Bug Fixes

  • ARROW-12 - Get Github activity mirrored to JIRA
  • ARROW-1302 - C++: ${MAKE} variable not set sometimes on older MacOS installations
  • ARROW-1354 - [Python] Segfault in Table.from_pandas with Mixed-Type Categories
  • ARROW-1357 - [Python] Data corruption in reading multi-file parquet dataset
  • ARROW-1363 - [C++] IPC writer sends buffer layout for dictionary rather than indices
  • ARROW-1365 - [Python] Remove usage of removed jemalloc_memory_pool in Python API docs
  • ARROW-1373 - [Java] Implement getBuffer() methods at the ValueVector interface
  • ARROW-1375 - [C++] Visual Studio 2017 Appveyor builds failing
  • ARROW-1379 - [Java] maven dependency issues - both unused and undeclared
  • ARROW-1407 - Dictionaries can only hold a maximum of 4096 indices
  • ARROW-1411 - [Python] Booleans in Float Columns cause Segfault
  • ARROW-1414 - [GLib] Cast after status check
  • ARROW-1421 - [Python] pyarrow.serialize cannot serialize a Python dict input
  • ARROW-1426 - [Website] The title element of the top page is empty
  • ARROW-1429 - [Python] Error loading parquet file with _metadata from HDFS
  • ARROW-1430 - [Python] flake8 warnings are not failing CI builds
  • ARROW-1434 - [C++/Python] pyarrow.Array.from_pandas does not support datetime64[D] arrays
  • ARROW-1435 - [Python] PyArrow not propagating timezone information from Parquet to Python
  • ARROW-1439 - [Packaging] Automate updating RPM in RPM build
  • ARROW-1443 - [Java] Bug on ArrowBuf.setBytes with unsliced ByteBuffers
  • ARROW-1444 - [JAVA] BitVector.splitAndTransfer copies last byte incorrectly
  • ARROW-1446 - Python: Writing more than 2^31 rows from pandas dataframe causes row count overflow error
  • ARROW-1450 - [Python] Raise proper error if custom serialization handler fails
  • ARROW-1452 - [C++] Make UNUSED macro name more unique so it does not conflict with thirdparty projects
  • ARROW-1453 - [Python] Implement WriteTensor for non-contiguous tensors
  • ARROW-1457 - [C++] Optimize strided WriteTensor
  • ARROW-1458 - [Python] Document that HadoopFileSystem.mkdir with create_parents=False has no effect
  • ARROW-1459 - [Python] PyArrow fails to load partitioned parquet files with non-primitive types
  • ARROW-1461 - [C++] Disable builds using LLVM apt packages temporarily
  • ARROW-1467 - [JAVA]: Fix reset() and allocateNew() in Nullable Value Vectors template
  • ARROW-1490 - [Java] Allow Travis CI failures for JDK9 for now
  • ARROW-1493 - [C++] Flush the output stream at the end of each PrettyPrint function
  • ARROW-1495 - [C++] Store shared_ptr to boxed arrays in RecordBatch
  • ARROW-1507 - [C++] arrow/compute/api.h can't be used without arrow/array.h
  • ARROW-1512 - [Docs] NumericArray has no member named ‘raw_data’
  • ARROW-1514 - [C++] Fix a typo in document
  • ARROW-1527 - Fix Travis JDK9 build
  • ARROW-1531 - [C++] Return ToBytes by value from Decimal128
  • ARROW-1532 - [Python] Referencing an Empty Schema causes a SegFault
  • ARROW-407 - BitVector.copyFromSafe() should re-allocate if necessary instead of returning false
  • ARROW-801 - [JAVA] Provide direct access to underlying buffer memory addresses in consistent way without generating garbage or large amount indirections