---
layout: default
title: Apache Arrow 0.5.0 Release
permalink: /release/0.5.0.html
---

Apache Arrow 0.5.0 (23 July 2017)

This is a major release, with expanded features in the supported languages and additional integration test coverage between Java and C++.

Read more in the release blog post.

Download

Contributors

$ git shortlog -sn apache-arrow-0.4.1..apache-arrow-0.5.0
    42  Wes McKinney
    22  Uwe L. Korn
    12  Kouhei Sutou
     9  Max Risuhin
     9  Phillip Cloud
     6  Philipp Moritz
     5  Steven Phillips
     3  Julien Le Dem
     2  Bryan Cutler
     2  Kengo Seki
     2  Max Risukhin
     2  fjetter
     1  Antony Mayi
     1  Deepak Majeti
     1  Fang Zheng
     1  Hideo Hattori
     1  Holden Karau
     1  Itai Incze
     1  Jeff Knupp
     1  LynnYuan
     1  Mark Lavrynenko
     1  Michael König
     1  Robert Nishihara
     1  Sudheesh Katkam
     1  Zahari
     1  vkorukanti

Changelog

New Features and Improvements

  • ARROW-1041 - [Python] Support read_pandas on a directory of Parquet files
  • ARROW-1048 - Allow user LD_LIBRARY_PATH to be used with source release script
  • ARROW-1052 - Arrow 0.5.0 release
  • ARROW-1073 - C++: Adaptive integer builder
  • ARROW-1095 - [Website] Add Arrow icon asset
  • ARROW-1100 - [Python] Add “mode” property to NativeFile instances
  • ARROW-1102 - Make MessageSerializer.serializeMessage() public
  • ARROW-111 - [C++] Add static analyzer to tool chain to verify checking of Status returns
  • ARROW-1120 - [Python] Write support for int96
  • ARROW-1122 - [Website] Guest blog post on Arrow + ODBC from turbodbc
  • ARROW-1123 - C++: Make jemalloc the default allocator
  • ARROW-1135 - Upgrade Travis CI clang builds to use LLVM 4.0
  • ARROW-1137 - Python: Ensure Pandas roundtrip of all-None column
  • ARROW-1142 - [C++] Move over compression library toolchain from parquet-cpp
  • ARROW-1145 - [GLib] Add get_values()
  • ARROW-1146 - Add .gitignore for *_generated.h files in src/plasma/format
  • ARROW-1148 - [C++] Raise minimum CMake version to 3.2
  • ARROW-1151 - [C++] Add gcc branch prediction to status check macro
  • ARROW-1154 - [C++] Migrate more computational utility code from parquet-cpp
  • ARROW-1160 - C++: Implement DictionaryBuilder
  • ARROW-1165 - [C++] Refactor PythonDecimalToArrowDecimal to not use templates
  • ARROW-1172 - [C++] Use unique_ptr with array builder classes
  • ARROW-1183 - [Python] Implement time type conversions in to_pandas
  • ARROW-1185 - [C++] Clean up arrow::Status implementation, add warn_unused_result attribute for clang
  • ARROW-1187 - Serialize a DataFrame with None column
  • ARROW-1193 - [C++] Support pkg-config for arrow_python.so
  • ARROW-1196 - [C++] Appveyor separate jobs for Debug/Release builds from sources; Build with conda toolchain; Build with NMake Makefiles Generator
  • ARROW-1198 - Python: Add public C++ API to unwrap PyArrow object
  • ARROW-1199 - [C++] Introduce mutable POD struct for generic array data
  • ARROW-1202 - Remove semicolons from status macros
  • ARROW-1212 - [GLib] Add garrow_binary_array_get_offsets_buffer()
  • ARROW-1214 - [Python] Add classes / functions to enable stream message components to be handled outside of the stream reader class
  • ARROW-1217 - [GLib] Add GInputStream based arrow::io::RandomAccessFile
  • ARROW-1220 - [C++] Standardize usage of *_HOME cmake script variables for 3rd party libs
  • ARROW-1221 - [C++] Pin clang-format version
  • ARROW-1227 - [GLib] Support GOutputStream
  • ARROW-1228 - [GLib] Test file name should be the same name as target class
  • ARROW-1229 - [GLib] Follow Reader API change (get -> read)
  • ARROW-1233 - [C++] Validate cmake script resolving of 3rd party linked libs from correct location in toolchain build
  • ARROW-460 - [C++] Implement JSON round trip for DictionaryArray
  • ARROW-462 - [C++] Implement in-memory conversions between non-nested primitive types and DictionaryArray equivalent
  • ARROW-575 - Python: Auto-detect nested lists and nested numpy arrays in Pandas
  • ARROW-597 - [Python] Add convenience function to yield DataFrame from any object that a StreamReader or FileReader can read from
  • ARROW-599 - [C++] Add LZ4 codec to 3rd-party toolchain
  • ARROW-600 - [C++] Add ZSTD codec to 3rd-party toolchain
  • ARROW-692 - Java<->C++ Integration tests for dictionary-encoded vectors
  • ARROW-693 - [Java] Add JSON support for dictionary vectors
  • ARROW-742 - Handling exceptions during execution of std::wstring_convert
  • ARROW-834 - [Python] Support creating Arrow arrays from Python iterables
  • ARROW-915 - Struct Array reads limited support
  • ARROW-935 - [Java] Build Javadoc in Travis CI
  • ARROW-960 - [Python] Add source build guide for macOS + Homebrew
  • ARROW-962 - [Python] Add schema attribute to FileReader
  • ARROW-966 - [Python] pyarrow.list_ should also accept Field instance
  • ARROW-978 - [Python] Use sphinx-bootstrap-theme for Sphinx documentation

Bug Fixes

  • ARROW-1074 - from_pandas doesn't convert ndarray to list
  • ARROW-1079 - [Python] Empty “private” directories should be ignored by Parquet interface
  • ARROW-1081 - C++: arrow::test::TestBase::MakePrimitive doesn't fill null_bitmap
  • ARROW-1096 - [C++] Memory mapping file over 4GB fails on Windows
  • ARROW-1097 - Reading tensor needs file to be opened in writeable mode
  • ARROW-1098 - Document Error?
  • ARROW-1101 - UnionListWriter is not implementing all methods on interface ScalarWriter
  • ARROW-1103 - [Python] Utilize pandas metadata from common _metadata Parquet file if it exists
  • ARROW-1107 - [JAVA] NullableMapVector getField() should return nullable type
  • ARROW-1108 - Check if ArrowBuf is empty buffer in getActualConsumedMemory() and getPossibleConsumedMemory()
  • ARROW-1109 - [JAVA] transferOwnership fails when readerIndex is not 0
  • ARROW-1110 - [JAVA] make union vector naming consistent
  • ARROW-1111 - [JAVA] Make aligning buffers optional, and allow -1 for unknown null count
  • ARROW-1112 - [JAVA] Set lastSet for VarLength and List vectors when loading
  • ARROW-1113 - [C++] gflags EP build gets triggered (as a no-op) on subsequent calls to make or ninja build
  • ARROW-1115 - [C++] Use absolute path for ccache
  • ARROW-1117 - [Docs] Minor issues in GLib README
  • ARROW-1124 - [Python] pyarrow needs to depend on numpy>=1.10 (not 1.9)
  • ARROW-1125 - Python: Table.from_pandas doesn't work anymore on partial schemas
  • ARROW-1128 - [Docs] command to build a wheel is not properly rendered
  • ARROW-1129 - [C++] Fix Linux toolchain build regression from ARROW-742
  • ARROW-1131 - Python: Parquet unit tests are always skipped
  • ARROW-1132 - [Python] Unable to write pandas DataFrame w/MultiIndex containing duplicate values to parquet
  • ARROW-1136 - [C++/Python] Segfault on empty stream
  • ARROW-1138 - Travis: Use OpenJDK7 instead of OracleJDK7
  • ARROW-1139 - [C++] dlmalloc doesn't allow arrow to be built with clang 4 or gcc 7.1.1
  • ARROW-1141 - on import get libjemalloc.so.2: cannot allocate memory in static TLS block
  • ARROW-1143 - C++: Fix comparison of NullArray
  • ARROW-1144 - [C++] Remove unused variable
  • ARROW-1147 - [C++] Allow optional vendoring of flatbuffers in plasma
  • ARROW-1150 - [C++] AdaptiveIntBuilder compiler warning on MSVC
  • ARROW-1152 - [Cython] read_tensor should work with a readable file
  • ARROW-1155 - segmentation fault when run pa.Int16Value()
  • ARROW-1157 - C++/Python: Decimal templates are not correctly exported on OSX
  • ARROW-1159 - [C++] Static data members cannot be accessed from inline functions in Arrow headers by thirdparty users
  • ARROW-1162 - Transfer Between Empty Lists Should Not Invoke Callback
  • ARROW-1166 - Errors in Struct type's example and missing reference in Layout.md
  • ARROW-1167 - [Python] Create chunked BinaryArray in Table.from_pandas when a column's data exceeds 2GB
  • ARROW-1168 - [Python] pandas metadata may contain “mixed” data types
  • ARROW-1169 - C++: jemalloc externalproject doesn't build with CMake's ninja generator
  • ARROW-1170 - C++: ARROW_JEMALLOC=OFF breaks linking on unittest
  • ARROW-1174 - [GLib] Investigate root cause of ListArray glib test failure
  • ARROW-1177 - [C++] Detect int32 overflow in ListBuilder::Append
  • ARROW-1179 - C++: Add missing virtual destructors
  • ARROW-1180 - [GLib] garrow_tensor_get_dimension_name() returns invalid address
  • ARROW-1181 - [Python] Parquet test fail if not enabled
  • ARROW-1182 - C++: Specify BUILD_BYPRODUCTS for zlib and zstd
  • ARROW-1186 - [C++] Enable option to build arrow with minimal dependencies needed to build Parquet library
  • ARROW-1188 - Segfault when trying to serialize a DataFrame with Null-only Categorical Column
  • ARROW-1190 - VectorLoader corrupts vectors with duplicate names
  • ARROW-1191 - [JAVA] Implement getField() method for the complex readers
  • ARROW-1194 - Getting record batch size with pa.get_record_batch_size returns a size that is too small for pandas DataFrame.
  • ARROW-1197 - [GLib] record_batch.hpp Inclusion is missing
  • ARROW-1200 - [C++] DictionaryBuilder should use signed integers for indices
  • ARROW-1201 - [Python] Incomplete Python types cause a core dump when repr-ing
  • ARROW-1203 - [C++] Disallow BinaryBuilder to append byte strings larger than the maximum value of int32_t
  • ARROW-1205 - C++: Reference to type objects in ArrayLoader may cause segmentation faults.
  • ARROW-1206 - [C++] Enable MSVC builds to work with some compression library support disabled
  • ARROW-1208 - [C++] Toolchain build with ZSTD library from conda-forge failure
  • ARROW-1215 - [Python] Class methods in API reference
  • ARROW-1216 - Numpy arrays cannot be created from Arrow Buffers on Python 2
  • ARROW-1218 - Arrow doesn't compile if all compression libraries are deactivated
  • ARROW-1222 - [Python] pyarrow.array returns NullArray for array of unsupported Python objects
  • ARROW-1223 - [GLib] Fix function name that returns wrapped object
  • ARROW-1235 - [C++] macOS linker failure with operator<< and std::ostream
  • ARROW-1236 - Library paths in exported pkg-config file are incorrect
  • ARROW-601 - Some logical types not supported when loading Parquet
  • ARROW-784 - Cleaning up thirdparty toolchain support in Arrow on Windows
  • ARROW-992 - [Python] In place development builds do not have a version