<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>0.2.0 Release | Apache Arrow</title>
<!-- Begin Jekyll SEO tag v2.8.0 -->
<meta name="generator" content="Jekyll v4.3.3" />
<meta property="og:title" content="0.2.0 Release" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="Apache Arrow 0.2.0 (18 February 2017) Download Source Artifacts Git tag Changelog Contributors $ git shortlog -sn apache-arrow-0.1.0..apache-arrow-0.2.0 73 Wes McKinney 55 Uwe L. Korn 16 Julien Le Dem 4 Bryan Cutler 4 Nong Li 2 Christopher C. Aycock 2 Jingyuan Wang 2 Kouhei Sutou 2 Laurent Goujon 2 Leif Walsh 1 Emilio Lahr-Vivaz 1 Holden Karau 1 Li Jin 1 Mohamed Zenadi 1 Peter Hoffmann 1 Steven Phillips 1 adeneche 1 ahnj 1 vkorukanti New Features and Improvements ARROW-108 - [C++] Add IPC round trip for union types ARROW-189 - C++: Use ExternalProject to build thirdparty dependencies ARROW-191 - Python: Provide infrastructure for manylinux1 wheels ARROW-221 - Add switch for writing Parquet 1.0 compatible logical types ARROW-227 - [C++/Python] Hook arrow_io generic reader / writer interface into arrow_parquet ARROW-228 - [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects ARROW-243 - [C++] Add “driver” option to HdfsClient to choose between libhdfs and libhdfs3 at runtime ARROW-268 - [C++] Flesh out union implementation to have all required methods for IPC ARROW-303 - [C++] Also build static libraries for leaf libraries ARROW-312 - [Python] Provide Python API to read/write the Arrow IPC file format ARROW-317 - [C++] Implement zero-copy Slice method on arrow::Buffer that retains reference to parent ARROW-327 - [Python] Remove conda builds from Travis CI processes ARROW-328 - [C++] Return shared_ptr by value instead of const-ref? ARROW-33 - C++: Implement zero-copy array slicing ARROW-330 - [C++] CMake functions to simplify shared / static library configuration ARROW-332 - [Python] Add helper function to convert RecordBatch to pandas.DataFrame ARROW-333 - Make writers update their internal schema even when no data is written. 
ARROW-335 - Improve Type apis and toString() by encapsulating flatbuffers better ARROW-336 - Run Apache Rat in Travis builds ARROW-338 - [C++] Refactor IPC vector “loading” and “unloading” to be based on cleaner visitor pattern ARROW-350 - Add Kerberos support to HDFS shim ARROW-353 - Arrow release 0.2 ARROW-355 - Add tests for serialising arrays of empty strings to Parquet ARROW-356 - Add documentation about reading Parquet ARROW-359 - Need to document ARROW_LIBHDFS_DIR ARROW-360 - C++: Add method to shrink PoolBuffer using realloc ARROW-361 - Python: Support reading a column-selection from Parquet files ARROW-363 - Set up Java/C++ integration test harness ARROW-365 - Python: Provide Array.to_pandas() ARROW-366 - [java] implement Dictionary vector ARROW-367 - [java] converter csv/json &lt;=&gt; Arrow file format for Integration tests ARROW-368 - Document use of LD_LIBRARY_PATH when using Python ARROW-369 - [Python] Add ability to convert multiple record batches at once to pandas ARROW-372 - Create JSON arrow file format for integration tests ARROW-373 - [C++] Implement C++ version of JSON file format for testing ARROW-374 - Python: clarify unicode vs. binary in API ARROW-377 - Python: Add support for conversion of Pandas.Categorical ARROW-379 - Python: Use setuptools_scm/setuptools_scm_git_archive to provide the version number ARROW-380 - [Java] optimize null count when serializing vectors. 
ARROW-381 - [C++] Simplify primitive array type builders to use a default type singleton ARROW-382 - Python: Extend API documentation ARROW-383 - [C++] Implement C++ version of ARROW-367 integration test validator ARROW-389 - Python: Write Parquet files to pyarrow.io.NativeFile objects ARROW-394 - Add integration tests for boolean, list, struct, and other basic types ARROW-396 - Python: Add pyarrow.schema.Schema.equals ARROW-409 - Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead ARROW-410 - [C++] Add Flush method to arrow::io::OutputStream ARROW-411 - [Java] Move Intergration.compare and Intergration.compareSchemas to a public utils class ARROW-415 - C++: Add Equals implementation to compare Tables ARROW-416 - C++: Add Equals implementation to compare Columns ARROW-417 - C++: Add Equals implementation to compare ChunkedArrays ARROW-418 - [C++] Consolidate array container and builder code, remove arrow/types ARROW-419 - [C++] Promote util/{status.h, buffer.h, memory-pool.h} to top level of arrow/ source directory ARROW-423 - C++: Define BUILD_BYPRODUCTS in external project to support non-make CMake generators ARROW-425 - Python: Expose a C function to convert arrow::Table to pyarrow.Table ARROW-426 - Python: Conversion from pyarrow.Array to a Python list ARROW-427 - [C++] Implement dictionary-encoded array container ARROW-428 - [Python] Deserialize from Arrow record batches to pandas in parallel using a thread pool ARROW-430 - Python: Better version handling ARROW-432 - [Python] Avoid unnecessary memory copy in to_pandas conversion by using low-level pandas internals APIs ARROW-438 - [Python] Concatenate Table instances with equal schemas ARROW-440 - [C++] Support pkg-config ARROW-441 - [Python] Expose Arrow’s file and memory map classes as NativeFile subclasses ARROW-442 - [Python] Add public Python API to inspect Parquet file metadata ARROW-444 - [Python] Avoid unnecessary memory copies from use of PyBytes_* C APIs ARROW-449 - 
Python: Conversion from pyarrow.{Table,RecordBatch} to a Python dict ARROW-450 - Python: Fixes for PARQUET-818 ARROW-456 - C++: Add jemalloc based MemoryPool ARROW-457 - Python: Better control over memory pool ARROW-458 - Python: Expose jemalloc MemoryPool ARROW-461 - [Python] Implement conversion between arrow::DictionaryArray and pandas.Categorical ARROW-463 - C++: Support jemalloc 4.x ARROW-466 - C++: ExternalProject for jemalloc ARROW-467 - [Python] Run parquet-cpp unit tests in Travis CI ARROW-468 - Python: Conversion of nested data in pd.DataFrames to/from Arrow structures ARROW-470 - [Python] Add “FileSystem” abstraction to access directories of files in a uniform way ARROW-471 - [Python] Enable ParquetFile to pass down separately-obtained file metadata ARROW-472 - [Python] Expose parquet::{SchemaDescriptor, ColumnDescriptor}::Equals ARROW-474 - Create an Arrow streaming file format ARROW-475 - [Python] High level support for reading directories of Parquet files (as a single Arrow table) from supported file system interfaces ARROW-476 - [Integration] Add integration tests for Binary / Varbytes type ARROW-477 - [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer ARROW-478 - [Python] Accept a PyBytes object in the pyarrow.io.BufferReader ctor ARROW-479 - Python: Test for expected schema in Pandas conversion ARROW-484 - Add more detail about what kind of technology can be found in the Arrow implementations to README ARROW-485 - [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe ARROW-490 - Python: Update manylinux1 build scripts ARROW-495 - [C++] Add C++ implementation of streaming serialized format ARROW-497 - [Java] Integration test harness for streaming format ARROW-498 - [C++] Integration test harness for streaming format ARROW-503 - [Python] Interface to streaming binary format ARROW-506 - Implement Arrow Echo server for integration testing 
ARROW-508 - [C++] Make file/memory-mapped file interfaces threadsafe ARROW-509 - [Python] Add support for PARQUET-835 (parallel column reads) ARROW-512 - C++: Add method to check for primitive types ARROW-514 - [Python] Accept pyarrow.io.Buffer as input to StreamReader, FileReader classes ARROW-515 - [Python] Add StreamReader/FileReader methods that read all record batches as a Table ARROW-521 - [C++/Python] Track peak memory use in default MemoryPool ARROW-524 - [java] provide apis to access nested vectors and buffers ARROW-525 - Python: Add more documentation to the package ARROW-527 - clean drill-module.conf file ARROW-529 - Python: Add jemalloc and Python 3.6 to manylinux1 build ARROW-531 - Python: Document jemalloc, extend Pandas section, add Getting Involved ARROW-538 - [C++] Set up AddressSanitizer (ASAN) builds ARROW-546 - Python: Account for changes in PARQUET-867 ARROW-547 - [Python] Expose Array::Slice and RecordBatch::Slice ARROW-553 - C++: Faster valid bitmap building ARROW-558 - Add KEYS files ARROW-81 - [Format] Add a Category logical type (distinct from dictionary-encoding) ARROW-96 - C++: API documentation using Doxygen ARROW-97 - Python: API documentation via sphinx-apidoc Bug Fixes ARROW-112 - [C++] Style fix for constants/enums ARROW-202 - [C++] Integrate with appveyor ci for windows support and get arrow building on windows ARROW-220 - [C++] Build conda artifacts in a build environment with better cross-linux ABI compatibility ARROW-224 - [C++] Address static linking of boost dependencies ARROW-230 - Python: Do not name modules like native ones (i.e. rename pyarrow.io) ARROW-239 - [Python] HdfsFile.read called with no arguments should read remainder of file ARROW-261 - [C++] Refactor BinaryArray/StringArray classes to not inherit from ListArray ARROW-275 - Add tests for UnionVector in Arrow File ARROW-294 - [C++] Do not use fopen / fclose / etc. 
methods for memory mapped file implementation ARROW-322 - [C++] Do not build HDFS IO interface optionally ARROW-323 - [Python] Opt-in to PyArrow parquet build rather than skipping silently on failure ARROW-334 - [Python] OS X rpath issues on some configurations ARROW-337 - UnionListWriter.list() is doing more than it should, this can cause data corruption ARROW-339 - Make merge_arrow_pr script work with Python 3 ARROW-340 - [C++] Opening a writeable file on disk that already exists does not truncate to zero ARROW-342 - Set Python version on release ARROW-345 - libhdfs integration doesn’t work for Mac ARROW-346 - Python API Documentation ARROW-348 - [Python] CMake build type should be configurable on the command line ARROW-349 - Six is missing as a requirement in the python setup.py ARROW-351 - Time type has no unit ARROW-354 - Cannot compare an array of empty strings to another ARROW-357 - Default Parquet chunk_size of 64k is too small ARROW-358 - [C++] libhdfs can be in non-standard locations in some Hadoop distributions ARROW-362 - Python: Calling to_pandas on a table read from Parquet leaks memory ARROW-371 - Python: Table with null timestamp becomes float in pandas ARROW-375 - columns parameter in parquet.read_table() raises KeyError for valid column ARROW-384 - Align Java and C++ RecordBatch data and metadata layout ARROW-386 - [Java] Respect case of struct / map field names ARROW-387 - [C++] arrow::io::BufferReader does not permit shared memory ownership in zero-copy reads ARROW-390 - C++: CMake fails on json-integration-test with ARROW_BUILD_TESTS=OFF ARROW-392 - Fix string/binary integration tests ARROW-393 - [JAVA] JSON file reader fails to set the buffer size on String data vector ARROW-395 - Arrow file format writes record batches in reverse order. 
ARROW-398 - [Java] Java file format requires bitmaps of all 1’s to be written when there are no nulls ARROW-399 - [Java] ListVector.loadFieldBuffers ignores the ArrowFieldNode length metadata ARROW-400 - [Java] ArrowWriter writes length 0 for Struct types ARROW-401 - [Java] Floating point vectors should do an approximate comparison in integration tests ARROW-402 - [Java] “refCnt gone negative” error in integration tests ARROW-403 - [JAVA] UnionVector: Creating a transfer pair doesn’t transfer the schema to destination vector ARROW-404 - [Python] Closing an HdfsClient while there are still open file handles results in a crash ARROW-405 - [C++] Be less stringent about finding include/hdfs.h in HADOOP_HOME ARROW-406 - [C++] Large HDFS reads must utilize the set file buffer size when making RPCs ARROW-408 - [C++/Python] Remove defunct conda recipes ARROW-414 - [Java] “Buffer too large to resize to …” error ARROW-420 - Align Date implementation between Java and C++ ARROW-421 - [Python] Zero-copy buffers read by pyarrow::PyBytesReader must retain a reference to the parent PyBytes to avoid premature garbage collection issues ARROW-422 - C++: IPC should depend on rapidjson_ep if RapidJSON is vendored ARROW-429 - git-archive SHA-256 checksums are changing ARROW-433 - [Python] Date conversion is locale-dependent ARROW-434 - Segfaults and encoding issues in Python Parquet reads ARROW-435 - C++: Spelling mistake in if(RAPIDJSON_VENDORED) ARROW-437 - [C++] clang compiler warnings from overridden virtual functions ARROW-445 - C++: arrow_ipc is built before arrow/ipc/Message_generated.h was generated ARROW-447 - Python: Align scalar/pylist string encoding with pandas’ one. 
ARROW-455 - [C++] BufferOutputStream dtor does not call Close() ARROW-469 - C++: Add option so that resize doesn’t decrease the capacity ARROW-481 - [Python] Fix Python 2.7 regression in patch for PARQUET-472 ARROW-486 - [C++] arrow::io::MemoryMappedFile can’t be cast to arrow::io::FileInterface ARROW-487 - Python: ConvertTableToPandas segfaults if ObjectBlock::Write fails ARROW-494 - [C++] When MemoryMappedFile is destructed, memory is unmapped even if buffer references still exist ARROW-499 - Update file serialization to use streaming serialization format ARROW-505 - [C++] Fix compiler warnings in release mode ARROW-511 - [Python] List[T] conversions not implemented for single arrays ARROW-513 - [C++] Fix Appveyor build ARROW-519 - [C++] Missing vtable in libarrow.dylib on Xcode 6.4 ARROW-523 - Python: Account for changes in PARQUET-834 ARROW-533 - [C++] arrow::TimestampArray / TimeArray has a broken constructor ARROW-535 - [Python] Add type mapping for NPY_LONGLONG ARROW-537 - [C++] StringArray/BinaryArray comparisons may be incorrect when values with non-zero length are null ARROW-540 - [C++] Fix build in aftermath of ARROW-33 ARROW-543 - C++: Lazily computed null_counts counts number of non-null entries ARROW-544 - [C++] ArrayLoader::LoadBinary fails for length-0 arrays ARROW-545 - [Python] Ignore files without .parq or .parquet prefix when reading directory of files ARROW-548 - [Python] Add nthreads option to pyarrow.Filesystem.read_parquet ARROW-551 - C++: Construction of Column with nullptr Array segfaults ARROW-556 - [Integration] Can not run Integration tests if different cpp build path ARROW-561 - Update java &amp; python dependencies to improve downstream packaging experience" />
<meta property="og:description" content="Apache Arrow 0.2.0 (18 February 2017) Download Source Artifacts Git tag Changelog Contributors $ git shortlog -sn apache-arrow-0.1.0..apache-arrow-0.2.0 73 Wes McKinney 55 Uwe L. Korn 16 Julien Le Dem 4 Bryan Cutler 4 Nong Li 2 Christopher C. Aycock 2 Jingyuan Wang 2 Kouhei Sutou 2 Laurent Goujon 2 Leif Walsh 1 Emilio Lahr-Vivaz 1 Holden Karau 1 Li Jin 1 Mohamed Zenadi 1 Peter Hoffmann 1 Steven Phillips 1 adeneche 1 ahnj 1 vkorukanti New Features and Improvements ARROW-108 - [C++] Add IPC round trip for union types ARROW-189 - C++: Use ExternalProject to build thirdparty dependencies ARROW-191 - Python: Provide infrastructure for manylinux1 wheels ARROW-221 - Add switch for writing Parquet 1.0 compatible logical types ARROW-227 - [C++/Python] Hook arrow_io generic reader / writer interface into arrow_parquet ARROW-228 - [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects ARROW-243 - [C++] Add “driver” option to HdfsClient to choose between libhdfs and libhdfs3 at runtime ARROW-268 - [C++] Flesh out union implementation to have all required methods for IPC ARROW-303 - [C++] Also build static libraries for leaf libraries ARROW-312 - [Python] Provide Python API to read/write the Arrow IPC file format ARROW-317 - [C++] Implement zero-copy Slice method on arrow::Buffer that retains reference to parent ARROW-327 - [Python] Remove conda builds from Travis CI processes ARROW-328 - [C++] Return shared_ptr by value instead of const-ref? ARROW-33 - C++: Implement zero-copy array slicing ARROW-330 - [C++] CMake functions to simplify shared / static library configuration ARROW-332 - [Python] Add helper function to convert RecordBatch to pandas.DataFrame ARROW-333 - Make writers update their internal schema even when no data is written. 
ARROW-335 - Improve Type apis and toString() by encapsulating flatbuffers better ARROW-336 - Run Apache Rat in Travis builds ARROW-338 - [C++] Refactor IPC vector “loading” and “unloading” to be based on cleaner visitor pattern ARROW-350 - Add Kerberos support to HDFS shim ARROW-353 - Arrow release 0.2 ARROW-355 - Add tests for serialising arrays of empty strings to Parquet ARROW-356 - Add documentation about reading Parquet ARROW-359 - Need to document ARROW_LIBHDFS_DIR ARROW-360 - C++: Add method to shrink PoolBuffer using realloc ARROW-361 - Python: Support reading a column-selection from Parquet files ARROW-363 - Set up Java/C++ integration test harness ARROW-365 - Python: Provide Array.to_pandas() ARROW-366 - [java] implement Dictionary vector ARROW-367 - [java] converter csv/json &lt;=&gt; Arrow file format for Integration tests ARROW-368 - Document use of LD_LIBRARY_PATH when using Python ARROW-369 - [Python] Add ability to convert multiple record batches at once to pandas ARROW-372 - Create JSON arrow file format for integration tests ARROW-373 - [C++] Implement C++ version of JSON file format for testing ARROW-374 - Python: clarify unicode vs. binary in API ARROW-377 - Python: Add support for conversion of Pandas.Categorical ARROW-379 - Python: Use setuptools_scm/setuptools_scm_git_archive to provide the version number ARROW-380 - [Java] optimize null count when serializing vectors. 
ARROW-381 - [C++] Simplify primitive array type builders to use a default type singleton ARROW-382 - Python: Extend API documentation ARROW-383 - [C++] Implement C++ version of ARROW-367 integration test validator ARROW-389 - Python: Write Parquet files to pyarrow.io.NativeFile objects ARROW-394 - Add integration tests for boolean, list, struct, and other basic types ARROW-396 - Python: Add pyarrow.schema.Schema.equals ARROW-409 - Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead ARROW-410 - [C++] Add Flush method to arrow::io::OutputStream ARROW-411 - [Java] Move Intergration.compare and Intergration.compareSchemas to a public utils class ARROW-415 - C++: Add Equals implementation to compare Tables ARROW-416 - C++: Add Equals implementation to compare Columns ARROW-417 - C++: Add Equals implementation to compare ChunkedArrays ARROW-418 - [C++] Consolidate array container and builder code, remove arrow/types ARROW-419 - [C++] Promote util/{status.h, buffer.h, memory-pool.h} to top level of arrow/ source directory ARROW-423 - C++: Define BUILD_BYPRODUCTS in external project to support non-make CMake generators ARROW-425 - Python: Expose a C function to convert arrow::Table to pyarrow.Table ARROW-426 - Python: Conversion from pyarrow.Array to a Python list ARROW-427 - [C++] Implement dictionary-encoded array container ARROW-428 - [Python] Deserialize from Arrow record batches to pandas in parallel using a thread pool ARROW-430 - Python: Better version handling ARROW-432 - [Python] Avoid unnecessary memory copy in to_pandas conversion by using low-level pandas internals APIs ARROW-438 - [Python] Concatenate Table instances with equal schemas ARROW-440 - [C++] Support pkg-config ARROW-441 - [Python] Expose Arrow’s file and memory map classes as NativeFile subclasses ARROW-442 - [Python] Add public Python API to inspect Parquet file metadata ARROW-444 - [Python] Avoid unnecessary memory copies from use of PyBytes_* C APIs ARROW-449 - 
Python: Conversion from pyarrow.{Table,RecordBatch} to a Python dict ARROW-450 - Python: Fixes for PARQUET-818 ARROW-456 - C++: Add jemalloc based MemoryPool ARROW-457 - Python: Better control over memory pool ARROW-458 - Python: Expose jemalloc MemoryPool ARROW-461 - [Python] Implement conversion between arrow::DictionaryArray and pandas.Categorical ARROW-463 - C++: Support jemalloc 4.x ARROW-466 - C++: ExternalProject for jemalloc ARROW-467 - [Python] Run parquet-cpp unit tests in Travis CI ARROW-468 - Python: Conversion of nested data in pd.DataFrames to/from Arrow structures ARROW-470 - [Python] Add “FileSystem” abstraction to access directories of files in a uniform way ARROW-471 - [Python] Enable ParquetFile to pass down separately-obtained file metadata ARROW-472 - [Python] Expose parquet::{SchemaDescriptor, ColumnDescriptor}::Equals ARROW-474 - Create an Arrow streaming file format ARROW-475 - [Python] High level support for reading directories of Parquet files (as a single Arrow table) from supported file system interfaces ARROW-476 - [Integration] Add integration tests for Binary / Varbytes type ARROW-477 - [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer ARROW-478 - [Python] Accept a PyBytes object in the pyarrow.io.BufferReader ctor ARROW-479 - Python: Test for expected schema in Pandas conversion ARROW-484 - Add more detail about what kind of technology can be found in the Arrow implementations to README ARROW-485 - [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe ARROW-490 - Python: Update manylinux1 build scripts ARROW-495 - [C++] Add C++ implementation of streaming serialized format ARROW-497 - [Java] Integration test harness for streaming format ARROW-498 - [C++] Integration test harness for streaming format ARROW-503 - [Python] Interface to streaming binary format ARROW-506 - Implement Arrow Echo server for integration testing 
ARROW-508 - [C++] Make file/memory-mapped file interfaces threadsafe ARROW-509 - [Python] Add support for PARQUET-835 (parallel column reads) ARROW-512 - C++: Add method to check for primitive types ARROW-514 - [Python] Accept pyarrow.io.Buffer as input to StreamReader, FileReader classes ARROW-515 - [Python] Add StreamReader/FileReader methods that read all record batches as a Table ARROW-521 - [C++/Python] Track peak memory use in default MemoryPool ARROW-524 - [java] provide apis to access nested vectors and buffers ARROW-525 - Python: Add more documentation to the package ARROW-527 - clean drill-module.conf file ARROW-529 - Python: Add jemalloc and Python 3.6 to manylinux1 build ARROW-531 - Python: Document jemalloc, extend Pandas section, add Getting Involved ARROW-538 - [C++] Set up AddressSanitizer (ASAN) builds ARROW-546 - Python: Account for changes in PARQUET-867 ARROW-547 - [Python] Expose Array::Slice and RecordBatch::Slice ARROW-553 - C++: Faster valid bitmap building ARROW-558 - Add KEYS files ARROW-81 - [Format] Add a Category logical type (distinct from dictionary-encoding) ARROW-96 - C++: API documentation using Doxygen ARROW-97 - Python: API documentation via sphinx-apidoc Bug Fixes ARROW-112 - [C++] Style fix for constants/enums ARROW-202 - [C++] Integrate with appveyor ci for windows support and get arrow building on windows ARROW-220 - [C++] Build conda artifacts in a build environment with better cross-linux ABI compatibility ARROW-224 - [C++] Address static linking of boost dependencies ARROW-230 - Python: Do not name modules like native ones (i.e. rename pyarrow.io) ARROW-239 - [Python] HdfsFile.read called with no arguments should read remainder of file ARROW-261 - [C++] Refactor BinaryArray/StringArray classes to not inherit from ListArray ARROW-275 - Add tests for UnionVector in Arrow File ARROW-294 - [C++] Do not use fopen / fclose / etc. 
methods for memory mapped file implementation ARROW-322 - [C++] Do not build HDFS IO interface optionally ARROW-323 - [Python] Opt-in to PyArrow parquet build rather than skipping silently on failure ARROW-334 - [Python] OS X rpath issues on some configurations ARROW-337 - UnionListWriter.list() is doing more than it should, this can cause data corruption ARROW-339 - Make merge_arrow_pr script work with Python 3 ARROW-340 - [C++] Opening a writeable file on disk that already exists does not truncate to zero ARROW-342 - Set Python version on release ARROW-345 - libhdfs integration doesn’t work for Mac ARROW-346 - Python API Documentation ARROW-348 - [Python] CMake build type should be configurable on the command line ARROW-349 - Six is missing as a requirement in the python setup.py ARROW-351 - Time type has no unit ARROW-354 - Cannot compare an array of empty strings to another ARROW-357 - Default Parquet chunk_size of 64k is too small ARROW-358 - [C++] libhdfs can be in non-standard locations in some Hadoop distributions ARROW-362 - Python: Calling to_pandas on a table read from Parquet leaks memory ARROW-371 - Python: Table with null timestamp becomes float in pandas ARROW-375 - columns parameter in parquet.read_table() raises KeyError for valid column ARROW-384 - Align Java and C++ RecordBatch data and metadata layout ARROW-386 - [Java] Respect case of struct / map field names ARROW-387 - [C++] arrow::io::BufferReader does not permit shared memory ownership in zero-copy reads ARROW-390 - C++: CMake fails on json-integration-test with ARROW_BUILD_TESTS=OFF ARROW-392 - Fix string/binary integration tests ARROW-393 - [JAVA] JSON file reader fails to set the buffer size on String data vector ARROW-395 - Arrow file format writes record batches in reverse order. 
ARROW-398 - [Java] Java file format requires bitmaps of all 1’s to be written when there are no nulls ARROW-399 - [Java] ListVector.loadFieldBuffers ignores the ArrowFieldNode length metadata ARROW-400 - [Java] ArrowWriter writes length 0 for Struct types ARROW-401 - [Java] Floating point vectors should do an approximate comparison in integration tests ARROW-402 - [Java] “refCnt gone negative” error in integration tests ARROW-403 - [JAVA] UnionVector: Creating a transfer pair doesn’t transfer the schema to destination vector ARROW-404 - [Python] Closing an HdfsClient while there are still open file handles results in a crash ARROW-405 - [C++] Be less stringent about finding include/hdfs.h in HADOOP_HOME ARROW-406 - [C++] Large HDFS reads must utilize the set file buffer size when making RPCs ARROW-408 - [C++/Python] Remove defunct conda recipes ARROW-414 - [Java] “Buffer too large to resize to …” error ARROW-420 - Align Date implementation between Java and C++ ARROW-421 - [Python] Zero-copy buffers read by pyarrow::PyBytesReader must retain a reference to the parent PyBytes to avoid premature garbage collection issues ARROW-422 - C++: IPC should depend on rapidjson_ep if RapidJSON is vendored ARROW-429 - git-archive SHA-256 checksums are changing ARROW-433 - [Python] Date conversion is locale-dependent ARROW-434 - Segfaults and encoding issues in Python Parquet reads ARROW-435 - C++: Spelling mistake in if(RAPIDJSON_VENDORED) ARROW-437 - [C++] clang compiler warnings from overridden virtual functions ARROW-445 - C++: arrow_ipc is built before arrow/ipc/Message_generated.h was generated ARROW-447 - Python: Align scalar/pylist string encoding with pandas’ one. 
ARROW-455 - [C++] BufferOutputStream dtor does not call Close() ARROW-469 - C++: Add option so that resize doesn’t decrease the capacity ARROW-481 - [Python] Fix Python 2.7 regression in patch for PARQUET-472 ARROW-486 - [C++] arrow::io::MemoryMappedFile can’t be cast to arrow::io::FileInterface ARROW-487 - Python: ConvertTableToPandas segfaults if ObjectBlock::Write fails ARROW-494 - [C++] When MemoryMappedFile is destructed, memory is unmapped even if buffer references still exist ARROW-499 - Update file serialization to use streaming serialization format ARROW-505 - [C++] Fix compiler warnings in release mode ARROW-511 - [Python] List[T] conversions not implemented for single arrays ARROW-513 - [C++] Fix Appveyor build ARROW-519 - [C++] Missing vtable in libarrow.dylib on Xcode 6.4 ARROW-523 - Python: Account for changes in PARQUET-834 ARROW-533 - [C++] arrow::TimestampArray / TimeArray has a broken constructor ARROW-535 - [Python] Add type mapping for NPY_LONGLONG ARROW-537 - [C++] StringArray/BinaryArray comparisons may be incorrect when values with non-zero length are null ARROW-540 - [C++] Fix build in aftermath of ARROW-33 ARROW-543 - C++: Lazily computed null_counts counts number of non-null entries ARROW-544 - [C++] ArrayLoader::LoadBinary fails for length-0 arrays ARROW-545 - [Python] Ignore files without .parq or .parquet prefix when reading directory of files ARROW-548 - [Python] Add nthreads option to pyarrow.Filesystem.read_parquet ARROW-551 - C++: Construction of Column with nullptr Array segfaults ARROW-556 - [Integration] Can not run Integration tests if different cpp build path ARROW-561 - Update java &amp; python dependencies to improve downstream packaging experience" />
<link rel="canonical" href="https://arrow.apache.org/release/0.2.0.html" />
<meta property="og:url" content="https://arrow.apache.org/release/0.2.0.html" />
<meta property="og:site_name" content="Apache Arrow" />
<meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
<meta property="og:type" content="article" />
<meta property="article:published_time" content="2024-04-29T17:30:49-04:00" />
<meta name="twitter:card" content="summary_large_image" />
<meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
<meta property="twitter:title" content="0.2.0 Release" />
<meta name="twitter:site" content="@ApacheArrow" />
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2024-04-29T17:30:49-04:00","datePublished":"2024-04-29T17:30:49-04:00","description":"Apache Arrow 0.2.0 (18 February 2017) Download Source Artifacts Git tag Changelog Contributors $ git shortlog -sn apache-arrow-0.1.0..apache-arrow-0.2.0 73 Wes McKinney 55 Uwe L. Korn 16 Julien Le Dem 4 Bryan Cutler 4 Nong Li 2 Christopher C. Aycock 2 Jingyuan Wang 2 Kouhei Sutou 2 Laurent Goujon 2 Leif Walsh 1 Emilio Lahr-Vivaz 1 Holden Karau 1 Li Jin 1 Mohamed Zenadi 1 Peter Hoffmann 1 Steven Phillips 1 adeneche 1 ahnj 1 vkorukanti New Features and Improvements ARROW-108 - [C++] Add IPC round trip for union types ARROW-189 - C++: Use ExternalProject to build thirdparty dependencies ARROW-191 - Python: Provide infrastructure for manylinux1 wheels ARROW-221 - Add switch for writing Parquet 1.0 compatible logical types ARROW-227 - [C++/Python] Hook arrow_io generic reader / writer interface into arrow_parquet ARROW-228 - [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects ARROW-243 - [C++] Add “driver” option to HdfsClient to choose between libhdfs and libhdfs3 at runtime ARROW-268 - [C++] Flesh out union implementation to have all required methods for IPC ARROW-303 - [C++] Also build static libraries for leaf libraries ARROW-312 - [Python] Provide Python API to read/write the Arrow IPC file format ARROW-317 - [C++] Implement zero-copy Slice method on arrow::Buffer that retains reference to parent ARROW-327 - [Python] Remove conda builds from Travis CI processes ARROW-328 - [C++] Return shared_ptr by value instead of const-ref? ARROW-33 - C++: Implement zero-copy array slicing ARROW-330 - [C++] CMake functions to simplify shared / static library configuration ARROW-332 - [Python] Add helper function to convert RecordBatch to pandas.DataFrame ARROW-333 - Make writers update their internal schema even when no data is written. 
ARROW-335 - Improve Type apis and toString() by encapsulating flatbuffers better ARROW-336 - Run Apache Rat in Travis builds ARROW-338 - [C++] Refactor IPC vector “loading” and “unloading” to be based on cleaner visitor pattern ARROW-350 - Add Kerberos support to HDFS shim ARROW-353 - Arrow release 0.2 ARROW-355 - Add tests for serialising arrays of empty strings to Parquet ARROW-356 - Add documentation about reading Parquet ARROW-359 - Need to document ARROW_LIBHDFS_DIR ARROW-360 - C++: Add method to shrink PoolBuffer using realloc ARROW-361 - Python: Support reading a column-selection from Parquet files ARROW-363 - Set up Java/C++ integration test harness ARROW-365 - Python: Provide Array.to_pandas() ARROW-366 - [java] implement Dictionary vector ARROW-367 - [java] converter csv/json &lt;=&gt; Arrow file format for Integration tests ARROW-368 - Document use of LD_LIBRARY_PATH when using Python ARROW-369 - [Python] Add ability to convert multiple record batches at once to pandas ARROW-372 - Create JSON arrow file format for integration tests ARROW-373 - [C++] Implement C++ version of JSON file format for testing ARROW-374 - Python: clarify unicode vs. binary in API ARROW-377 - Python: Add support for conversion of Pandas.Categorical ARROW-379 - Python: Use setuptools_scm/setuptools_scm_git_archive to provide the version number ARROW-380 - [Java] optimize null count when serializing vectors. 
ARROW-381 - [C++] Simplify primitive array type builders to use a default type singleton ARROW-382 - Python: Extend API documentation ARROW-383 - [C++] Implement C++ version of ARROW-367 integration test validator ARROW-389 - Python: Write Parquet files to pyarrow.io.NativeFile objects ARROW-394 - Add integration tests for boolean, list, struct, and other basic types ARROW-396 - Python: Add pyarrow.schema.Schema.equals ARROW-409 - Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead ARROW-410 - [C++] Add Flush method to arrow::io::OutputStream ARROW-411 - [Java] Move Intergration.compare and Intergration.compareSchemas to a public utils class ARROW-415 - C++: Add Equals implementation to compare Tables ARROW-416 - C++: Add Equals implementation to compare Columns ARROW-417 - C++: Add Equals implementation to compare ChunkedArrays ARROW-418 - [C++] Consolidate array container and builder code, remove arrow/types ARROW-419 - [C++] Promote util/{status.h, buffer.h, memory-pool.h} to top level of arrow/ source directory ARROW-423 - C++: Define BUILD_BYPRODUCTS in external project to support non-make CMake generators ARROW-425 - Python: Expose a C function to convert arrow::Table to pyarrow.Table ARROW-426 - Python: Conversion from pyarrow.Array to a Python list ARROW-427 - [C++] Implement dictionary-encoded array container ARROW-428 - [Python] Deserialize from Arrow record batches to pandas in parallel using a thread pool ARROW-430 - Python: Better version handling ARROW-432 - [Python] Avoid unnecessary memory copy in to_pandas conversion by using low-level pandas internals APIs ARROW-438 - [Python] Concatenate Table instances with equal schemas ARROW-440 - [C++] Support pkg-config ARROW-441 - [Python] Expose Arrow’s file and memory map classes as NativeFile subclasses ARROW-442 - [Python] Add public Python API to inspect Parquet file metadata ARROW-444 - [Python] Avoid unnecessary memory copies from use of PyBytes_* C APIs ARROW-449 - 
Python: Conversion from pyarrow.{Table,RecordBatch} to a Python dict ARROW-450 - Python: Fixes for PARQUET-818 ARROW-456 - C++: Add jemalloc based MemoryPool ARROW-457 - Python: Better control over memory pool ARROW-458 - Python: Expose jemalloc MemoryPool ARROW-461 - [Python] Implement conversion between arrow::DictionaryArray and pandas.Categorical ARROW-463 - C++: Support jemalloc 4.x ARROW-466 - C++: ExternalProject for jemalloc ARROW-467 - [Python] Run parquet-cpp unit tests in Travis CI ARROW-468 - Python: Conversion of nested data in pd.DataFrames to/from Arrow structures ARROW-470 - [Python] Add “FileSystem” abstraction to access directories of files in a uniform way ARROW-471 - [Python] Enable ParquetFile to pass down separately-obtained file metadata ARROW-472 - [Python] Expose parquet::{SchemaDescriptor, ColumnDescriptor}::Equals ARROW-474 - Create an Arrow streaming file fomat ARROW-475 - [Python] High level support for reading directories of Parquet files (as a single Arrow table) from supported file system interfaces ARROW-476 - [Integration] Add integration tests for Binary / Varbytes type ARROW-477 - [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer ARROW-478 - [Python] Accept a PyBytes object in the pyarrow.io.BufferReader ctor ARROW-479 - Python: Test for expected schema in Pandas conversion ARROW-484 - Add more detail about what of technology can be found in the Arrow implementations to README ARROW-485 - [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe ARROW-490 - Python: Update manylinux1 build scripts ARROW-495 - [C++] Add C++ implementation of streaming serialized format ARROW-497 - [Java] Integration test harness for streaming format ARROW-498 - [C++] Integration test harness for streaming format ARROW-503 - [Python] Interface to streaming binary format ARROW-506 - Implement Arrow Echo server for integration testing 
ARROW-508 - [C++] Make file/memory-mapped file interfaces threadsafe ARROW-509 - [Python] Add support for PARQUET-835 (parallel column reads) ARROW-512 - C++: Add method to check for primitive types ARROW-514 - [Python] Accept pyarrow.io.Buffer as input to StreamReader, FileReader classes ARROW-515 - [Python] Add StreamReader/FileReader methods that read all record batches as a Table ARROW-521 - [C++/Python] Track peak memory use in default MemoryPool ARROW-524 - [java] provide apis to access nested vectors and buffers ARROW-525 - Python: Add more documentation to the package ARROW-527 - clean drill-module.conf file ARROW-529 - Python: Add jemalloc and Python 3.6 to manylinux1 build ARROW-531 - Python: Document jemalloc, extend Pandas section, add Getting Involved ARROW-538 - [C++] Set up AddressSanitizer (ASAN) builds ARROW-546 - Python: Account for changes in PARQUET-867 ARROW-547 - [Python] Expose Array::Slice and RecordBatch::Slice ARROW-553 - C++: Faster valid bitmap building ARROW-558 - Add KEYS files ARROW-81 - [Format] Add a Category logical type (distinct from dictionary-encoding) ARROW-96 - C++: API documentation using Doxygen ARROW-97 - Python: API documentation via sphinx-apidoc Bug Fixes ARROW-112 - [C++] Style fix for constants/enums ARROW-202 - [C++] Integrate with appveyor ci for windows support and get arrow building on windows ARROW-220 - [C++] Build conda artifacts in a build environment with better cross-linux ABI compatibility ARROW-224 - [C++] Address static linking of boost dependencies ARROW-230 - Python: Do not name modules like native ones (i.e. rename pyarrow.io) ARROW-239 - [Python] HdfsFile.read called with no arguments should read remainder of file ARROW-261 - [C++] Refactor BinaryArray/StringArray classes to not inherit from ListArray ARROW-275 - Add tests for UnionVector in Arrow File ARROW-294 - [C++] Do not use fopen / fclose / etc. 
methods for memory mapped file implementation ARROW-322 - [C++] Do not build HDFS IO interface optionally ARROW-323 - [Python] Opt-in to PyArrow parquet build rather than skipping silently on failure ARROW-334 - [Python] OS X rpath issues on some configurations ARROW-337 - UnionListWriter.list() is doing more than it should, this can cause data corruption ARROW-339 - Make merge_arrow_pr script work with Python 3 ARROW-340 - [C++] Opening a writeable file on disk that already exists does not truncate to zero ARROW-342 - Set Python version on release ARROW-345 - libhdfs integration doesn’t work for Mac ARROW-346 - Python API Documentation ARROW-348 - [Python] CMake build type should be configurable on the command line ARROW-349 - Six is missing as a requirement in the python setup.py ARROW-351 - Time type has no unit ARROW-354 - Connot compare an array of empty strings to another ARROW-357 - Default Parquet chunk_size of 64k is too small ARROW-358 - [C++] libhdfs can be in non-standard locations in some Hadoop distributions ARROW-362 - Python: Calling to_pandas on a table read from Parquet leaks memory ARROW-371 - Python: Table with null timestamp becomes float in pandas ARROW-375 - columns parameter in parquet.read_table() raises KeyError for valid column ARROW-384 - Align Java and C++ RecordBatch data and metadata layout ARROW-386 - [Java] Respect case of struct / map field names ARROW-387 - [C++] arrow::io::BufferReader does not permit shared memory ownership in zero-copy reads ARROW-390 - C++: CMake fails on json-integration-test with ARROW_BUILD_TESTS=OFF ARROW-392 - Fix string/binary integration tests ARROW-393 - [JAVA] JSON file reader fails to set the buffer size on String data vector ARROW-395 - Arrow file format writes record batches in reverse order. 
ARROW-398 - [Java] Java file format requires bitmaps of all 1’s to be written when there are no nulls ARROW-399 - [Java] ListVector.loadFieldBuffers ignores the ArrowFieldNode length metadata ARROW-400 - [Java] ArrowWriter writes length 0 for Struct types ARROW-401 - [Java] Floating point vectors should do an approximate comparison in integration tests ARROW-402 - [Java] “refCnt gone negative” error in integration tests ARROW-403 - [JAVA] UnionVector: Creating a transfer pair doesn’t transfer the schema to destination vector ARROW-404 - [Python] Closing an HdfsClient while there are still open file handles results in a crash ARROW-405 - [C++] Be less stringent about finding include/hdfs.h in HADOOP_HOME ARROW-406 - [C++] Large HDFS reads must utilize the set file buffer size when making RPCs ARROW-408 - [C++/Python] Remove defunct conda recipes ARROW-414 - [Java] “Buffer too large to resize to …” error ARROW-420 - Align Date implementation between Java and C++ ARROW-421 - [Python] Zero-copy buffers read by pyarrow::PyBytesReader must retain a reference to the parent PyBytes to avoid premature garbage collection issues ARROW-422 - C++: IPC should depend on rapidjson_ep if RapidJSON is vendored ARROW-429 - git-archive SHA-256 checksums are changing ARROW-433 - [Python] Date conversion is locale-dependent ARROW-434 - Segfaults and encoding issues in Python Parquet reads ARROW-435 - C++: Spelling mistake in if(RAPIDJSON_VENDORED) ARROW-437 - [C++] clang compiler warnings from overridden virtual functions ARROW-445 - C++: arrow_ipc is built before arrow/ipc/Message_generated.h was generated ARROW-447 - Python: Align scalar/pylist string encoding with pandas’ one. 
ARROW-455 - [C++] BufferOutputStream dtor does not call Close() ARROW-469 - C++: Add option so that resize doesn’t decrease the capacity ARROW-481 - [Python] Fix Python 2.7 regression in patch for PARQUET-472 ARROW-486 - [C++] arrow::io::MemoryMappedFile can’t be casted to arrow::io::FileInterface ARROW-487 - Python: ConvertTableToPandas segfaults if ObjectBlock::Write fails ARROW-494 - [C++] When MemoryMappedFile is destructed, memory is unmapped even if buffer referecnes still exist ARROW-499 - Update file serialization to use streaming serialization format ARROW-505 - [C++] Fix compiler warnings in release mode ARROW-511 - [Python] List[T] conversions not implemented for single arrays ARROW-513 - [C++] Fix Appveyor build ARROW-519 - [C++] Missing vtable in libarrow.dylib on Xcode 6.4 ARROW-523 - Python: Account for changes in PARQUET-834 ARROW-533 - [C++] arrow::TimestampArray / TimeArray has a broken constructor ARROW-535 - [Python] Add type mapping for NPY_LONGLONG ARROW-537 - [C++] StringArray/BinaryArray comparisons may be incorrect when values with non-zero length are null ARROW-540 - [C++] Fix build in aftermath of ARROW-33 ARROW-543 - C++: Lazily computed null_counts counts number of non-null entries ARROW-544 - [C++] ArrayLoader::LoadBinary fails for length-0 arrays ARROW-545 - [Python] Ignore files without .parq or .parquet prefix when reading directory of files ARROW-548 - [Python] Add nthreads option to pyarrow.Filesystem.read_parquet ARROW-551 - C++: Construction of Column with nullptr Array segfaults ARROW-556 - [Integration] Can not run Integration tests if different cpp build path ARROW-561 - Update java &amp; python dependencies to improve downstream packaging experience","headline":"0.2.0 
Release","image":"https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png","mainEntityOfPage":{"@type":"WebPage","@id":"https://arrow.apache.org/release/0.2.0.html"},"publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"https://arrow.apache.org/img/logo.png"}},"url":"https://arrow.apache.org/release/0.2.0.html"}</script>
<!-- End Jekyll SEO tag -->
<!-- favicons -->
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16.png" id="light1">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32.png" id="light2">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon.png" id="light3">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120.png" id="light4">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76.png" id="light5">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60.png" id="light6">
<!-- dark mode favicons -->
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16-dark.png" id="dark1">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32-dark.png" id="dark2">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon-dark.png" id="dark3">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120-dark.png" id="dark4">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76-dark.png" id="dark5">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60-dark.png" id="dark6">
<script>
// Switch to the dark-mode favicons if prefers-color-scheme: dark
// Look the favicon links up once. After remove() a link is no longer in the
// document, so querying again on a later update would return null.
const lightIcons = Array.from(document.querySelectorAll('link[id^="light"]'));
const darkIcons = Array.from(document.querySelectorAll('link[id^="dark"]'));
const matcher = window.matchMedia('(prefers-color-scheme: dark)');
function onUpdate() {
// Detach the set that does not match the current scheme, attach the other.
const [hide, show] = matcher.matches ? [lightIcons, darkIcons] : [darkIcons, lightIcons];
hide.forEach((link) => link.remove());
show.forEach((link) => document.head.append(link));
}
matcher.addListener(onUpdate);
onUpdate();
</script>
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
<link href="/css/main.css" rel="stylesheet">
<link href="/css/syntax.css" rel="stylesheet">
<script src="/javascript/main.js"></script>
<!-- Matomo -->
<script>
var _paq = window._paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
/* We explicitly disable cookie tracking to avoid privacy issues */
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="https://analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '20']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
<!-- End Matomo Code -->
</head>
<body class="wrap">
<header>
<nav class="navbar navbar-expand-md navbar-dark bg-dark">
<a class="navbar-brand no-padding" href="/"><img src="/img/arrow-inverse-300px.png" height="40px"/></a>
<button class="navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#arrow-navbar" aria-controls="arrow-navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse justify-content-end" id="arrow-navbar">
<ul class="nav navbar-nav">
<li class="nav-item"><a class="nav-link" href="/overview/" role="button" aria-haspopup="true" aria-expanded="false">Overview</a></li>
<li class="nav-item"><a class="nav-link" href="/faq/" role="button" aria-haspopup="true" aria-expanded="false">FAQ</a></li>
<li class="nav-item"><a class="nav-link" href="/blog" role="button" aria-haspopup="true" aria-expanded="false">Blog</a></li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#"
id="navbarDropdownGetArrow" role="button" data-toggle="dropdown"
aria-haspopup="true" aria-expanded="false">
Get Arrow
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownGetArrow">
<a class="dropdown-item" href="/install/">Install</a>
<a class="dropdown-item" href="/release/">Releases</a>
<a class="dropdown-item" href="https://github.com/apache/arrow">Source Code</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#"
id="navbarDropdownDocumentation" role="button" data-toggle="dropdown"
aria-haspopup="true" aria-expanded="false">
Documentation
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation">
<a class="dropdown-item" href="/docs">Project Docs</a>
<a class="dropdown-item" href="/docs/format/Columnar.html">Format</a>
<hr/>
<a class="dropdown-item" href="/docs/c_glib">C GLib</a>
<a class="dropdown-item" href="/docs/cpp">C++</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/csharp/README.md">C#</a>
<a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>
<a class="dropdown-item" href="/docs/java">Java</a>
<a class="dropdown-item" href="/docs/js">JavaScript</a>
<a class="dropdown-item" href="/julia/">Julia</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/matlab/README.md">MATLAB</a>
<a class="dropdown-item" href="/docs/python">Python</a>
<a class="dropdown-item" href="/docs/r">R</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/ruby/README.md">Ruby</a>
<a class="dropdown-item" href="https://docs.rs/arrow/latest">Rust</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#"
id="navbarDropdownSubprojects" role="button" data-toggle="dropdown"
aria-haspopup="true" aria-expanded="false">
Subprojects
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownSubprojects">
<a class="dropdown-item" href="/adbc">ADBC</a>
<a class="dropdown-item" href="/docs/format/Flight.html">Arrow Flight</a>
<a class="dropdown-item" href="/docs/format/FlightSql.html">Arrow Flight SQL</a>
<a class="dropdown-item" href="https://datafusion.apache.org">DataFusion</a>
<a class="dropdown-item" href="/nanoarrow">nanoarrow</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#"
id="navbarDropdownCommunity" role="button" data-toggle="dropdown"
aria-haspopup="true" aria-expanded="false">
Community
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
<a class="dropdown-item" href="/community/">Communication</a>
<a class="dropdown-item" href="/docs/developers/index.html">Contributing</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/issues">Issue Tracker</a>
<a class="dropdown-item" href="/committers/">Governance</a>
<a class="dropdown-item" href="/use_cases/">Use Cases</a>
<a class="dropdown-item" href="/powered_by/">Powered By</a>
<a class="dropdown-item" href="/visual_identity/">Visual Identity</a>
<a class="dropdown-item" href="/security/">Security</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/policies/conduct.html">Code of Conduct</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#"
id="navbarDropdownASF" role="button" data-toggle="dropdown"
aria-haspopup="true" aria-expanded="false">
ASF Links
</a>
<div class="dropdown-menu dropdown-menu-right" aria-labelledby="navbarDropdownASF">
<a class="dropdown-item" href="https://www.apache.org/">ASF Website</a>
<a class="dropdown-item" href="https://www.apache.org/licenses/">License</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html">Donate</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html">Thanks</a>
<a class="dropdown-item" href="https://www.apache.org/security/">Security</a>
</div>
</li>
</ul>
</div><!-- /.navbar-collapse -->
</nav>
</header>
<div class="container p-4 pt-5">
<main role="main" class="pb-5">
<h1 id="apache-arrow-020-18-february-2017">Apache Arrow 0.2.0 (18 February 2017)</h1>
<h2 id="download">Download</h2>
<ul>
<li><a href="https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.2.0/"><strong>Source Artifacts</strong></a></li>
<li><a href="https://github.com/apache/arrow/releases/tag/apache-arrow-0.2.0">Git tag</a></li>
</ul>
<h1 id="changelog">Changelog</h1>
<h2 id="contributors">Contributors</h2>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git shortlog <span class="nt">-sn</span> apache-arrow-0.1.0..apache-arrow-0.2.0
73 Wes McKinney
55 Uwe L. Korn
16 Julien Le Dem
4 Bryan Cutler
4 Nong Li
2 Christopher C. Aycock
2 Jingyuan Wang
2 Kouhei Sutou
2 Laurent Goujon
2 Leif Walsh
1 Emilio Lahr-Vivaz
1 Holden Karau
1 Li Jin
1 Mohamed Zenadi
1 Peter Hoffmann
1 Steven Phillips
1 adeneche
1 ahnj
1 vkorukanti
</code></pre></div></div>
<h2 id="new-features-and-improvements">New Features and Improvements</h2>
<ul>
<li><a href="https://issues.apache.org/jira/browse/ARROW-108">ARROW-108</a> - [C++] Add IPC round trip for union types</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-189">ARROW-189</a> - C++: Use ExternalProject to build thirdparty dependencies</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-191">ARROW-191</a> - Python: Provide infrastructure for manylinux1 wheels</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-221">ARROW-221</a> - Add switch for writing Parquet 1.0 compatible logical types</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-227">ARROW-227</a> - [C++/Python] Hook arrow_io generic reader / writer interface into arrow_parquet</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-228">ARROW-228</a> - [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-243">ARROW-243</a> - [C++] Add “driver” option to HdfsClient to choose between libhdfs and libhdfs3 at runtime</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-268">ARROW-268</a> - [C++] Flesh out union implementation to have all required methods for IPC</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-303">ARROW-303</a> - [C++] Also build static libraries for leaf libraries</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-312">ARROW-312</a> - [Python] Provide Python API to read/write the Arrow IPC file format</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-317">ARROW-317</a> - [C++] Implement zero-copy Slice method on arrow::Buffer that retains reference to parent</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-327">ARROW-327</a> - [Python] Remove conda builds from Travis CI processes</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-328">ARROW-328</a> - [C++] Return shared_ptr by value instead of const-ref?</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-33">ARROW-33</a> - C++: Implement zero-copy array slicing</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-330">ARROW-330</a> - [C++] CMake functions to simplify shared / static library configuration</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-332">ARROW-332</a> - [Python] Add helper function to convert RecordBatch to pandas.DataFrame</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-333">ARROW-333</a> - Make writers update their internal schema even when no data is written.</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-335">ARROW-335</a> - Improve Type apis and toString() by encapsulating flatbuffers better</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-336">ARROW-336</a> - Run Apache Rat in Travis builds</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-338">ARROW-338</a> - [C++] Refactor IPC vector “loading” and “unloading” to be based on cleaner visitor pattern</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-350">ARROW-350</a> - Add Kerberos support to HDFS shim</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-353">ARROW-353</a> - Arrow release 0.2</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-355">ARROW-355</a> - Add tests for serialising arrays of empty strings to Parquet</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-356">ARROW-356</a> - Add documentation about reading Parquet</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-359">ARROW-359</a> - Need to document ARROW_LIBHDFS_DIR</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-360">ARROW-360</a> - C++: Add method to shrink PoolBuffer using realloc</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-361">ARROW-361</a> - Python: Support reading a column-selection from Parquet files</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-363">ARROW-363</a> - Set up Java/C++ integration test harness</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-365">ARROW-365</a> - Python: Provide Array.to_pandas()</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-366">ARROW-366</a> - [java] implement Dictionary vector</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-367">ARROW-367</a> - [java] converter csv/json &lt;=&gt; Arrow file format for Integration tests</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-368">ARROW-368</a> - Document use of LD_LIBRARY_PATH when using Python</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-369">ARROW-369</a> - [Python] Add ability to convert multiple record batches at once to pandas</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-372">ARROW-372</a> - Create JSON arrow file format for integration tests</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-373">ARROW-373</a> - [C++] Implement C++ version of JSON file format for testing</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-374">ARROW-374</a> - Python: clarify unicode vs. binary in API</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-377">ARROW-377</a> - Python: Add support for conversion of Pandas.Categorical</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-379">ARROW-379</a> - Python: Use setuptools_scm/setuptools_scm_git_archive to provide the version number</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-380">ARROW-380</a> - [Java] optimize null count when serializing vectors.</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-381">ARROW-381</a> - [C++] Simplify primitive array type builders to use a default type singleton</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-382">ARROW-382</a> - Python: Extend API documentation</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-383">ARROW-383</a> - [C++] Implement C++ version of ARROW-367 integration test validator</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-389">ARROW-389</a> - Python: Write Parquet files to pyarrow.io.NativeFile objects</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-394">ARROW-394</a> - Add integration tests for boolean, list, struct, and other basic types</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-396">ARROW-396</a> - Python: Add pyarrow.schema.Schema.equals</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-409">ARROW-409</a> - Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-410">ARROW-410</a> - [C++] Add Flush method to arrow::io::OutputStream</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-411">ARROW-411</a> - [Java] Move Integration.compare and Integration.compareSchemas to a public utils class</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-415">ARROW-415</a> - C++: Add Equals implementation to compare Tables</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-416">ARROW-416</a> - C++: Add Equals implementation to compare Columns</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-417">ARROW-417</a> - C++: Add Equals implementation to compare ChunkedArrays</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-418">ARROW-418</a> - [C++] Consolidate array container and builder code, remove arrow/types</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-419">ARROW-419</a> - [C++] Promote util/{status.h, buffer.h, memory-pool.h} to top level of arrow/ source directory</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-423">ARROW-423</a> - C++: Define BUILD_BYPRODUCTS in external project to support non-make CMake generators</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-425">ARROW-425</a> - Python: Expose a C function to convert arrow::Table to pyarrow.Table</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-426">ARROW-426</a> - Python: Conversion from pyarrow.Array to a Python list</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-427">ARROW-427</a> - [C++] Implement dictionary-encoded array container</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-428">ARROW-428</a> - [Python] Deserialize from Arrow record batches to pandas in parallel using a thread pool</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-430">ARROW-430</a> - Python: Better version handling</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-432">ARROW-432</a> - [Python] Avoid unnecessary memory copy in to_pandas conversion by using low-level pandas internals APIs</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-438">ARROW-438</a> - [Python] Concatenate Table instances with equal schemas</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-440">ARROW-440</a> - [C++] Support pkg-config</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-441">ARROW-441</a> - [Python] Expose Arrow’s file and memory map classes as NativeFile subclasses</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-442">ARROW-442</a> - [Python] Add public Python API to inspect Parquet file metadata</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-444">ARROW-444</a> - [Python] Avoid unnecessary memory copies from use of PyBytes_* C APIs</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-449">ARROW-449</a> - Python: Conversion from pyarrow.{Table,RecordBatch} to a Python dict</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-450">ARROW-450</a> - Python: Fixes for PARQUET-818</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-456">ARROW-456</a> - C++: Add jemalloc based MemoryPool</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-457">ARROW-457</a> - Python: Better control over memory pool</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-458">ARROW-458</a> - Python: Expose jemalloc MemoryPool</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-461">ARROW-461</a> - [Python] Implement conversion between arrow::DictionaryArray and pandas.Categorical</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-463">ARROW-463</a> - C++: Support jemalloc 4.x</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-466">ARROW-466</a> - C++: ExternalProject for jemalloc</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-467">ARROW-467</a> - [Python] Run parquet-cpp unit tests in Travis CI</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-468">ARROW-468</a> - Python: Conversion of nested data in pd.DataFrames to/from Arrow structures</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-470">ARROW-470</a> - [Python] Add “FileSystem” abstraction to access directories of files in a uniform way</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-471">ARROW-471</a> - [Python] Enable ParquetFile to pass down separately-obtained file metadata</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-472">ARROW-472</a> - [Python] Expose parquet::{SchemaDescriptor, ColumnDescriptor}::Equals</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-474">ARROW-474</a> - Create an Arrow streaming file format</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-475">ARROW-475</a> - [Python] High level support for reading directories of Parquet files (as a single Arrow table) from supported file system interfaces</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-476">ARROW-476</a> - [Integration] Add integration tests for Binary / Varbytes type</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-477">ARROW-477</a> - [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-478">ARROW-478</a> - [Python] Accept a PyBytes object in the pyarrow.io.BufferReader ctor</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-479">ARROW-479</a> - Python: Test for expected schema in Pandas conversion</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-484">ARROW-484</a> - Add more detail about what kind of technology can be found in the Arrow implementations to README</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-485">ARROW-485</a> - [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-490">ARROW-490</a> - Python: Update manylinux1 build scripts</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-495">ARROW-495</a> - [C++] Add C++ implementation of streaming serialized format</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-497">ARROW-497</a> - [Java] Integration test harness for streaming format</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-498">ARROW-498</a> - [C++] Integration test harness for streaming format</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-503">ARROW-503</a> - [Python] Interface to streaming binary format</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-506">ARROW-506</a> - Implement Arrow Echo server for integration testing</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-508">ARROW-508</a> - [C++] Make file/memory-mapped file interfaces threadsafe</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-509">ARROW-509</a> - [Python] Add support for PARQUET-835 (parallel column reads)</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-512">ARROW-512</a> - C++: Add method to check for primitive types</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-514">ARROW-514</a> - [Python] Accept pyarrow.io.Buffer as input to StreamReader, FileReader classes</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-515">ARROW-515</a> - [Python] Add StreamReader/FileReader methods that read all record batches as a Table</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-521">ARROW-521</a> - [C++/Python] Track peak memory use in default MemoryPool</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-524">ARROW-524</a> - [Java] Provide APIs to access nested vectors and buffers</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-525">ARROW-525</a> - Python: Add more documentation to the package</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-527">ARROW-527</a> - Clean drill-module.conf file</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-529">ARROW-529</a> - Python: Add jemalloc and Python 3.6 to manylinux1 build</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-531">ARROW-531</a> - Python: Document jemalloc, extend Pandas section, add Getting Involved</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-538">ARROW-538</a> - [C++] Set up AddressSanitizer (ASAN) builds</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-546">ARROW-546</a> - Python: Account for changes in PARQUET-867</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-547">ARROW-547</a> - [Python] Expose Array::Slice and RecordBatch::Slice</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-553">ARROW-553</a> - C++: Faster valid bitmap building</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-558">ARROW-558</a> - Add KEYS files</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-81">ARROW-81</a> - [Format] Add a Category logical type (distinct from dictionary-encoding)</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-96">ARROW-96</a> - C++: API documentation using Doxygen</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-97">ARROW-97</a> - Python: API documentation via sphinx-apidoc</li>
</ul>
<h2 id="bug-fixes">Bug Fixes</h2>
<ul>
<li><a href="https://issues.apache.org/jira/browse/ARROW-112">ARROW-112</a> - [C++] Style fix for constants/enums</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-202">ARROW-202</a> - [C++] Integrate with appveyor ci for windows support and get arrow building on windows</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-220">ARROW-220</a> - [C++] Build conda artifacts in a build environment with better cross-linux ABI compatibility</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-224">ARROW-224</a> - [C++] Address static linking of boost dependencies</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-230">ARROW-230</a> - Python: Do not name modules like native ones (i.e. rename pyarrow.io)</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-239">ARROW-239</a> - [Python] HdfsFile.read called with no arguments should read remainder of file</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-261">ARROW-261</a> - [C++] Refactor BinaryArray/StringArray classes to not inherit from ListArray</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-275">ARROW-275</a> - Add tests for UnionVector in Arrow File</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-294">ARROW-294</a> - [C++] Do not use fopen / fclose / etc. methods for memory mapped file implementation</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-322">ARROW-322</a> - [C++] Do not build HDFS IO interface optionally</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-323">ARROW-323</a> - [Python] Opt-in to PyArrow parquet build rather than skipping silently on failure</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-334">ARROW-334</a> - [Python] OS X rpath issues on some configurations</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-337">ARROW-337</a> - UnionListWriter.list() is doing more than it should, this can cause data corruption</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-339">ARROW-339</a> - Make merge_arrow_pr script work with Python 3</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-340">ARROW-340</a> - [C++] Opening a writeable file on disk that already exists does not truncate to zero</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-342">ARROW-342</a> - Set Python version on release</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-345">ARROW-345</a> - libhdfs integration doesn’t work for Mac</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-346">ARROW-346</a> - Python API Documentation</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-348">ARROW-348</a> - [Python] CMake build type should be configurable on the command line</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-349">ARROW-349</a> - Six is missing as a requirement in the python setup.py</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-351">ARROW-351</a> - Time type has no unit</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-354">ARROW-354</a> - Cannot compare an array of empty strings to another</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-357">ARROW-357</a> - Default Parquet chunk_size of 64k is too small</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-358">ARROW-358</a> - [C++] libhdfs can be in non-standard locations in some Hadoop distributions</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-362">ARROW-362</a> - Python: Calling to_pandas on a table read from Parquet leaks memory</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-371">ARROW-371</a> - Python: Table with null timestamp becomes float in pandas</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-375">ARROW-375</a> - columns parameter in parquet.read_table() raises KeyError for valid column</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-384">ARROW-384</a> - Align Java and C++ RecordBatch data and metadata layout</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-386">ARROW-386</a> - [Java] Respect case of struct / map field names</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-387">ARROW-387</a> - [C++] arrow::io::BufferReader does not permit shared memory ownership in zero-copy reads</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-390">ARROW-390</a> - C++: CMake fails on json-integration-test with ARROW_BUILD_TESTS=OFF</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-392">ARROW-392</a> - Fix string/binary integration tests</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-393">ARROW-393</a> - [JAVA] JSON file reader fails to set the buffer size on String data vector</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-395">ARROW-395</a> - Arrow file format writes record batches in reverse order.</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-398">ARROW-398</a> - [Java] Java file format requires bitmaps of all 1’s to be written when there are no nulls</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-399">ARROW-399</a> - [Java] ListVector.loadFieldBuffers ignores the ArrowFieldNode length metadata</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-400">ARROW-400</a> - [Java] ArrowWriter writes length 0 for Struct types</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-401">ARROW-401</a> - [Java] Floating point vectors should do an approximate comparison in integration tests</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-402">ARROW-402</a> - [Java] “refCnt gone negative” error in integration tests</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-403">ARROW-403</a> - [JAVA] UnionVector: Creating a transfer pair doesn’t transfer the schema to destination vector</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-404">ARROW-404</a> - [Python] Closing an HdfsClient while there are still open file handles results in a crash</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-405">ARROW-405</a> - [C++] Be less stringent about finding include/hdfs.h in HADOOP_HOME</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-406">ARROW-406</a> - [C++] Large HDFS reads must utilize the set file buffer size when making RPCs</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-408">ARROW-408</a> - [C++/Python] Remove defunct conda recipes</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-414">ARROW-414</a> - [Java] “Buffer too large to resize to …” error</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-420">ARROW-420</a> - Align Date implementation between Java and C++</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-421">ARROW-421</a> - [Python] Zero-copy buffers read by pyarrow::PyBytesReader must retain a reference to the parent PyBytes to avoid premature garbage collection issues</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-422">ARROW-422</a> - C++: IPC should depend on rapidjson_ep if RapidJSON is vendored</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-429">ARROW-429</a> - git-archive SHA-256 checksums are changing</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-433">ARROW-433</a> - [Python] Date conversion is locale-dependent</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-434">ARROW-434</a> - Segfaults and encoding issues in Python Parquet reads</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-435">ARROW-435</a> - C++: Spelling mistake in if(RAPIDJSON_VENDORED)</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-437">ARROW-437</a> - [C++] clang compiler warnings from overridden virtual functions</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-445">ARROW-445</a> - C++: arrow_ipc is built before arrow/ipc/Message_generated.h was generated</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-447">ARROW-447</a> - Python: Align scalar/pylist string encoding with pandas’ one.</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-455">ARROW-455</a> - [C++] BufferOutputStream dtor does not call Close()</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-469">ARROW-469</a> - C++: Add option so that resize doesn’t decrease the capacity</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-481">ARROW-481</a> - [Python] Fix Python 2.7 regression in patch for PARQUET-472</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-486">ARROW-486</a> - [C++] arrow::io::MemoryMappedFile can’t be cast to arrow::io::FileInterface</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-487">ARROW-487</a> - Python: ConvertTableToPandas segfaults if ObjectBlock::Write fails</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-494">ARROW-494</a> - [C++] When MemoryMappedFile is destructed, memory is unmapped even if buffer references still exist</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-499">ARROW-499</a> - Update file serialization to use streaming serialization format</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-505">ARROW-505</a> - [C++] Fix compiler warnings in release mode</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-511">ARROW-511</a> - [Python] List[T] conversions not implemented for single arrays</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-513">ARROW-513</a> - [C++] Fix Appveyor build</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-519">ARROW-519</a> - [C++] Missing vtable in libarrow.dylib on Xcode 6.4</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-523">ARROW-523</a> - Python: Account for changes in PARQUET-834</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-533">ARROW-533</a> - [C++] arrow::TimestampArray / TimeArray has a broken constructor</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-535">ARROW-535</a> - [Python] Add type mapping for NPY_LONGLONG</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-537">ARROW-537</a> - [C++] StringArray/BinaryArray comparisons may be incorrect when values with non-zero length are null</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-540">ARROW-540</a> - [C++] Fix build in aftermath of ARROW-33</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-543">ARROW-543</a> - C++: Lazily computed null_counts counts number of non-null entries</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-544">ARROW-544</a> - [C++] ArrayLoader::LoadBinary fails for length-0 arrays</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-545">ARROW-545</a> - [Python] Ignore files without .parq or .parquet suffix when reading directory of files</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-548">ARROW-548</a> - [Python] Add nthreads option to pyarrow.Filesystem.read_parquet</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-551">ARROW-551</a> - C++: Construction of Column with nullptr Array segfaults</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-556">ARROW-556</a> - [Integration] Cannot run integration tests with a different cpp build path</li>
<li><a href="https://issues.apache.org/jira/browse/ARROW-561">ARROW-561</a> - Update java &amp; python dependencies to improve downstream packaging experience</li>
</ul>
</main>
<hr/>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p>
<p>&copy; 2016-2024 The Apache Software Foundation</p>
</div>
</div>
</footer>
</div>
</body>
</html>