tree 2b5d7c76ae067ee5d799dd312e314f4ab512c98b
parent 456801caedcb7f448b21dd1ac3f2c8120e4230cd
author Joshua Storck <joshua.storck@twosigma.com> 1524038954 +0200
committer Uwe L. Korn <uwe@apache.org> 1524038954 +0200

PARQUET-1273: Properly write dictionary values when writing in chunks

The error was reported here: https://issues.apache.org/jira/browse/ARROW-1938.

Because writing dictionary types is not supported yet, the code first converts the dictionary column to its actual values and then writes those. However, the existing code accidentally used zero as the offset and the full length of the column as the size. As a result, every chunk that was supposed to contain only a slice of the column contained all of the column's values instead.

The fix is to pass the offset and size when recursively calling through to WriteColumnChunk with the "flattened" data.
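
For illustration only, the difference between the old and new behavior can be sketched with a small standalone model. This is not the parquet-cpp code: FakeColumnWriter, Flatten, and WriteInChunks are hypothetical names, and plain std::vector stands in for Arrow arrays. The forward_range flag selects between forwarding the caller's offset and size (the fix) and using zero and the full length (the bug).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column writer: it only records what it is given.
struct FakeColumnWriter {
  std::vector<int> written;

  void WriteSlice(const std::vector<int>& values, std::size_t offset,
                  std::size_t size) {
    assert(offset + size <= values.size());
    auto first = values.begin() + static_cast<std::ptrdiff_t>(offset);
    written.insert(written.end(), first,
                   first + static_cast<std::ptrdiff_t>(size));
  }
};

// Decode dictionary indices into the actual values ("flattening").
std::vector<int> Flatten(const std::vector<int>& dictionary,
                         const std::vector<std::size_t>& indices) {
  std::vector<int> values;
  values.reserve(indices.size());
  for (std::size_t index : indices) {
    values.push_back(dictionary.at(index));
  }
  return values;
}

// Write the column in chunks of at most chunk_size rows.
// forward_range == true  models the fix: pass the chunk's offset and size.
// forward_range == false models the bug: offset 0 and the full column length.
std::vector<int> WriteInChunks(const std::vector<int>& dictionary,
                               const std::vector<std::size_t>& indices,
                               std::size_t chunk_size, bool forward_range) {
  assert(chunk_size > 0);
  FakeColumnWriter writer;
  for (std::size_t offset = 0; offset < indices.size(); offset += chunk_size) {
    const std::size_t size = std::min(chunk_size, indices.size() - offset);
    const std::vector<int> flattened = Flatten(dictionary, indices);
    if (forward_range) {
      writer.WriteSlice(flattened, offset, size);
    } else {
      writer.WriteSlice(flattened, 0, flattened.size());
    }
  }
  return writer.written;
}
```

In this model, a five-row column written with a chunk size of two yields five values when the range is forwarded, and fifteen values (the whole column once per chunk) when it is not.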

Author: Joshua Storck <joshua.storck@twosigma.com>

Closes #453 from joshuastorck/ARROW_1938 and squashes the following commits:

c2af50f [Joshua Storck] Remove extraneous semicolon in unit test
23f5722 [Joshua Storck] Ran clang-format on arrow-reader-writer-test.cc
314b159 [Joshua Storck] Removing print statements from AssertTableEqual
f0bc71a [Joshua Storck] Fixing bug reported in https://issues.apache.org/jira/browse/ARROW-1938, namely preventing all of the values in a dictionary column from being written to parquet for each chunk created as a result of specifying row_group_size
