Parquet

Version 2.7.0

Sub-task

Bug

  • PARQUET-1437 - Misleading comment in parquet.thrift
  • PARQUET-1554 - Compilation error when upgrading Scrooge version
  • PARQUET-1561 - Inconsistencies in the Parquet Delta Encoding specification

New Feature

Improvement

Task

  • PARQUET-1433 - Parquet-format doesn't compile with Thrift 0.10.0
  • PARQUET-1572 - Clarify the definition of timestamp types
  • PARQUET-1585 - Update old external links in the code base
  • PARQUET-1627 - Update specification so that legacy timestamp logical types can be written for local semantics as well

Version 2.6.0

Bug

  • PARQUET-1266 - LogicalTypes union in parquet-format doesn't include UUID

Improvement

  • PARQUET-1290 - Clarify maximum run lengths for RLE encoding
  • PARQUET-1387 - Nanosecond precision time and timestamp - parquet-format
  • PARQUET-1400 - Deprecate parquet-mr related code in parquet-format

Task

Version 2.5.0

Bug

  • PARQUET-323 - INT96 should be marked as deprecated
  • PARQUET-1064 - Deprecate type-defined sort ordering for INTERVAL type
  • PARQUET-1065 - Deprecate type-defined sort ordering for INT96 type
  • PARQUET-1145 - Add license to .gitignore and .travis.yml
  • PARQUET-1156 - dev/merge_parquet_pr.py problems
  • PARQUET-1236 - Upgrade org.slf4j:slf4j-api:1.7.2 to 1.7.12
  • PARQUET-1242 - parquet.thrift refers to wrong releases for the new compressions
  • PARQUET-1251 - Clarify ambiguous min/max stats for FLOAT/DOUBLE
  • PARQUET-1258 - Update scm developer connection to github

New Feature

Improvement

Task

Version 2.4.0

Bug

Improvement

  • PARQUET-371 - Bumps Thrift version to 0.9.3
  • PARQUET-407 - Incorrect delta-encoding example
  • PARQUET-428 - Support INT96 and FIXED_LEN_BYTE_ARRAY types
  • PARQUET-601 - Add support in Parquet to configure the encoding used by ValueWriters
  • PARQUET-609 - Add Brotli compression to Parquet format
  • PARQUET-757 - Add NULL type to bring Parquet logical types on par with Arrow
  • PARQUET-804 - parquet-format README.md still links to the old Google group
  • PARQUET-922 - Add index pages to the format to support efficient page skipping
  • PARQUET-1049 - Make thrift version a property in pom.xml

Task

Version 2.2.0

Version 2.1.0

  • ISSUE 84: Add metadata in the schema for storing decimals.
  • ISSUE 89: Added statistics to the data page header
  • ISSUE 86: Fix minor formatting, correct some wording under the “Error recovery” se...
  • ISSUE 82: exclude thrift source from jar
  • ISSUE 80: Upgrade maven-shade-plugin to 2.1 to compile with mvn 3.1.1

Version 2.0.0

  • ISSUE 79: Reorganize encodings and add details
  • ISSUE 78: Added sorted flag to dictionary page headers.
  • ISSUE 77: fix plugin versions
  • ISSUE 75: refactor dictionary encoding
  • ISSUE 64: new data page and stats
  • ISSUE 74: deprecate and remove group_var_int encoding
  • ISSUE 76: add mention of boolean on RLE
  • ISSUE 73: reformat encodings
  • ISSUE 71: refactor documentation for 2.0 encodings
  • ISSUE 66: Block strings
  • ISSUE 67: Add ENUM ConvertedType
  • ISSUE 58: Correct unterminated comment for SortingColumn.
  • ISSUE 51: Add metadata to specify row groups are sorted.

Version 1.0.0

  • ISSUE 46: Update readme to include 4 byte length in rle columns
  • ISSUE 47: fixed typo in readme.md
  • ISSUE 45: Typo in describing preferred row group size
  • ISSUE 43: add dictionary encoding details
  • ISSUE 41: Update readme with details about RLE encoding
  • ISSUE 39: Added created_by optional file metadata.
  • ISSUE 40: add details about the page size fields
  • ISSUE 35: this embeds and renames the thrift dependency in the jar, allowing people to use a different version of thrift in parallel
  • ISSUE 36: adding the encoding to the dictionary page
  • ISSUE 34: Corrected typo
  • ISSUE 32: Add layout diagram to README and fix typo
  • ISSUE 31: Restore encoding changes