| bad_parquet_data.parquet: |
| Generated with parquet-mr 1.2.5 |
| Contains 3 single-column rows: |
| "parquet" |
| "is" |
| "fun" |
| |
| bad_rle_literal_count.parquet: |
| Generated by hacking Impala's Parquet writer. |
| Contains a single bigint column 'c' with the values 1, 3, 7 stored |
| in a single data chunk as dictionary plain. The RLE encoded dictionary |
| indexes are all literals (and not repeated), but the literal count |
| is incorrectly 0 in the file to test that such data corruption is |
| properly handled. |
| |
| bad_rle_repeat_count.parquet: |
| Generated by hacking Impala's Parquet writer. |
| Contains a single bigint column 'c' with the value 7 repeated 7 times |
| stored in a single data chunk as dictionary plain. The RLE encoded dictionary |
| indexes are a single repeated run (and not literals), but the repeat count |
| is incorrectly 0 in the file to test that such data corruption is properly |
| handled. |
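
The two bad_rle_* files above corrupt the run header of the RLE/bit-packed hybrid encoding used for dictionary indexes. The sketch below illustrates the kind of check a reader can apply. It assumes the header layout from the Parquet format specification (a ULEB128 varint whose low bit selects a literal, i.e. bit-packed, run or a repeated run). The names `read_uleb128` and `parse_run_header` are hypothetical and are not Impala's API, and the example bytes are illustrative rather than copied from the test files.

```python
def read_uleb128(buf, pos=0):
    """Decode an unsigned LEB128 varint; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def parse_run_header(buf, pos=0):
    """Parse one run header; return (kind, num_values, next_pos).

    Low bit 1: literal (bit-packed) run of (header >> 1) groups of 8 values.
    Low bit 0: repeated run of (header >> 1) copies of one value.
    A run that holds zero values is treated as corruption.
    """
    header, pos = read_uleb128(buf, pos)
    count = header >> 1
    if header & 1:
        kind, num_values = "literal", count * 8
    else:
        kind, num_values = "repeated", count
    if num_values == 0:
        raise ValueError(f"corrupt {kind} run: count is 0")
    return kind, num_values, pos


# Hypothetical headers: one literal group (8 slots) and a run of 7 repeats.
print(parse_run_header(bytes([0x03])))  # ('literal', 8, 1)
print(parse_run_header(bytes([0x0E])))  # ('repeated', 7, 1)
```

Rejecting a zero count up front keeps a decoder from looping forever or reading past the data page, which is the failure mode these files are meant to exercise.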
| |
| zero_rows_zero_row_groups.parquet: |
| Generated by hacking Impala's Parquet writer. |
| The file metadata indicates zero rows and no row groups. |
| |
| zero_rows_one_row_group.parquet: |
| Generated by hacking Impala's Parquet writer. |
| The file metadata indicates zero rows but one row group. |
| |
| repeated_values.parquet: |
| Generated with parquet-mr 1.2.5 |
| Contains 3 single-column rows: |
| "parquet" |
| "parquet" |
| "parquet" |
| |
| multiple_rowgroups.parquet: |
| Generated with parquet-mr 1.2.5 |
| Populated with: |
| hive> set parquet.block.size=500; |
| hive> INSERT INTO TABLE tbl |
| SELECT l_comment FROM tpch.lineitem LIMIT 1000; |
| |
| alltypesagg_hive_13_1.parquet: |
| Generated with parquet-mr version 1.5.0-cdh5.4.0-SNAPSHOT |
| hive> create table alltypesagg_hive_13_1 stored as parquet as select * from alltypesagg; |
| |
| bad_column_metadata.parquet: |
| Generated with a hacked version of parquet-mr 1.8.2-SNAPSHOT |
| Schema: |
| {"type": "record", |
| "namespace": "org.apache.impala", |
| "name": "bad_column_metadata", |
| "fields": [ |
| {"name": "id", "type": ["null", "long"]}, |
| {"name": "int_array", "type": ["null", {"type": "array", "items": ["null", "int"]}]} |
| ] |
| } |
| Contains 3 row groups, each with ten rows and each array containing ten elements. The |
| column metadata for 'int_array' in the first row group incorrectly states there are 50 |
| values (instead of 100), and the column metadata for 'id' in the second row group |
| incorrectly states there are 11 values (instead of 10). The third row group has the |
| correct metadata. |
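
The mismatch in bad_column_metadata.parquet can be restated as a small consistency check. The sketch below hard-codes the counts described above instead of reading the file. `EXPECTED`, `DECLARED`, and `find_mismatches` are hypothetical names chosen for illustration.

```python
ROWS_PER_GROUP = 10
ELEMENTS_PER_ARRAY = 10

# Value counts each row group should declare, derived from the description.
EXPECTED = {
    "id": ROWS_PER_GROUP,
    "int_array": ROWS_PER_GROUP * ELEMENTS_PER_ARRAY,
}

# Value counts the column metadata claims, one dict per row group.
DECLARED = [
    {"id": 10, "int_array": 50},   # first row group: 'int_array' is wrong
    {"id": 11, "int_array": 100},  # second row group: 'id' is wrong
    {"id": 10, "int_array": 100},  # third row group: correct
]


def find_mismatches(declared, expected):
    """Return (row_group_index, column, declared_count, expected_count) tuples."""
    mismatches = []
    for index, group in enumerate(declared):
        for column, want in expected.items():
            if group[column] != want:
                mismatches.append((index, column, group[column], want))
    return mismatches


for mismatch in find_mismatches(DECLARED, EXPECTED):
    print(mismatch)
```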
| |
| data-bzip2.bz2: |
| Generated with bzip2, contains a single bzip2 stream |
| Contains 1 column, uncompressed data size < 8M |
| |
| large_bzip2.bz2: |
| Generated with bzip2, contains a single bzip2 stream |
| Contains 1 column, uncompressed data size > 8M |
| |
| data-pbzip2.bz2: |
| Generated with pbzip2, contains multiple bzip2 streams |
| Contains 1 column, uncompressed data size < 8M |
| |
| large_pbzip2.bz2: |
| Generated with pbzip2, contains multiple bzip2 streams |
| Contains 1 column, uncompressed data size > 8M |
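
The difference between the bzip2 and pbzip2 files is the number of concatenated streams, and a reader that stops after the first stream silently drops data. Below is a minimal sketch using Python's standard `bz2` module. `count_streams` is a hypothetical helper, and the in-memory inputs stand in for the test files, which are not read here.

```python
import bz2


def count_streams(data):
    """Decompress concatenated bzip2 streams; return (num_streams, payload)."""
    streams = 0
    chunks = []
    while data:
        # A BZ2Decompressor handles exactly one stream. Bytes that follow
        # the end of that stream are exposed through unused_data.
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        if not decompressor.eof:
            raise ValueError("truncated bzip2 stream")
        streams += 1
        data = decompressor.unused_data
    return streams, b"".join(chunks)


# One stream, as plain bzip2 produces.
single = bz2.compress(b"a\nb\n")
# Two concatenated streams, emulating the multi-stream layout of pbzip2.
multi = bz2.compress(b"a\n") + bz2.compress(b"b\n")

print(count_streams(single))  # (1, b'a\nb\n')
print(count_streams(multi))   # (2, b'a\nb\n')
```

Both inputs decompress to the same payload, so only the stream count distinguishes them.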
| |
| out_of_range_timestamp.parquet: |
| Generated with a hacked version of Impala's Parquet writer. |
| Contains a single timestamp column with 4 values, 2 of which are out of range |
| and should be read as NULL by Impala: |
| 1399-12-31 00:00:00 (invalid - date too small) |
| 1400-01-01 00:00:00 |
| 9999-12-31 00:00:00 |
| 10000-01-01 00:00:00 (invalid - date too large) |
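
The expected result for the four values above can be sketched as a range check. The bounds are inferred from the listing (1400-01-01 is the earliest valid date and 9999-12-31 the latest), and `read_timestamp` is a hypothetical helper, not Impala code. Dates are compared as integer tuples because Python's `datetime` cannot represent year 10000.

```python
MIN_DATE = (1400, 1, 1)
MAX_DATE = (9999, 12, 31)


def read_timestamp(text):
    """Return the timestamp unchanged, or None if its date is out of range."""
    date_part = text.split(" ")[0]
    year, month, day = (int(part) for part in date_part.split("-"))
    if not (MIN_DATE <= (year, month, day) <= MAX_DATE):
        return None  # an out-of-range value is read as NULL
    return text


values = [
    "1399-12-31 00:00:00",
    "1400-01-01 00:00:00",
    "9999-12-31 00:00:00",
    "10000-01-01 00:00:00",
]
for value in values:
    print(value, "->", read_timestamp(value))
```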