blob: 543569b110f9abcc729739fdb1a2c6a0c1aaed39 [file] [view]
Prompt:
Is Apache Parquet format supported in Apache Beam?
Response:
[Apache Parquet](https://parquet.apache.org/) is a data format for storing and exchanging data. Apache Beam supports reading from and writing to Parquet files using the `ReadFromParquet` and `WriteToParquet` transforms in the `ParquetIO` module. For more information, see the ParquetIO connector documentation for the following languages:
* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/parquet/ParquetIO.html)
* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.parquetio.html)
* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/parquetio)
* [Typescript (using cross-language transforms)](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/parquetio.ts)
To get started with Avro and Apache Beam, refer to the [Dataflow Cookbook GitHub repository](https://github.com/GoogleCloudPlatform/dataflow-cookbook).
Here is an example of Apache Beam pipeline code for reading data from a Parquet file:
```python
class ReadParquetOptions(PipelineOptions):
@classmethod
def _add_argparse_args(cls, parser):
parser.add_argument(
"--path",
help="GCS path to parquet file")
options = ReadParquetOptions()
with beam.Pipeline(options=options) as p:
(p | "ReadParquet" >> ReadFromParquet(options.path)
| Map(logging.info))
```