blob: 6abe29e94abb8dcf64ae4e40a497193b27c2b0df [file] [view]
Prompt:
Write the Python code to read data from JSON files using Apache Beam.
Response:
Your Apache Beam pipeline can read data from [JSON files](https://www.json.org/) using the Apache Beam [TextIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html).
```python
import logging
import apache_beam as beam
from apache_beam import Map
from apache_beam.io.textio import ReadFromJson
from apache_beam.options.pipeline_options import PipelineOptions
class JsonOptions(PipelineOptions):
@classmethod
def _add_argparse_args(cls, parser):
parser.add_argument(
'--file_path',
default="gs://your-bucket/your-file.json",
help='Json file path'
)
def run():
"""
This pipeline reads from Json file defined by the --file_path argument.
"""
options = JsonOptions()
with beam.Pipeline(options=options) as p:
output = p | "Read from Json file" >> ReadFromJson(
path=options.file_path,
lines=False
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
run()
```
The `JsonOptions` class is a custom class that extends the `PipelineOptions` object. This code uses [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/) to parse command-line arguments.
The [`ReadFromJson`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html#apache_beam.io.textio.ReadFromJson) transform is a built-in Apache Beam transform that reads data from a JSON file.