chore(deps): bump flask from 2.0.1 to 2.2.5

Bumps [flask](https://github.com/pallets/flask) from 2.0.1 to 2.2.5.
- [Release notes](https://github.com/pallets/flask/releases)
- [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/flask/compare/2.0.1...2.2.5)

---
updated-dependencies:
- dependency-name: flask
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

README.md

Insitu Data in Parquet format stored in S3

How to ingest an insitu JSON file to Parquet

  • Assumption: K8s is successfully deployed

  • Download this repo

  • (optional) create a separate Python 3.6 environment, e.g. with venv, as sketched below
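
    A minimal sketch using the standard venv module (the environment name "venv" is arbitrary):

      python3.6 -m venv venv
      source venv/bin/activate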

  • install dependencies

      python3 setup.py install
    
  • set up AWS tokens

      export AWS_ACCESS_KEY_ID=xxx
      export AWS_SECRET_ACCESS_KEY=xxx
      export AWS_SESSION_TOKEN=really.long.token
      export AWS_REGION=us-west-2
    
    • alternatively, the default profile under ~/.aws/credentials can be set up as well; an example follows
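
      An illustrative ~/.aws/credentials with a default profile (all values are placeholders):

        [default]
        aws_access_key_id = xxx
        aws_secret_access_key = xxx
        aws_session_token = really.long.token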
  • add the current directory to PYTHONPATH

      export PYTHONPATH="${PYTHONPATH}:/absolute/path/to/current/dir/"
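
    To sanity-check the path setup (this assumes the parquet_cli package sits at the repo root, so the import should succeed silently):

      python3 -c "import parquet_cli"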
    
  • run the script:

      python3 -m parquet_cli.ingest_s3 --help
    
    • sample invocation:

        python3 -m parquet_cli.ingest_s3 \
          --LOG_LEVEL 30 \
          --CDMS_DOMAIN https://doms.jpl.nasa.gov/insitu  \
          --CDMS_BEARER_TOKEN Mock-CDMS-Flask-Token  \
          --PARQUET_META_TBL_NAME cdms_parquet_meta_dev_v1  \
          --BUCKET_NAME cdms-dev-ncar-in-situ-stage  \
          --KEY_PREFIX cdms_icoads_2017-01-01.json
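
      The numeric --LOG_LEVEL follows Python's standard logging levels (assuming parquet_cli forwards the value to the logging module):

        import logging
        # Standard numeric levels: DEBUG=10, INFO=20, WARNING=30, ERROR=40.
        assert logging.WARNING == 30  # --LOG_LEVEL 30 therefore requests WARNING and above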
      

Ref:

  • how to partially overwrite (replace) specific partitions of a Parquet dataset:

    https://stackoverflow.com/questions/38487667/overwrite-specific-partitions-in-spark-dataframe-write-method?noredirect=1&lq=1
    https://stackoverflow.com/questions/50006526/overwrite-only-some-partitions-in-a-partitioned-spark-dataset

    > Finally! This is now a feature in Spark 2.3.0: SPARK-20236
    > To use it, you need to set the spark.sql.sources.partitionOverwriteMode setting to dynamic, the dataset needs to be partitioned, and the write mode overwrite. Example:

      spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
      data.toDF().write.mode("overwrite").format("parquet").partitionBy("date", "name").save("s3://path/to/somewhere")
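
    A PySpark rendering of the same idea, as a minimal sketch (the bucket, path, schema, and partition columns below are illustrative, not values from this repo):

      from pyspark.sql import SparkSession

      # Dynamic partition overwrite requires Spark >= 2.3.0.
      spark = (
          SparkSession.builder
          .appName("dynamic-partition-overwrite-demo")
          .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
          .getOrCreate()
      )

      df = spark.createDataFrame(
          [("2017-01-01", "icoads", 12.3)],
          ["date", "name", "value"],
      )

      # With partitionOverwriteMode=dynamic, mode("overwrite") replaces only the
      # (date, name) partitions present in df; other partitions under the target
      # path are left untouched.
      (
          df.write
          .mode("overwrite")
          .format("parquet")
          .partitionBy("date", "name")
          .save("s3a://some-bucket/path/to/somewhere")
      )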