blob: 186dad171af4f6741c8b0ba718f940a5f2cbe0cd [file] [log] [blame]
:mod:`airflow.providers.amazon.aws.transfers.mongo_to_s3`
=========================================================
.. py:module:: airflow.providers.amazon.aws.transfers.mongo_to_s3
Module Contents
---------------
.. data:: _DEPRECATION_MSG
:annotation: = The s3_conn_id parameter has been deprecated. You should pass instead the aws_conn_id parameter.
.. py:class:: MongoToS3Operator(*, s3_conn_id: Optional[str] = None, mongo_conn_id: str = 'mongo_default', aws_conn_id: str = 'aws_default', mongo_collection: str, mongo_query: Union[list, dict], s3_bucket: str, s3_key: str, mongo_db: Optional[str] = None, replace: bool = False, allow_disk_use: bool = False, compression: Optional[str] = None, **kwargs)
Bases: :class:`airflow.models.BaseOperator`
Operator meant to move data from mongo via pymongo to s3 via boto.
:param mongo_conn_id: reference to a specific mongo connection
:type mongo_conn_id: str
:param aws_conn_id: reference to a specific S3 connection
:type aws_conn_id: str
:param mongo_collection: reference to a specific collection in your mongo db
:type mongo_collection: str
:param mongo_query: query to execute. A list including a dict of the query
:type mongo_query: list
:param s3_bucket: reference to a specific S3 bucket to store the data
:type s3_bucket: str
:param s3_key: in which S3 key the file will be stored
:type s3_key: str
:param mongo_db: reference to a specific mongo database
:type mongo_db: str
:param replace: whether or not to replace the file in S3 if it previously existed
:type replace: bool
:param allow_disk_use: in the case you are retrieving a lot of data, you may have
to use the disk to save it instead of saving all in the RAM
:type allow_disk_use: bool
:param compression: type of compression to use for output file in S3. Currently only gzip is supported.
:type compression: str
.. attribute:: template_fields
:annotation: = ['s3_bucket', 's3_key', 'mongo_query', 'mongo_collection']
.. attribute:: ui_color
:annotation: = #589636
.. attribute:: template_fields_renderers
.. method:: execute(self, context)
Is written to depend on transform method
.. staticmethod:: _stringify(iterable: Iterable, joinable: str = '\n')
Takes an iterable (pymongo Cursor or Array) containing dictionaries and
returns a stringified version using python join
.. staticmethod:: transform(docs: Any)
This method is meant to be extended by child classes
to perform transformations unique to those operators needs.
Processes pyMongo cursor and returns an iterable with each element being
a JSON serializable dictionary
Base transform() assumes no processing is needed
ie. docs is a pyMongo cursor of documents and cursor just
needs to be passed through
Override this method for custom transformations