| :mod:`airflow.providers.amazon.aws.transfers.mongo_to_s3` |
| ========================================================= |
| |
| .. py:module:: airflow.providers.amazon.aws.transfers.mongo_to_s3 |
| |
| |
| Module Contents |
| --------------- |
| |
| .. data:: _DEPRECATION_MSG |
| :annotation: = The s3_conn_id parameter has been deprecated. You should pass instead the aws_conn_id parameter. |
| |
| |
| |
| .. py:class:: MongoToS3Operator(*, s3_conn_id: Optional[str] = None, mongo_conn_id: str = 'mongo_default', aws_conn_id: str = 'aws_default', mongo_collection: str, mongo_query: Union[list, dict], s3_bucket: str, s3_key: str, mongo_db: Optional[str] = None, replace: bool = False, allow_disk_use: bool = False, compression: Optional[str] = None, **kwargs) |
| |
| Bases: :class:`airflow.models.BaseOperator` |
| |
| Operator meant to move data from mongo via pymongo to s3 via boto. |
| |
| :param mongo_conn_id: reference to a specific mongo connection |
| :type mongo_conn_id: str |
| :param aws_conn_id: reference to a specific S3 connection |
| :type aws_conn_id: str |
| :param mongo_collection: reference to a specific collection in your mongo db |
| :type mongo_collection: str |
| :param mongo_query: query to execute. A list including a dict of the query |
| :type mongo_query: list |
| :param s3_bucket: reference to a specific S3 bucket to store the data |
| :type s3_bucket: str |
| :param s3_key: in which S3 key the file will be stored |
| :type s3_key: str |
| :param mongo_db: reference to a specific mongo database |
| :type mongo_db: str |
| :param replace: whether or not to replace the file in S3 if it previously existed |
| :type replace: bool |
| :param allow_disk_use: in the case you are retrieving a lot of data, you may have |
| to use the disk to save it instead of saving all in the RAM |
| :type allow_disk_use: bool |
| :param compression: type of compression to use for output file in S3. Currently only gzip is supported. |
| :type compression: str |
| |
| .. attribute:: template_fields |
| :annotation: = ['s3_bucket', 's3_key', 'mongo_query', 'mongo_collection'] |
| |
| |
| |
| .. attribute:: ui_color |
| :annotation: = #589636 |
| |
| |
| |
| .. attribute:: template_fields_renderers |
| |
| |
| |
| |
| |
| .. method:: execute(self, context) |
| |
| Is written to depend on transform method |
| |
| |
| |
| |
| .. staticmethod:: _stringify(iterable: Iterable, joinable: str = '\n') |
| |
| Takes an iterable (pymongo Cursor or Array) containing dictionaries and |
| returns a stringified version using python join |
| |
| |
| |
| |
| .. staticmethod:: transform(docs: Any) |
| |
| This method is meant to be extended by child classes |
| to perform transformations unique to those operators needs. |
| Processes pyMongo cursor and returns an iterable with each element being |
| a JSON serializable dictionary |
| |
| Base transform() assumes no processing is needed |
| ie. docs is a pyMongo cursor of documents and cursor just |
| needs to be passed through |
| |
| Override this method for custom transformations |
| |
| |
| |
| |