| :mod:`airflow.hooks.S3_hook` |
| ============================ |
| |
| .. py:module:: airflow.hooks.S3_hook |
| |
| .. autoapi-nested-parse:: |
| |
| Interact with AWS S3, using the boto3 library. |
| |
| |
| |
| Module Contents |
| --------------- |
| |
| .. py:class:: S3Hook |
| |
| Bases: :class:`airflow.contrib.hooks.aws_hook.AwsHook` |
| |
| Interact with AWS S3, using the boto3 library. |
| |
| |
| .. method:: get_conn(self) |
| |
| |
| |
| |
| .. staticmethod:: parse_s3_url(s3url) |
| |
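The section gives no description for ``parse_s3_url``. As a rough illustration only, the sketch below shows one way an ``s3://`` URL could be split into a bucket name and a key using the standard library. The helper name ``split_s3_url`` and the use of ``ValueError`` are assumptions made for this example, not the hook's documented behaviour.

```python
from urllib.parse import urlparse


def split_s3_url(s3url):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key').

    Hypothetical helper for illustration; it is not part of S3Hook.
    """
    parsed = urlparse(s3url)
    if not parsed.netloc:
        # No bucket component, e.g. a bare relative key was passed in.
        raise ValueError("no bucket name found in %r" % s3url)
    # The URL path starts with '/', which is not part of the S3 key.
    return parsed.netloc, parsed.path.lstrip("/")
```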
| |
| |
| |
| .. method:: check_for_bucket(self, bucket_name) |
| |
| Checks whether a bucket named bucket_name exists. |
| |
| :param bucket_name: the name of the bucket |
| :type bucket_name: str |
| |
| |
| |
| |
| .. method:: get_bucket(self, bucket_name) |
| |
| Returns a boto3.S3.Bucket object |
| |
| :param bucket_name: the name of the bucket |
| :type bucket_name: str |
| |
| |
| |
| |
| .. method:: create_bucket(self, bucket_name, region_name=None) |
| |
| Creates an Amazon S3 bucket. |
| |
| :param bucket_name: The name of the bucket |
| :type bucket_name: str |
| :param region_name: The name of the AWS region in which to create the bucket. |
| :type region_name: str |
| |
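A minimal sketch of how ``check_for_bucket`` and ``create_bucket`` might be combined. It assumes ``check_for_bucket`` returns a truthy value when the bucket exists, which the section does not state explicitly. ``ensure_bucket`` is a hypothetical helper name, and ``hook`` stands for an ``S3Hook`` instance.

```python
def ensure_bucket(hook, bucket_name, region_name=None):
    """Create bucket_name unless it already exists.

    Returns True when a bucket was created and False when it was already
    there. `hook` is expected to expose check_for_bucket and create_bucket
    with the signatures listed in this section.
    """
    # Assumption: a truthy result means the bucket exists.
    if hook.check_for_bucket(bucket_name):
        return False
    hook.create_bucket(bucket_name=bucket_name, region_name=region_name)
    return True
```

Checking first and creating second is not atomic, so another process could create the bucket between the two calls.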
| |
| |
| |
| .. method:: check_for_prefix(self, bucket_name, prefix, delimiter) |
| |
| Checks that a prefix exists in a bucket |
| |
| :param bucket_name: the name of the bucket |
| :type bucket_name: str |
| :param prefix: a key prefix |
| :type prefix: str |
| :param delimiter: the delimiter that marks the key hierarchy. |
| :type delimiter: str |
| |
| |
| |
| |
| .. method:: list_prefixes(self, bucket_name, prefix='', delimiter='', page_size=None, max_items=None) |
| |
| Lists prefixes in a bucket under prefix |
| |
| :param bucket_name: the name of the bucket |
| :type bucket_name: str |
| :param prefix: a key prefix |
| :type prefix: str |
| :param delimiter: the delimiter that marks the key hierarchy. |
| :type delimiter: str |
| :param page_size: pagination size |
| :type page_size: int |
| :param max_items: maximum items to return |
| :type max_items: int |
| |
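To make the interplay of ``prefix`` and ``delimiter`` concrete, the following sketch reproduces S3's common-prefix grouping on a plain Python list of keys. It runs entirely in memory and does not call the hook. ``common_prefixes`` is a hypothetical name chosen for this example.

```python
def common_prefixes(keys, prefix="", delimiter="/"):
    """Return the sorted, de-duplicated prefixes found directly under `prefix`.

    A key contributes a prefix when the delimiter occurs somewhere after
    `prefix`; the contributed prefix runs up to and including the first
    such delimiter.
    """
    if not delimiter:
        # Without a delimiter there is no hierarchy to group by.
        return []
    found = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            found.add(prefix + head + sep)
    return sorted(found)
```

With keys such as ``logs/2019/01/a.txt`` and ``logs/2020/c.txt``, a prefix of ``logs/`` and a delimiter of ``/`` yield ``logs/2019/`` and ``logs/2020/``, while ``logs/readme.txt`` contributes nothing because no delimiter follows the prefix.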
| |
| |
| |
| .. method:: list_keys(self, bucket_name, prefix='', delimiter='', page_size=None, max_items=None) |
| |
| Lists the keys in a bucket that fall under prefix and do not contain delimiter |
| |
| :param bucket_name: the name of the bucket |
| :type bucket_name: str |
| :param prefix: a key prefix |
| :type prefix: str |
| :param delimiter: the delimiter that marks the key hierarchy. |
| :type delimiter: str |
| :param page_size: pagination size |
| :type page_size: int |
| :param max_items: maximum items to return |
| :type max_items: int |
| |
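A short usage sketch for ``list_keys``. The return value is not documented in this section, so the code assumes it is a list of key strings and also tolerates ``None`` in case nothing matches. ``keys_with_suffix`` is a hypothetical helper, and ``hook`` stands for an ``S3Hook`` instance.

```python
def keys_with_suffix(hook, bucket_name, prefix, suffix):
    """List keys under `prefix` and keep only those ending in `suffix`."""
    keys = hook.list_keys(bucket_name=bucket_name, prefix=prefix)
    # Assumption: list_keys may hand back None instead of an empty list.
    return [key for key in (keys or []) if key.endswith(suffix)]
```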
| |
| |
| |
| .. method:: check_for_key(self, key, bucket_name=None) |
| |
| Checks if a key exists in a bucket |
| |
| :param key: S3 key that will point to the file |
| :type key: str |
| :param bucket_name: Name of the bucket in which the file is stored |
| :type bucket_name: str |
| |
| |
| |
| |
| .. method:: get_key(self, key, bucket_name=None) |
| |
| Returns a boto3.s3.Object |
| |
| :param key: the path to the key |
| :type key: str |
| :param bucket_name: the name of the bucket |
| :type bucket_name: str |
| |
| |
| |
| |
| .. method:: read_key(self, key, bucket_name=None) |
| |
| Reads a key from S3 |
| |
| :param key: S3 key that will point to the file |
| :type key: str |
| :param bucket_name: Name of the bucket in which the file is stored |
| :type bucket_name: str |
| |
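The two methods above are often used together. The sketch below assumes ``check_for_key`` returns a truthy value when the key exists and that ``read_key`` returns the object's content; neither return value is spelled out in this section. ``read_key_or_default`` is a hypothetical helper name.

```python
def read_key_or_default(hook, key, bucket_name=None, default=None):
    """Return the content of `key`, or `default` when the key is absent."""
    # Assumption: a truthy result from check_for_key means the key exists.
    if not hook.check_for_key(key, bucket_name):
        return default
    return hook.read_key(key, bucket_name)
```

The check and the read are separate requests, so a key deleted in between would still cause ``read_key`` to fail.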
| |
| |
| |
| .. method:: select_key(self, key, bucket_name=None, expression='SELECT * FROM S3Object', expression_type='SQL', input_serialization=None, output_serialization=None) |
| |
| Reads a key with S3 Select. |
| |
| :param key: S3 key that will point to the file |
| :type key: str |
| :param bucket_name: Name of the bucket in which the file is stored |
| :type bucket_name: str |
| :param expression: S3 Select expression |
| :type expression: str |
| :param expression_type: S3 Select expression type |
| :type expression_type: str |
| :param input_serialization: S3 Select input data serialization format |
| :type input_serialization: dict |
| :param output_serialization: S3 Select output data serialization format |
| :type output_serialization: dict |
| :return: retrieved subset of original data by S3 Select |
| :rtype: str |
| |
| .. seealso:: |
| For more details about S3 Select parameters: |
| http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.select_object_content |
| |
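As a hedged illustration of the two serialization dictionaries, the sketch below builds values for a CSV object that should be returned as JSON lines. The dictionary keys follow the ``InputSerialization`` and ``OutputSerialization`` structures of boto3's ``select_object_content``. The helper name ``csv_in_json_out`` and the sample key, bucket and expression in the comments are made up for this example.

```python
def csv_in_json_out(has_header=True, gzipped=False, field_delimiter=","):
    """Build (input_serialization, output_serialization) for select_key."""
    input_serialization = {
        "CSV": {
            # "USE" lets the expression refer to columns by header name.
            "FileHeaderInfo": "USE" if has_header else "NONE",
            "FieldDelimiter": field_delimiter,
        },
        "CompressionType": "GZIP" if gzipped else "NONE",
    }
    # One JSON document per line in the result.
    output_serialization = {"JSON": {"RecordDelimiter": "\n"}}
    return input_serialization, output_serialization


# Possible call site (hypothetical key, bucket and expression):
#   in_ser, out_ser = csv_in_json_out()
#   hook.select_key(
#       key="reports/users.csv",
#       bucket_name="my-bucket",
#       expression="SELECT s.name FROM S3Object s",
#       input_serialization=in_ser,
#       output_serialization=out_ser,
#   )
```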
| |
| |
| |
| .. method:: check_for_wildcard_key(self, wildcard_key, bucket_name=None, delimiter='') |
| |
| Checks that a key matching a wildcard expression exists in a bucket |
| |
| :param wildcard_key: the path to the key |
| :type wildcard_key: str |
| :param bucket_name: the name of the bucket |
| :type bucket_name: str |
| :param delimiter: the delimiter that marks the key hierarchy |
| :type delimiter: str |
| |
| |
| |
| |
| .. method:: get_wildcard_key(self, wildcard_key, bucket_name=None, delimiter='') |
| |
| Returns a boto3.s3.Object object matching the wildcard expression |
| |
| :param wildcard_key: the path to the key |
| :type wildcard_key: str |
| :param bucket_name: the name of the bucket |
| :type bucket_name: str |
| :param delimiter: the delimiter that marks the key hierarchy |
| :type delimiter: str |
| |
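The section does not say which wildcard syntax is accepted. Assuming Unix shell-style patterns as implemented by Python's ``fnmatch`` module, matching against a list of keys could look like the sketch below. Both function names are hypothetical, and the code works on an in-memory list rather than calling the hook.

```python
import fnmatch
import re


def listing_prefix(wildcard_key):
    """Return the literal part of the pattern before the first wildcard.

    Listing only keys under this prefix narrows the candidates before
    the pattern itself is applied.
    """
    return re.split(r"[*?\[]", wildcard_key, maxsplit=1)[0]


def first_wildcard_match(keys, wildcard_key):
    """Return the first key matching the pattern, or None."""
    for key in keys:
        # fnmatchcase is case-sensitive on every platform; '*' also
        # matches '/' characters.
        if fnmatch.fnmatchcase(key, wildcard_key):
            return key
    return None
```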
| |
| |
| |
| .. method:: load_file(self, filename, key, bucket_name=None, replace=False, encrypt=False, gzip=False, acl_policy=None) |
| |
| Loads a local file to S3 |
| |
| :param filename: name of the file to load. |
| :type filename: str |
| :param key: S3 key that will point to the file |
| :type key: str |
| :param bucket_name: Name of the bucket in which to store the file |
| :type bucket_name: str |
| :param replace: A flag to decide whether or not to overwrite the key |
| if it already exists. If replace is False and the key exists, an |
| error will be raised. |
| :type replace: bool |
| :param encrypt: If True, the file will be encrypted on the server-side |
| by S3 and will be stored in an encrypted form while at rest in S3. |
| :type encrypt: bool |
| :param gzip: If True, the file will be compressed locally |
| :type gzip: bool |
| :param acl_policy: String specifying the canned ACL policy for the file being |
| uploaded to the S3 bucket. |
| :type acl_policy: str |
| |
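The ``gzip`` flag is described only as compressing the file locally. The sketch below shows what such a local compression step could look like with the standard library. It is an illustration under that assumption, not a description of the hook's internals, and ``gzip_copy`` is a hypothetical name.

```python
import gzip
import shutil
import tempfile


def gzip_copy(filename):
    """Write a gzip-compressed copy of `filename` and return the copy's path.

    The caller is responsible for deleting the returned file.
    """
    with open(filename, "rb") as src:
        tmp = tempfile.NamedTemporaryFile(suffix=".gz", delete=False)
        # GzipFile is closed first so its trailer is flushed into tmp.
        with tmp, gzip.GzipFile(fileobj=tmp, mode="wb") as dst:
            shutil.copyfileobj(src, dst)
    return tmp.name
```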
| |
| |
| |
| .. method:: load_string(self, string_data, key, bucket_name=None, replace=False, encrypt=False, encoding='utf-8', acl_policy=None) |
| |
| Loads a string to S3 |
| |
| This is provided as a convenience to drop a string in S3. It uses the |
| boto infrastructure to ship a file to S3. |
| |
| :param string_data: str to set as content for the key. |
| :type string_data: str |
| :param key: S3 key that will point to the file |
| :type key: str |
| :param bucket_name: Name of the bucket in which to store the file |
| :type bucket_name: str |
| :param replace: A flag to decide whether or not to overwrite the key |
| if it already exists |
| :type replace: bool |
| :param encrypt: If True, the file will be encrypted on the server-side |
| by S3 and will be stored in an encrypted form while at rest in S3. |
| :type encrypt: bool |
| :param encoding: The string to byte encoding |
| :type encoding: str |
| :param acl_policy: The string to specify the canned ACL policy for the |
| object to be uploaded |
| :type acl_policy: str |
| |
| |
| |
| |
| .. method:: load_bytes(self, bytes_data, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None) |
| |
| Loads bytes to S3 |
| |
| This is provided as a convenience to drop bytes in S3. It uses the |
| boto infrastructure to ship a file to S3. |
| |
| :param bytes_data: bytes to set as content for the key. |
| :type bytes_data: bytes |
| :param key: S3 key that will point to the file |
| :type key: str |
| :param bucket_name: Name of the bucket in which to store the file |
| :type bucket_name: str |
| :param replace: A flag to decide whether or not to overwrite the key |
| if it already exists |
| :type replace: bool |
| :param encrypt: If True, the file will be encrypted on the server-side |
| by S3 and will be stored in an encrypted form while at rest in S3. |
| :type encrypt: bool |
| :param acl_policy: The string to specify the canned ACL policy for the |
| object to be uploaded |
| :type acl_policy: str |
| |
| |
| |
| |
| .. method:: load_file_obj(self, file_obj, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None) |
| |
| Loads a file object to S3 |
| |
| :param file_obj: The file-like object to set as the content for the S3 key. |
| :type file_obj: file-like object |
| :param key: S3 key that will point to the file |
| :type key: str |
| :param bucket_name: Name of the bucket in which to store the file |
| :type bucket_name: str |
| :param replace: A flag that indicates whether to overwrite the key |
| if it already exists. |
| :type replace: bool |
| :param encrypt: If True, S3 encrypts the file on the server, |
| and the file is stored in encrypted form at rest in S3. |
| :type encrypt: bool |
| :param acl_policy: The string to specify the canned ACL policy for the |
| object to be uploaded |
| :type acl_policy: str |
| |
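``load_string``, ``load_bytes`` and ``load_file_obj`` accept progressively lower-level inputs. The sketch below shows how a string relates to the other two forms using only the standard library. ``as_file_obj`` is a hypothetical helper, and the key and bucket in the comment are placeholders.

```python
import io


def as_file_obj(string_data, encoding="utf-8"):
    """Encode a string and wrap the bytes in a file-like object."""
    bytes_data = string_data.encode(encoding)  # suitable for load_bytes
    return io.BytesIO(bytes_data)              # suitable for load_file_obj


# Possible call site (hypothetical key and bucket):
#   hook.load_file_obj(as_file_obj("some text"), key="notes.txt",
#                      bucket_name="my-bucket", replace=True)
```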
| |
| |
| |
| .. method:: _upload_file_obj(self, file_obj, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None) |
| |
| |
| |
| |
| .. method:: copy_object(self, source_bucket_key, dest_bucket_key, source_bucket_name=None, dest_bucket_name=None, source_version_id=None, acl_policy='private') |
| |
| Creates a copy of an object that is already stored in S3. |
| |
| Note: the S3 connection used here needs to have access to both |
| source and destination bucket/key. |
| |
| :param source_bucket_key: The key of the source object. |
| |
| It can be either a full s3:// style URL or a path relative to the bucket root. |
| |
| When it is specified as a full s3:// URL, omit source_bucket_name. |
| :type source_bucket_key: str |
| :param dest_bucket_key: The key of the object to copy to. |
| |
| The convention to specify `dest_bucket_key` is the same |
| as `source_bucket_key`. |
| :type dest_bucket_key: str |
| :param source_bucket_name: Name of the S3 bucket that contains the source object. |
| |
| It should be omitted when `source_bucket_key` is provided as a full s3:// URL. |
| :type source_bucket_name: str |
| :param dest_bucket_name: Name of the S3 bucket to which the object is copied. |
| |
| It should be omitted when `dest_bucket_key` is provided as a full s3:// URL. |
| :type dest_bucket_name: str |
| :param source_version_id: Version ID of the source object (optional) |
| :type source_version_id: str |
| :param acl_policy: The string to specify the canned ACL policy for the |
| copied object. Defaults to private. |
| :type acl_policy: str |
| |
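The key and bucket-name convention described above can be sketched as a small resolver. This is an illustration under stated assumptions: the helper name ``resolve_location`` is hypothetical, and raising ``ValueError`` when both a full URL and a bucket name are supplied is a choice made for this example rather than documented behaviour.

```python
from urllib.parse import urlparse


def resolve_location(bucket_key, bucket_name=None):
    """Return (bucket, key) from a full s3:// URL or a bucket-relative key."""
    parsed = urlparse(bucket_key)
    if parsed.scheme == "s3" and parsed.netloc:
        # Full URL: the bucket comes from the URL itself.
        if bucket_name is not None:
            raise ValueError("omit the bucket name when passing a full s3:// URL")
        return parsed.netloc, parsed.path.lstrip("/")
    if bucket_name is None:
        raise ValueError("a bucket name is required for a relative key")
    # Relative key: use it unchanged within the named bucket.
    return bucket_name, bucket_key
```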
| |
| |
| |
| .. method:: delete_objects(self, bucket, keys) |
| |
| Deletes one or more objects from a bucket. |
| |
| :param bucket: Name of the bucket from which to delete the object(s) |
| :type bucket: str |
| :param keys: The key(s) to delete from S3 bucket. |
| |
| When ``keys`` is a string, it is treated as the key name of |
| the single object to delete. |
| |
| When ``keys`` is a list, it is treated as the list of |
| keys to delete. |
| :type keys: str or list |
| |
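Because ``keys`` may be a string or a list, a caller-side normalisation step can be sketched as follows. The returned dictionary follows the shape of the ``Delete`` argument of boto3's S3 client ``delete_objects`` call. Whether the hook builds its request this way is an assumption, and ``build_delete_payload`` is a hypothetical name.

```python
def build_delete_payload(keys):
    """Normalise `keys` (str or list of str) into a boto3 Delete payload."""
    if isinstance(keys, str):
        # A single key name becomes a one-element list.
        keys = [keys]
    return {"Objects": [{"Key": key} for key in keys]}
```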
| |
| |
| |