blob: e276c2e2b45f132077c061f469eea9a43f8918f4 [file] [log] [blame]
:mod:`airflow.providers.amazon.aws.hooks.s3`
============================================
.. py:module:: airflow.providers.amazon.aws.hooks.s3
.. autoapi-nested-parse::
Interact with AWS S3, using the boto3 library.
Module Contents
---------------
.. data:: T
.. function:: provide_bucket_name(func: T) -> T
Function decorator that provides a bucket name taken from the connection
in case no bucket name has been passed to the function.
.. function:: unify_bucket_name_and_key(func: T) -> T
Function decorator that unifies bucket name and key taken from the key
in case no bucket name and at least a key has been passed to the function.
.. py:class:: S3Hook(*args, **kwargs)
Bases: :class:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
Interact with AWS S3, using the boto3 library.
Additional arguments (such as ``aws_conn_id``) may be specified and
are passed down to the underlying AwsBaseHook.
.. seealso::
:class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
.. attribute:: conn_type
:annotation: = s3
.. attribute:: hook_name
:annotation: = S3
.. staticmethod:: parse_s3_url(s3url: str)
Parses the S3 Url into a bucket name and key.
:param s3url: The S3 Url to parse.
:rtype s3url: str
:return: the parsed bucket name and key
:rtype: tuple of str
.. method:: check_for_bucket(self, bucket_name: Optional[str] = None)
Check if bucket_name exists.
:param bucket_name: the name of the bucket
:type bucket_name: str
:return: True if it exists and False if not.
:rtype: bool
.. method:: get_bucket(self, bucket_name: Optional[str] = None)
Returns a boto3.S3.Bucket object
:param bucket_name: the name of the bucket
:type bucket_name: str
:return: the bucket object to the bucket name.
:rtype: boto3.S3.Bucket
.. method:: create_bucket(self, bucket_name: Optional[str] = None, region_name: Optional[str] = None)
Creates an Amazon S3 bucket.
:param bucket_name: The name of the bucket
:type bucket_name: str
:param region_name: The name of the aws region in which to create the bucket.
:type region_name: str
.. method:: check_for_prefix(self, prefix: str, delimiter: str, bucket_name: Optional[str] = None)
Checks that a prefix exists in a bucket
:param bucket_name: the name of the bucket
:type bucket_name: str
:param prefix: a key prefix
:type prefix: str
:param delimiter: the delimiter marks key hierarchy.
:type delimiter: str
:return: False if the prefix does not exist in the bucket and True if it does.
:rtype: bool
.. method:: list_prefixes(self, bucket_name: Optional[str] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, page_size: Optional[int] = None, max_items: Optional[int] = None)
Lists prefixes in a bucket under prefix
:param bucket_name: the name of the bucket
:type bucket_name: str
:param prefix: a key prefix
:type prefix: str
:param delimiter: the delimiter marks key hierarchy.
:type delimiter: str
:param page_size: pagination size
:type page_size: int
:param max_items: maximum items to return
:type max_items: int
:return: a list of matched prefixes
:rtype: list
.. method:: list_keys(self, bucket_name: Optional[str] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, page_size: Optional[int] = None, max_items: Optional[int] = None)
Lists keys in a bucket under prefix and not containing delimiter
:param bucket_name: the name of the bucket
:type bucket_name: str
:param prefix: a key prefix
:type prefix: str
:param delimiter: the delimiter marks key hierarchy.
:type delimiter: str
:param page_size: pagination size
:type page_size: int
:param max_items: maximum items to return
:type max_items: int
:return: a list of matched keys
:rtype: list
.. method:: check_for_key(self, key: str, bucket_name: Optional[str] = None)
Checks if a key exists in a bucket
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which the file is stored
:type bucket_name: str
:return: True if the key exists and False if not.
:rtype: bool
.. method:: get_key(self, key: str, bucket_name: Optional[str] = None)
Returns a boto3.s3.Object
:param key: the path to the key
:type key: str
:param bucket_name: the name of the bucket
:type bucket_name: str
:return: the key object from the bucket
:rtype: boto3.s3.Object
.. method:: read_key(self, key: str, bucket_name: Optional[str] = None)
Reads a key from S3
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which the file is stored
:type bucket_name: str
:return: the content of the key
:rtype: str
.. method:: select_key(self, key: str, bucket_name: Optional[str] = None, expression: Optional[str] = None, expression_type: Optional[str] = None, input_serialization: Optional[Dict[str, Any]] = None, output_serialization: Optional[Dict[str, Any]] = None)
Reads a key with S3 Select.
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which the file is stored
:type bucket_name: str
:param expression: S3 Select expression
:type expression: str
:param expression_type: S3 Select expression type
:type expression_type: str
:param input_serialization: S3 Select input data serialization format
:type input_serialization: dict
:param output_serialization: S3 Select output data serialization format
:type output_serialization: dict
:return: retrieved subset of original data by S3 Select
:rtype: str
.. seealso::
For more details about S3 Select parameters:
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.select_object_content
.. method:: check_for_wildcard_key(self, wildcard_key: str, bucket_name: Optional[str] = None, delimiter: str = '')
Checks that a key matching a wildcard expression exists in a bucket
:param wildcard_key: the path to the key
:type wildcard_key: str
:param bucket_name: the name of the bucket
:type bucket_name: str
:param delimiter: the delimiter marks key hierarchy
:type delimiter: str
:return: True if a key exists and False if not.
:rtype: bool
.. method:: get_wildcard_key(self, wildcard_key: str, bucket_name: Optional[str] = None, delimiter: str = '')
Returns a boto3.s3.Object object matching the wildcard expression
:param wildcard_key: the path to the key
:type wildcard_key: str
:param bucket_name: the name of the bucket
:type bucket_name: str
:param delimiter: the delimiter marks key hierarchy
:type delimiter: str
:return: the key object from the bucket or None if none has been found.
:rtype: boto3.s3.Object
.. method:: load_file(self, filename: str, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, gzip: bool = False, acl_policy: Optional[str] = None)
Loads a local file to S3
:param filename: name of the file to load.
:type filename: str
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which to store the file
:type bucket_name: str
:param replace: A flag to decide whether or not to overwrite the key
if it already exists. If replace is False and the key exists, an
error will be raised.
:type replace: bool
:param encrypt: If True, the file will be encrypted on the server-side
by S3 and will be stored in an encrypted form while at rest in S3.
:type encrypt: bool
:param gzip: If True, the file will be compressed locally
:type gzip: bool
:param acl_policy: String specifying the canned ACL policy for the file being
uploaded to the S3 bucket.
:type acl_policy: str
.. method:: load_string(self, string_data: str, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, encoding: Optional[str] = None, acl_policy: Optional[str] = None)
Loads a string to S3
This is provided as a convenience to drop a string in S3. It uses the
boto infrastructure to ship a file to s3.
:param string_data: str to set as content for the key.
:type string_data: str
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which to store the file
:type bucket_name: str
:param replace: A flag to decide whether or not to overwrite the key
if it already exists
:type replace: bool
:param encrypt: If True, the file will be encrypted on the server-side
by S3 and will be stored in an encrypted form while at rest in S3.
:type encrypt: bool
:param encoding: The string to byte encoding
:type encoding: str
:param acl_policy: The string to specify the canned ACL policy for the
object to be uploaded
:type acl_policy: str
.. method:: load_bytes(self, bytes_data: bytes, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, acl_policy: Optional[str] = None)
Loads bytes to S3
This is provided as a convenience to drop a string in S3. It uses the
boto infrastructure to ship a file to s3.
:param bytes_data: bytes to set as content for the key.
:type bytes_data: bytes
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which to store the file
:type bucket_name: str
:param replace: A flag to decide whether or not to overwrite the key
if it already exists
:type replace: bool
:param encrypt: If True, the file will be encrypted on the server-side
by S3 and will be stored in an encrypted form while at rest in S3.
:type encrypt: bool
:param acl_policy: The string to specify the canned ACL policy for the
object to be uploaded
:type acl_policy: str
.. method:: load_file_obj(self, file_obj: BytesIO, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, acl_policy: Optional[str] = None)
Loads a file object to S3
:param file_obj: The file-like object to set as the content for the S3 key.
:type file_obj: file-like object
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which to store the file
:type bucket_name: str
:param replace: A flag that indicates whether to overwrite the key
if it already exists.
:type replace: bool
:param encrypt: If True, S3 encrypts the file on the server,
and the file is stored in encrypted form at rest in S3.
:type encrypt: bool
:param acl_policy: The string to specify the canned ACL policy for the
object to be uploaded
:type acl_policy: str
.. method:: _upload_file_obj(self, file_obj: BytesIO, key: str, bucket_name: Optional[str] = None, replace: bool = False, encrypt: bool = False, acl_policy: Optional[str] = None)
.. method:: copy_object(self, source_bucket_key: str, dest_bucket_key: str, source_bucket_name: Optional[str] = None, dest_bucket_name: Optional[str] = None, source_version_id: Optional[str] = None, acl_policy: Optional[str] = None)
Creates a copy of an object that is already stored in S3.
Note: the S3 connection used here needs to have access to both
source and destination bucket/key.
:param source_bucket_key: The key of the source object.
It can be either full s3:// style url or relative path from root level.
When it's specified as a full s3:// url, please omit source_bucket_name.
:type source_bucket_key: str
:param dest_bucket_key: The key of the object to copy to.
The convention to specify `dest_bucket_key` is the same
as `source_bucket_key`.
:type dest_bucket_key: str
:param source_bucket_name: Name of the S3 bucket where the source object is in.
It should be omitted when `source_bucket_key` is provided as a full s3:// url.
:type source_bucket_name: str
:param dest_bucket_name: Name of the S3 bucket to where the object is copied.
It should be omitted when `dest_bucket_key` is provided as a full s3:// url.
:type dest_bucket_name: str
:param source_version_id: Version ID of the source object (OPTIONAL)
:type source_version_id: str
:param acl_policy: The string to specify the canned ACL policy for the
object to be copied which is private by default.
:type acl_policy: str
.. method:: delete_bucket(self, bucket_name: str, force_delete: bool = False)
To delete s3 bucket, delete all s3 bucket objects and then delete the bucket.
:param bucket_name: Bucket name
:type bucket_name: str
:param force_delete: Enable this to delete bucket even if not empty
:type force_delete: bool
:return: None
:rtype: None
.. method:: delete_objects(self, bucket: str, keys: Union[str, list])
Delete keys from the bucket.
:param bucket: Name of the bucket in which you are going to delete object(s)
:type bucket: str
:param keys: The key(s) to delete from S3 bucket.
When ``keys`` is a string, it's supposed to be the key name of
the single object to delete.
When ``keys`` is a list, it's supposed to be the list of the
keys to delete.
:type keys: str or list
.. method:: download_file(self, key: str, bucket_name: Optional[str] = None, local_path: Optional[str] = None)
Downloads a file from the S3 location to the local file system.
:param key: The key path in S3.
:type key: str
:param bucket_name: The specific bucket to use.
:type bucket_name: Optional[str]
:param local_path: The local path to the downloaded file. If no path is provided it will use the
system's temporary directory.
:type local_path: Optional[str]
:return: the file name.
:rtype: str
.. method:: generate_presigned_url(self, client_method: str, params: Optional[dict] = None, expires_in: int = 3600, http_method: Optional[str] = None)
Generate a presigned url given a client, its method, and arguments
:param client_method: The client method to presign for.
:type client_method: str
:param params: The parameters normally passed to ClientMethod.
:type params: dict
:param expires_in: The number of seconds the presigned url is valid for.
By default it expires in an hour (3600 seconds).
:type expires_in: int
:param http_method: The http method to use on the generated url.
By default, the http method is whatever is used in the method's model.
:type http_method: str
:return: The presigned url.
:rtype: str