:mod:`airflow.hooks.S3_hook`
============================
.. py:module:: airflow.hooks.S3_hook
.. autoapi-nested-parse::
Interact with AWS S3, using the boto3 library.
Module Contents
---------------
.. py:class:: S3Hook
Bases: :class:`airflow.contrib.hooks.aws_hook.AwsHook`
Interact with AWS S3, using the boto3 library.
.. method:: get_conn(self)
.. staticmethod:: parse_s3_url(s3url)
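A minimal sketch of splitting an ``s3://`` URL into a bucket name and a key, using only the standard library. The helper name ``split_s3_url`` and its error handling are assumptions for illustration; the hook's own ``parse_s3_url`` may behave differently.

```python
from urllib.parse import urlparse


def split_s3_url(s3url):
    """Hypothetical helper: split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(s3url)
    if not parsed.netloc:
        # No bucket component, e.g. a bare relative key was passed in.
        raise ValueError('No bucket name found in "%s"' % s3url)
    # The network location is the bucket; the path, minus slashes, is the key.
    return parsed.netloc, parsed.path.strip("/")


bucket, key = split_s3_url("s3://my-bucket/data/2019/report.csv")
print(bucket, key)
```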
.. method:: check_for_bucket(self, bucket_name)
Checks whether the bucket ``bucket_name`` exists.
:param bucket_name: the name of the bucket
:type bucket_name: str
.. method:: get_bucket(self, bucket_name)
Returns a boto3.S3.Bucket object
:param bucket_name: the name of the bucket
:type bucket_name: str
.. method:: create_bucket(self, bucket_name, region_name=None)
Creates an Amazon S3 bucket.
:param bucket_name: The name of the bucket
:type bucket_name: str
:param region_name: The name of the aws region in which to create the bucket.
:type region_name: str
.. method:: check_for_prefix(self, bucket_name, prefix, delimiter)
Checks that a prefix exists in a bucket
:param bucket_name: the name of the bucket
:type bucket_name: str
:param prefix: a key prefix
:type prefix: str
:param delimiter: the delimiter that marks the key hierarchy.
:type delimiter: str
.. method:: list_prefixes(self, bucket_name, prefix='', delimiter='', page_size=None, max_items=None)
Lists prefixes in a bucket under prefix
:param bucket_name: the name of the bucket
:type bucket_name: str
:param prefix: a key prefix
:type prefix: str
:param delimiter: the delimiter that marks the key hierarchy.
:type delimiter: str
:param page_size: pagination size
:type page_size: int
:param max_items: maximum items to return
:type max_items: int
.. method:: list_keys(self, bucket_name, prefix='', delimiter='', page_size=None, max_items=None)
Lists keys in a bucket under prefix and not containing delimiter
:param bucket_name: the name of the bucket
:type bucket_name: str
:param prefix: a key prefix
:type prefix: str
:param delimiter: the delimiter that marks the key hierarchy.
:type delimiter: str
:param page_size: pagination size
:type page_size: int
:param max_items: maximum items to return
:type max_items: int
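The interplay of ``prefix`` and ``delimiter`` in ``list_prefixes`` and ``list_keys`` can be hard to picture. The sketch below models standard S3 listing semantics on a plain in-memory list of key names. It makes no calls to S3, and ``split_listing`` is a hypothetical helper, not part of the hook.

```python
def split_listing(all_keys, prefix="", delimiter=""):
    """Hypothetical model: return (keys, common_prefixes) for a listing."""
    keys, prefixes = [], set()
    for name in sorted(all_keys):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything up to the next delimiter into one common prefix.
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            # No delimiter after the prefix: this is a direct key.
            keys.append(name)
    return keys, sorted(prefixes)


objects = ["logs/a.txt", "logs/2019/b.txt", "logs/2020/c.txt", "data/d.csv"]
direct_keys, sub_prefixes = split_listing(objects, prefix="logs/", delimiter="/")
print(direct_keys)   # keys directly under "logs/"
print(sub_prefixes)  # one entry per "sub-folder" under "logs/"
```

With an empty ``delimiter``, the model returns every key under the prefix and no common prefixes.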
.. method:: check_for_key(self, key, bucket_name=None)
Checks if a key exists in a bucket
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which the file is stored
:type bucket_name: str
.. method:: get_key(self, key, bucket_name=None)
Returns a boto3.s3.Object
:param key: the path to the key
:type key: str
:param bucket_name: the name of the bucket
:type bucket_name: str
.. method:: read_key(self, key, bucket_name=None)
Reads a key from S3
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which the file is stored
:type bucket_name: str
.. method:: select_key(self, key, bucket_name=None, expression='SELECT * FROM S3Object', expression_type='SQL', input_serialization=None, output_serialization=None)
Reads a key with S3 Select.
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which the file is stored
:type bucket_name: str
:param expression: S3 Select expression
:type expression: str
:param expression_type: S3 Select expression type
:type expression_type: str
:param input_serialization: S3 Select input data serialization format
:type input_serialization: dict
:param output_serialization: S3 Select output data serialization format
:type output_serialization: dict
:return: retrieved subset of original data by S3 Select
:rtype: str
.. seealso::
For more details about S3 Select parameters:
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.select_object_content
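As an illustration of the dictionary arguments, the sketch below assembles keyword arguments for a CSV object with a header row and JSON output. The serialization keys follow the boto3 ``select_object_content`` request format; the bucket, key, and column names are made up for the example.

```python
# Hypothetical arguments for S3Hook.select_key; nothing is sent to S3 here.
select_kwargs = {
    "key": "exports/customers.csv",          # assumed object key
    "bucket_name": "my-bucket",              # assumed bucket
    # With FileHeaderInfo set to USE, columns can be referenced by header name.
    "expression": "SELECT s.name FROM S3Object s WHERE s.country = 'NL'",
    "expression_type": "SQL",
    "input_serialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "output_serialization": {"JSON": {"RecordDelimiter": "\n"}},
}

for name, value in sorted(select_kwargs.items()):
    print(name, "=", repr(value))
```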
.. method:: check_for_wildcard_key(self, wildcard_key, bucket_name=None, delimiter='')
Checks that a key matching a wildcard expression exists in a bucket
:param wildcard_key: the path to the key, given as a wildcard expression
:type wildcard_key: str
:param bucket_name: the name of the bucket
:type bucket_name: str
:param delimiter: the delimiter that marks the key hierarchy
:type delimiter: str
.. method:: get_wildcard_key(self, wildcard_key, bucket_name=None, delimiter='')
Returns a boto3.s3.Object matching the wildcard expression
:param wildcard_key: the path to the key, given as a wildcard expression
:type wildcard_key: str
:param bucket_name: the name of the bucket
:type bucket_name: str
:param delimiter: the delimiter that marks the key hierarchy
:type delimiter: str
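One plausible way to match a wildcard expression against key names is shell-style matching with the standard ``fnmatch`` module. The sketch below assumes that approach and works on an in-memory list; ``match_wildcard`` is a hypothetical helper, and the hook's actual matching rules may differ.

```python
import fnmatch


def match_wildcard(keys, wildcard_key):
    """Hypothetical helper: return the keys matching a shell-style pattern."""
    # Everything before the first "*" is a literal prefix, which narrows
    # the candidates before the pattern match runs.
    prefix = wildcard_key.split("*", 1)[0]
    candidates = [k for k in keys if k.startswith(prefix)]
    # fnmatchcase is case-sensitive on every platform.
    return [k for k in candidates if fnmatch.fnmatchcase(k, wildcard_key)]


keys = ["data/2019/a.csv", "data/2019/b.json", "data/2020/c.csv"]
only_2019 = match_wildcard(keys, "data/2019/*.csv")
all_csv = match_wildcard(keys, "data/*.csv")
print(only_2019)
print(all_csv)
```

Note that ``*`` in ``fnmatch`` patterns also matches ``/``, so ``data/*.csv`` matches keys at any depth under ``data/``.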
.. method:: load_file(self, filename, key, bucket_name=None, replace=False, encrypt=False, gzip=False, acl_policy=None)
Loads a local file to S3
:param filename: name of the file to load.
:type filename: str
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which to store the file
:type bucket_name: str
:param replace: A flag to decide whether or not to overwrite the key
if it already exists. If replace is False and the key exists, an
error will be raised.
:type replace: bool
:param encrypt: If True, the file will be encrypted on the server-side
by S3 and will be stored in an encrypted form while at rest in S3.
:type encrypt: bool
:param gzip: If True, the file will be compressed locally
:type gzip: bool
:param acl_policy: String specifying the canned ACL policy for the file being
uploaded to the S3 bucket.
:type acl_policy: str
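The ``gzip`` flag compresses the file locally. A minimal sketch of that step, assuming the compressed copy is written to a temporary file, might look like this. ``gzip_to_temp`` is a hypothetical helper and is not taken from the hook's source.

```python
import gzip
import os
import shutil
import tempfile


def gzip_to_temp(filename):
    """Hypothetical helper: gzip `filename` into a new temp file, return its path."""
    with open(filename, "rb") as src:
        with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as tmp:
            with gzip.GzipFile(fileobj=tmp, mode="wb") as gz:
                shutil.copyfileobj(src, gz)
    return tmp.name


# Round-trip check on a throwaway file.
original = b"col_a,col_b\n1,2\n" * 100
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as f:
    f.write(original)
    source_path = f.name

compressed_path = gzip_to_temp(source_path)
with gzip.open(compressed_path, "rb") as gz:
    restored = gz.read()

compressed_size = os.path.getsize(compressed_path)
os.remove(source_path)
os.remove(compressed_path)
print(len(original), compressed_size)
```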
.. method:: load_string(self, string_data, key, bucket_name=None, replace=False, encrypt=False, encoding='utf-8', acl_policy=None)
Loads a string to S3
This is provided as a convenience to drop a string in S3. It uses the
boto3 infrastructure to ship a file to S3.
:param string_data: str to set as content for the key.
:type string_data: str
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which to store the file
:type bucket_name: str
:param replace: A flag to decide whether or not to overwrite the key
if it already exists
:type replace: bool
:param encrypt: If True, the file will be encrypted on the server-side
by S3 and will be stored in an encrypted form while at rest in S3.
:type encrypt: bool
:param encoding: The string to byte encoding
:type encoding: str
:param acl_policy: The string to specify the canned ACL policy for the
object to be uploaded
:type acl_policy: str
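The ``encoding`` parameter controls how the string becomes bytes before upload. The sketch below shows that conversion, assuming the encoded bytes are wrapped in a file-like object. ``string_to_file_obj`` is a hypothetical helper.

```python
import io


def string_to_file_obj(string_data, encoding="utf-8"):
    """Hypothetical helper: encode a string and wrap it as a file-like object."""
    return io.BytesIO(string_data.encode(encoding))


utf8_bytes = string_to_file_obj("h\u00e9llo").read()
latin1_bytes = string_to_file_obj("h\u00e9llo", encoding="latin-1").read()
print(utf8_bytes, latin1_bytes)
```

The same text produces different byte sequences under different encodings, so readers of the object need to know which encoding was used.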
.. method:: load_bytes(self, bytes_data, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None)
Loads bytes to S3
This is provided as a convenience to drop bytes in S3. It uses the
boto3 infrastructure to ship a file to S3.
:param bytes_data: bytes to set as content for the key.
:type bytes_data: bytes
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which to store the file
:type bucket_name: str
:param replace: A flag to decide whether or not to overwrite the key
if it already exists
:type replace: bool
:param encrypt: If True, the file will be encrypted on the server-side
by S3 and will be stored in an encrypted form while at rest in S3.
:type encrypt: bool
:param acl_policy: The string to specify the canned ACL policy for the
object to be uploaded
:type acl_policy: str
.. method:: load_file_obj(self, file_obj, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None)
Loads a file object to S3
:param file_obj: The file-like object to set as the content for the S3 key.
:type file_obj: file-like object
:param key: S3 key that will point to the file
:type key: str
:param bucket_name: Name of the bucket in which to store the file
:type bucket_name: str
:param replace: A flag that indicates whether to overwrite the key
if it already exists.
:type replace: bool
:param encrypt: If True, S3 encrypts the file on the server,
and the file is stored in encrypted form at rest in S3.
:type encrypt: bool
:param acl_policy: The string to specify the canned ACL policy for the
object to be uploaded
:type acl_policy: str
.. method:: _upload_file_obj(self, file_obj, key, bucket_name=None, replace=False, encrypt=False, acl_policy=None)
.. method:: copy_object(self, source_bucket_key, dest_bucket_key, source_bucket_name=None, dest_bucket_name=None, source_version_id=None, acl_policy='private')
Creates a copy of an object that is already stored in S3.
Note: the S3 connection used here needs to have access to both
source and destination bucket/key.
:param source_bucket_key: The key of the source object.
It can be either a full s3:// style URL or a path relative to the bucket root.
When it is specified as a full s3:// URL, omit source_bucket_name.
:type source_bucket_key: str
:param dest_bucket_key: The key of the object to copy to.
The convention to specify `dest_bucket_key` is the same
as `source_bucket_key`.
:type dest_bucket_key: str
:param source_bucket_name: Name of the S3 bucket that contains the source object.
It should be omitted when `source_bucket_key` is provided as a full s3:// URL.
:type source_bucket_name: str
:param dest_bucket_name: Name of the S3 bucket to which the object is copied.
It should be omitted when `dest_bucket_key` is provided as a full s3:// URL.
:type dest_bucket_name: str
:param source_version_id: Version ID of the source object (OPTIONAL)
:type source_version_id: str
:param acl_policy: The string to specify the canned ACL policy for the
copied object. Defaults to ``private``.
:type acl_policy: str
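The key and bucket-name convention described above can be sketched as follows. Both helpers are hypothetical and use ``ValueError`` for illustration only. The ``Bucket``, ``Key``, and ``VersionId`` fields follow the ``CopySource`` dictionary accepted by boto3's ``copy_object``.

```python
def resolve_location(bucket_key, bucket_name=None):
    """Hypothetical helper: return (bucket, key) for either calling style."""
    is_url = bucket_key.startswith("s3://")
    if bucket_name is None:
        if not is_url:
            raise ValueError("bucket_name is required for a relative key")
        # Full URL: the bucket is the part before the first "/".
        bucket, _, key = bucket_key[len("s3://"):].partition("/")
        return bucket, key
    if is_url:
        raise ValueError("omit bucket_name when the key is a full s3:// URL")
    return bucket_name, bucket_key


def build_copy_source(bucket_key, bucket_name=None, version_id=None):
    """Hypothetical helper: build a CopySource-style dictionary."""
    bucket, key = resolve_location(bucket_key, bucket_name)
    source = {"Bucket": bucket, "Key": key}
    if version_id is not None:
        source["VersionId"] = version_id
    return source


from_url = build_copy_source("s3://src-bucket/path/a.txt")
from_parts = build_copy_source("path/a.txt", "src-bucket", version_id="v1")
print(from_url)
print(from_parts)
```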
.. method:: delete_objects(self, bucket, keys)
Deletes one or more objects from a bucket.
:param bucket: Name of the bucket in which you are going to delete object(s)
:type bucket: str
:param keys: The key(s) to delete from the S3 bucket.
When ``keys`` is a string, it is treated as the key name of
the single object to delete.
When ``keys`` is a list, it is treated as the list of
keys to delete.
:type keys: str or list
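Because ``keys`` accepts either a string or a list, a caller-side view of the normalization might look like the sketch below. ``build_delete_payload`` is a hypothetical helper. The ``Objects`` and ``Key`` structure matches the ``Delete`` argument of boto3's ``delete_objects``, but whether the hook builds it this way is an assumption.

```python
def build_delete_payload(keys):
    """Hypothetical helper: normalize `keys` and build a Delete-style payload."""
    if isinstance(keys, str):
        # A single key name becomes a one-element list.
        keys = [keys]
    return {"Objects": [{"Key": k} for k in keys]}


single = build_delete_payload("tmp/a.csv")
several = build_delete_payload(["tmp/a.csv", "tmp/b.csv"])
print(single)
print(several)
```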