:mod:`airflow.providers.amazon.aws.hooks.glue_crawler`
======================================================
.. py:module:: airflow.providers.amazon.aws.hooks.glue_crawler
Module Contents
---------------
.. py:class:: AwsGlueCrawlerHook(*args, **kwargs)
Bases: :class:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
Interacts with AWS Glue Crawler.
Additional arguments (such as ``aws_conn_id``) may be specified and
are passed down to the underlying AwsBaseHook.
.. seealso::
:class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
.. method:: glue_client(self)
:return: AWS Glue client
.. method:: has_crawler(self, crawler_name)
Checks whether the crawler already exists.
:param crawler_name: unique crawler name per AWS account
:type crawler_name: str
:return: ``True`` if the crawler already exists, ``False`` otherwise.
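
One way such an existence check can be built is to attempt a lookup and treat a "not found" error as absence. The sketch below is an illustration only, not the hook's confirmed implementation: ``EntityNotFoundException`` here is a local stand-in for the Glue client's exception of the same name, and ``crawler_exists`` and ``fake_get_crawler`` are hypothetical names used so the example runs without AWS access.

```python
from typing import Any, Callable, Dict


class EntityNotFoundException(Exception):
    """Local stand-in for the Glue client's exception of the same name."""


def crawler_exists(get_crawler: Callable[[str], Dict[str, Any]], crawler_name: str) -> bool:
    """Return True if ``get_crawler`` finds the crawler, False if it reports it missing."""
    try:
        get_crawler(crawler_name)
    except EntityNotFoundException:
        return False
    return True


# Hypothetical in-memory lookup used instead of a real Glue call.
_KNOWN_CRAWLERS = {"sales-crawler": {"Name": "sales-crawler", "State": "READY"}}


def fake_get_crawler(crawler_name: str) -> Dict[str, Any]:
    """Return the stored crawler or raise the stand-in "not found" error."""
    if crawler_name not in _KNOWN_CRAWLERS:
        raise EntityNotFoundException(crawler_name)
    return _KNOWN_CRAWLERS[crawler_name]
```

Any other error raised by the lookup (for example a permissions problem) propagates unchanged, so a failed call is not mistaken for a missing crawler.
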
.. method:: get_crawler(self, crawler_name: str)
Gets the crawler's configuration.
:param crawler_name: unique crawler name per AWS account
:type crawler_name: str
:return: Nested dictionary of crawler configurations
.. method:: update_crawler(self, **crawler_kwargs)
Updates crawler configurations
:param crawler_kwargs: Keyword args that define the configurations used for the crawler
:type crawler_kwargs: any
:return: ``True`` if the crawler was updated, ``False`` otherwise
.. method:: create_crawler(self, **crawler_kwargs)
Creates an AWS Glue Crawler
:param crawler_kwargs: Keyword args that define the configurations used to create the crawler
:type crawler_kwargs: any
:return: Name of the crawler
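
The methods above are commonly combined into a "create or update" step. The following is a minimal sketch of that pattern under stated assumptions: ``ensure_crawler`` is a hypothetical helper, ``FakeCrawlerHook`` is an in-memory stand-in for ``AwsGlueCrawlerHook`` so the example runs without AWS, and the keys in the configuration (``Name``, ``Role``, ``DatabaseName``, ``Targets``) follow the parameter names of the boto3 Glue ``create_crawler`` call.

```python
from typing import Any, Dict


def ensure_crawler(hook: Any, config: Dict[str, Any]) -> str:
    """Create the crawler described by ``config``, or update it if it already exists.

    ``hook`` only needs to expose ``has_crawler``, ``update_crawler`` and
    ``create_crawler`` with the signatures documented above.
    """
    name = config["Name"]
    if hook.has_crawler(name):
        hook.update_crawler(**config)
    else:
        hook.create_crawler(**config)
    return name


class FakeCrawlerHook:
    """Hypothetical in-memory stand-in for AwsGlueCrawlerHook (makes no AWS calls)."""

    def __init__(self) -> None:
        self.crawlers: Dict[str, Dict[str, Any]] = {}

    def has_crawler(self, crawler_name: str) -> bool:
        return crawler_name in self.crawlers

    def create_crawler(self, **crawler_kwargs: Any) -> str:
        self.crawlers[crawler_kwargs["Name"]] = dict(crawler_kwargs)
        return crawler_kwargs["Name"]

    def update_crawler(self, **crawler_kwargs: Any) -> bool:
        current = self.crawlers[crawler_kwargs["Name"]]
        changed = current != crawler_kwargs
        current.update(crawler_kwargs)
        return changed
```

With the real hook, the same helper would presumably be called as ``ensure_crawler(AwsGlueCrawlerHook(aws_conn_id=...), config)``; the connection id and the configuration values are deployment-specific.
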
.. method:: start_crawler(self, crawler_name: str)
Triggers the AWS Glue crawler
:param crawler_name: unique crawler name per AWS account
:type crawler_name: str
:return: Empty dictionary
.. method:: wait_for_crawler_completion(self, crawler_name: str, poll_interval: int = 5)
Waits until the Glue crawler completes and returns the status of the latest crawl run.
Raises AirflowException if the crawler fails or is cancelled.
:param crawler_name: unique crawler name per AWS account
:type crawler_name: str
:param poll_interval: Time (in seconds) to wait between two consecutive calls to check crawler status
:type poll_interval: int
:return: Crawler's status
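
The polling behaviour described above can be sketched as a simple loop. This is an illustration under assumptions, not the hook's actual source: it assumes the crawler dictionary has the shape returned by the boto3 Glue API (a ``State`` of ``RUNNING``, ``STOPPING`` or ``READY``, and a ``LastCrawl`` entry with a ``Status``), it raises ``RuntimeError`` where the hook raises ``AirflowException``, and ``wait_for_crawler`` with its injectable ``sleep`` argument is a hypothetical name chosen to keep the example testable.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawler(
    get_crawler: Callable[[str], Dict[str, Any]],
    crawler_name: str,
    poll_interval: float = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the crawler is READY, then return the status of its last crawl.

    Raises RuntimeError (standing in for AirflowException) if the last crawl
    failed or was cancelled.
    """
    while True:
        crawler = get_crawler(crawler_name)
        if crawler["State"] == "READY":
            status = crawler["LastCrawl"]["Status"]
            if status in ("FAILED", "CANCELLED"):
                raise RuntimeError(f"Crawler {crawler_name} finished with status {status}")
            return status
        # Still RUNNING or STOPPING: wait before checking again.
        sleep(poll_interval)
```

With the real hook, ``get_crawler`` would be the hook's own ``get_crawler`` method. A larger ``poll_interval`` reduces the number of API calls at the cost of noticing completion later.
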