| :py:mod:`airflow.providers.amazon.aws.hooks.glue` |
| ================================================= |
| |
| .. py:module:: airflow.providers.amazon.aws.hooks.glue |
| |
| |
| Module Contents |
| --------------- |
| |
| Classes |
| ~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.amazon.aws.hooks.glue.GlueJobHook |
| |
| |
| |
| |
| Attributes |
| ~~~~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.amazon.aws.hooks.glue.DEFAULT_LOG_SUFFIX |
| airflow.providers.amazon.aws.hooks.glue.FAILURE_LOG_SUFFIX |
| airflow.providers.amazon.aws.hooks.glue.DEFAULT_LOG_FILTER |
| airflow.providers.amazon.aws.hooks.glue.FAILURE_LOG_FILTER |
| |
| |
| .. py:data:: DEFAULT_LOG_SUFFIX |
| :annotation: = output |
| |
| |
| |
| .. py:data:: FAILURE_LOG_SUFFIX |
| :annotation: = error |
| |
| |
| |
| .. py:data:: DEFAULT_LOG_FILTER |
| :annotation: = |
| |
| |
| |
| .. py:data:: FAILURE_LOG_FILTER |
| :annotation: = ?ERROR ?Exception |
| |
| |
| |
| .. py:class:: GlueJobHook(s3_bucket = None, job_name = None, desc = None, concurrent_run_limit = 1, script_location = None, retry_limit = 0, num_of_dpus = None, iam_role_name = None, create_job_kwargs = None, *args, **kwargs) |
| |
| Bases: :py:obj:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook` |
| |
| Interact with AWS Glue - create job, trigger, crawler |
| |
| :param s3_bucket: S3 bucket where logs and local etl script will be uploaded |
| :param job_name: unique job name per AWS account |
| :param desc: job description |
| :param concurrent_run_limit: The maximum number of concurrent runs allowed for a job |
| :param script_location: path to etl script on s3 |
| :param retry_limit: Maximum number of times to retry this job if it fails |
| :param num_of_dpus: Number of AWS Glue DPUs to allocate to this Job |
| :param region_name: aws region name (example: us-east-1) |
| :param iam_role_name: AWS IAM Role for Glue Job Execution |
| :param create_job_kwargs: Extra arguments for Glue Job Creation |
| |
| .. py:attribute:: JOB_POLL_INTERVAL |
| :annotation: = 6 |
| |
| |
| |
| .. py:method:: list_jobs() |
| |
| :return: Lists of Jobs |
| |
| |
| .. py:method:: get_iam_execution_role() |
| |
| :return: iam role for job execution |
| |
| |
| .. py:method:: initialize_job(script_arguments = None, run_kwargs = None) |
| |
| Initializes connection with AWS Glue |
| to run job |
| :return: |
| |
| |
| .. py:method:: get_job_state(job_name, run_id) |
| |
| Get state of the Glue job. The job state can be |
| running, finished, failed, stopped or timeout. |
| :param job_name: unique job name per AWS account |
| :param run_id: The job-run ID of the predecessor job run |
| :return: State of the Glue job |
| |
| |
| .. py:method:: print_job_logs(job_name, run_id, job_failed = False, next_token = None) |
| |
| Prints the batch of logs to the Airflow task log and returns nextToken. |
| |
| |
| .. py:method:: job_completion(job_name, run_id, verbose = False) |
| |
| Waits until Glue job with job_name completes or |
| fails and return final state if finished. |
| Raises AirflowException when the job failed |
| :param job_name: unique job name per AWS account |
| :param run_id: The job-run ID of the predecessor job run |
| :param verbose: If True, more Glue Job Run logs show in the Airflow Task Logs. (default: False) |
| :return: Dict of JobRunState and JobRunId |
| |
| |
| .. py:method:: get_or_create_glue_job() |
| |
| Creates(or just returns) and returns the Job name |
| :return:Name of the Job |
| |
| |
| |