| :py:mod:`airflow.providers.amazon.aws.hooks.glue` |
| ================================================= |
| |
| .. py:module:: airflow.providers.amazon.aws.hooks.glue |
| |
| |
| Module Contents |
| --------------- |
| |
| Classes |
| ~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.amazon.aws.hooks.glue.GlueJobHook |
| airflow.providers.amazon.aws.hooks.glue.AwsGlueJobHook |
| |
| |
| |
| |
| .. py:class:: GlueJobHook(s3_bucket = None, job_name = None, desc = None, concurrent_run_limit = 1, script_location = None, retry_limit = 0, num_of_dpus = None, iam_role_name = None, create_job_kwargs = None, *args, **kwargs) |
| |
| Bases: :py:obj:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook` |
| |
| Interact with AWS Glue - create job, trigger, crawler |
| |
| :param s3_bucket: S3 bucket where logs and local etl script will be uploaded |
| :param job_name: unique job name per AWS account |
| :param desc: job description |
| :param concurrent_run_limit: The maximum number of concurrent runs allowed for a job |
| :param script_location: path to etl script on s3 |
| :param retry_limit: Maximum number of times to retry this job if it fails |
| :param num_of_dpus: Number of AWS Glue DPUs to allocate to this Job |
| :param region_name: aws region name (example: us-east-1) |
| :param iam_role_name: AWS IAM Role for Glue Job Execution |
| :param create_job_kwargs: Extra arguments for Glue Job Creation |
| |
| .. py:attribute:: JOB_POLL_INTERVAL |
| :annotation: = 6 |
| |
| |
| |
| .. py:method:: list_jobs(self) |
| |
| :return: Lists of Jobs |
| |
| |
| .. py:method:: get_iam_execution_role(self) |
| |
| :return: iam role for job execution |
| |
| |
| .. py:method:: initialize_job(self, script_arguments = None, run_kwargs = None) |
| |
| Initializes connection with AWS Glue |
| to run job |
| :return: |
| |
| |
| .. py:method:: get_job_state(self, job_name, run_id) |
| |
| Get state of the Glue job. The job state can be |
| running, finished, failed, stopped or timeout. |
| :param job_name: unique job name per AWS account |
| :param run_id: The job-run ID of the predecessor job run |
| :return: State of the Glue job |
| |
| |
| .. py:method:: job_completion(self, job_name, run_id) |
| |
| Waits until Glue job with job_name completes or |
| fails and return final state if finished. |
| Raises AirflowException when the job failed |
| :param job_name: unique job name per AWS account |
| :param run_id: The job-run ID of the predecessor job run |
| :return: Dict of JobRunState and JobRunId |
| |
| |
| .. py:method:: get_or_create_glue_job(self) |
| |
| Creates(or just returns) and returns the Job name |
| :return:Name of the Job |
| |
| |
| |
| .. py:class:: AwsGlueJobHook(*args, **kwargs) |
| |
| Bases: :py:obj:`GlueJobHook` |
| |
| This hook is deprecated. |
| Please use :class:`airflow.providers.amazon.aws.hooks.glue.GlueJobHook`. |
| |
| |