| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| |
| |
| .. _howto/connection:gcp: |
| |
| Google Cloud Connection |
| ================================ |
| |
| The Google Cloud connection type enables the Google Cloud Integrations. |
| |
| Authenticating to Google Cloud |
| ------------------------------ |
| |
| There are two ways to connect to Google Cloud using Airflow. |
| |
| 1. Using a `Application Default Credentials |
| <https://google-auth.readthedocs.io/en/latest/reference/google.auth.html#google.auth.default>`_, |
| 2. Using a `service account |
| <https://cloud.google.com/docs/authentication/#service_accounts>`_ by specifying a key file in JSON format. |
| Key can be specified as a path to the key file (``Keyfile Path``), as a key payload (``Keyfile JSON``) |
| or as secret in Secret Manager (``Keyfile secret name``). Only one way of defining the key can be used at a time. |
| If you need to manage multiple keys then you should configure multiple connections. |
| |
| .. warning:: Additional permissions might be needed |
| |
| Connection which uses key from the Secret Manager requires that `Application Default Credentials |
| <https://google-auth.readthedocs.io/en/latest/reference/google.auth.html#google.auth.default>`_ (ADC) |
| have permission to access payloads of secrets. |
| |
| .. note:: Alternative way of storing connections |
| |
| Besides storing only key in Secret Manager there is an option for storing entire connection. |
| For more details take a look at :ref:`Google Secret Manager Backend <google_cloud_secret_manager_backend>`. |
| |
| Default Connection IDs |
| ---------------------- |
| |
| All hooks and operators related to Google Cloud use ``google_cloud_default`` by default. |
| |
| |
| Note On Application Default Credentials |
| --------------------------------------- |
| Application Default Credentials are inferred by the GCE metadata server when running |
| Airflow on Google Compute Engine or the GKE metadata server |
| when running on GKE which allows mapping Kubernetes Service Accounts to GCP service accounts |
| `Workload Identity |
| <https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity>`_. |
| This can be useful when managing minimum permissions for multiple Airflow instances on a single GKE cluster which |
| each have a different IAM footprint. Simply assign KSAs for your worker / webserver deployments and workload identity |
| will map them to separate GCP Service Accounts (rather than sharing a cluster-level GCE service account). |
| From a security perspective it has the benefit of not storing Google Service Account |
| keys on disk nor in the Airflow database, making it impossible |
| to leak the sensitive long lived credential key material. |
| |
| From an Airflow perspective Application Default Credentials can be used for |
| a connection by specifying an empty URI. |
| |
| For example: |
| |
| .. code-block:: bash |
| |
| export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://' |
| |
| Configuring the Connection |
| -------------------------- |
| |
| Project Id (optional) |
| The Google Cloud project ID to connect to. It is used as default project id by operators using it and |
| can usually be overridden at the operator level. |
| |
| Keyfile Path |
| Path to a `service account |
| <https://cloud.google.com/docs/authentication/#service_accounts>`_ key |
| file (JSON format) on disk. |
| |
| Not required if using application default credentials. |
| |
| Keyfile JSON |
| Contents of a `service account |
| <https://cloud.google.com/docs/authentication/#service_accounts>`_ key |
| file (JSON format) on disk. |
| |
| Not required if using application default credentials. |
| |
| Secret name which holds Keyfile JSON |
| Name of the secret in Secret Manager which contains a `service account |
| <https://cloud.google.com/docs/authentication/#service_accounts>`_ key. |
| |
| Not required if using application default credentials. |
| |
| Scopes (comma separated) |
| A list of comma-separated `Google Cloud scopes |
| <https://developers.google.com/identity/protocols/googlescopes>`_ to |
| authenticate with. |
| |
| Number of Retries |
| Integer, number of times to retry with randomized |
| exponential backoff. If all retries fail, the :class:`googleapiclient.errors.HttpError` |
| represents the last request. If zero (default), we attempt the |
| request only once. |
| |
| When specifying the connection in environment variable you should specify |
| it using URI syntax, with the following requirements: |
| |
| * scheme part should be equals ``google-cloud-platform`` (Note: look for a |
| hyphen character) |
| * authority (username, password, host, port), path is ignored |
| * query parameters contains information specific to this type of |
| connection. The following keys are accepted: |
| |
| * ``extra__google_cloud_platform__project`` - Project Id |
| * ``extra__google_cloud_platform__key_path`` - Keyfile Path |
| * ``extra__google_cloud_platform__keyfile_dict`` - Keyfile JSON |
| * ``extra__google_cloud_platform__key_secret_name`` - Secret name which holds Keyfile JSON |
| * ``extra__google_cloud_platform__key_secret_project_id`` - Project Id which holds Keyfile JSON |
| * ``extra__google_cloud_platform__scope`` - Scopes |
| * ``extra__google_cloud_platform__num_retries`` - Number of Retries |
| |
| Note that all components of the URI should be URL-encoded. |
| |
| For example: |
| |
| .. code-block:: bash |
| |
| export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://?extra__google_cloud_platform__key_path=%2Fkeys%2Fkey.json&extra__google_cloud_platform__scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform&extra__google_cloud_platform__project=airflow&extra__google_cloud_platform__num_retries=5' |
| |
| .. _howto/connection:gcp:impersonation: |
| |
| Direct impersonation of a service account |
| ----------------------------------------- |
| |
| Google operators support `direct impersonation of a service account |
| <https://cloud.google.com/iam/docs/understanding-service-accounts#directly_impersonating_a_service_account>`_ |
| via ``impersonation_chain`` argument (``google_impersonation_chain`` in case of operators |
| that also communicate with services of other cloud providers). |
| |
| For example: |
| |
| .. code-block:: python |
| |
| import os |
| |
| from airflow.providers.google.cloud.operators.bigquery import ( |
| BigQueryCreateEmptyDatasetOperator, |
| ) |
| |
| IMPERSONATION_CHAIN = "impersonated_account@your_project_id.iam.gserviceaccount.com" |
| |
| create_dataset = BigQueryCreateEmptyDatasetOperator( |
| task_id="create-dataset", |
| gcp_conn_id="google_cloud_default", |
| dataset_id="test_dataset", |
| location="southamerica-east1", |
| impersonation_chain=IMPERSONATION_CHAIN, |
| ) |
| |
| In order for this example to work, the account ``impersonated_account`` must grant the |
| ``Service Account Token Creator`` IAM role to the service account specified in the |
| ``google_cloud_default`` Connection. This will allow to generate ``impersonated_account``'s |
| access token, which will allow to act on its behalf using its permissions. ``impersonated_account`` |
| does not even need to have a generated key. |
| |
| .. warning:: |
| :class:`~airflow.providers.google.cloud.operators.dataflow.DataflowCreateJavaJobOperator` and |
| :class:`~airflow.providers.google.cloud.operators.dataflow.DataflowCreatePythonJobOperator` |
| do not support direct impersonation as of now. |
| |
| In case of operators that connect to multiple Google services, all hooks use the same value of |
| ``impersonation_chain`` (if applicable). You can also impersonate accounts from projects |
| other than the project of the originating account. In that case, the project id of the impersonated |
| account will be used as the default project id in operator's logic, unless you have explicitly |
| specified the Project Id in Connection's configuration or in operator's arguments. |
| |
| Impersonation can also be used in chain: if the service account specified in Connection has |
| ``Service Account Token Creator`` role granted on account A, and account A has this role on account |
| B, then we are able to impersonate account B. |
| |
| For example, with the following ``terraform`` setup... |
| |
| .. code-block:: terraform |
| |
| terraform { |
| required_version = "> 0.11.14" |
| } |
| provider "google" { |
| } |
| variable "project_id" { |
| type = "string" |
| } |
| resource "google_service_account" "sa_1" { |
| account_id = "impersonation-chain-1" |
| project = "${var.project_id}" |
| } |
| resource "google_service_account" "sa_2" { |
| account_id = "impersonation-chain-2" |
| project = "${var.project_id}" |
| } |
| resource "google_service_account" "sa_3" { |
| account_id = "impersonation-chain-3" |
| project = "${var.project_id}" |
| } |
| resource "google_service_account" "sa_4" { |
| account_id = "impersonation-chain-4" |
| project = "${var.project_id}" |
| } |
| resource "google_service_account_iam_member" "sa_4_member" { |
| service_account_id = "${google_service_account.sa_4.name}" |
| role = "roles/iam.serviceAccountTokenCreator" |
| member = "serviceAccount:${google_service_account.sa_3.email}" |
| } |
| resource "google_service_account_iam_member" "sa_3_member" { |
| service_account_id = "${google_service_account.sa_3.name}" |
| role = "roles/iam.serviceAccountTokenCreator" |
| member = "serviceAccount:${google_service_account.sa_2.email}" |
| } |
| resource "google_service_account_iam_member" "sa_2_member" { |
| service_account_id = "${google_service_account.sa_2.name}" |
| role = "roles/iam.serviceAccountTokenCreator" |
| member = "serviceAccount:${google_service_account.sa_1.email}" |
| } |
| |
| ...we should configure Airflow Connection to use ``impersonation-chain-1`` account's key and provide |
| following value for ``impersonation_chain`` argument... |
| |
| .. code-block:: python |
| |
| PROJECT_ID = os.environ.get("TF_VAR_project_id", "your_project_id") |
| IMPERSONATION_CHAIN = [ |
| f"impersonation-chain-2@{PROJECT_ID}.iam.gserviceaccount.com", |
| f"impersonation-chain-3@{PROJECT_ID}.iam.gserviceaccount.com", |
| f"impersonation-chain-4@{PROJECT_ID}.iam.gserviceaccount.com", |
| ] |
| |
| ...then requests will be executed using ``impersonation-chain-4`` account's privileges. |