:mod:`airflow.contrib.hooks.databricks_hook`
============================================
.. py:module:: airflow.contrib.hooks.databricks_hook
Module Contents
---------------
.. data:: RESTART_CLUSTER_ENDPOINT
:annotation: = ['POST', 'api/2.0/clusters/restart']
.. data:: START_CLUSTER_ENDPOINT
:annotation: = ['POST', 'api/2.0/clusters/start']
.. data:: TERMINATE_CLUSTER_ENDPOINT
:annotation: = ['POST', 'api/2.0/clusters/delete']
.. data:: RUN_NOW_ENDPOINT
:annotation: = ['POST', 'api/2.0/jobs/run-now']
.. data:: SUBMIT_RUN_ENDPOINT
:annotation: = ['POST', 'api/2.0/jobs/runs/submit']
.. data:: GET_RUN_ENDPOINT
:annotation: = ['GET', 'api/2.0/jobs/runs/get']
.. data:: CANCEL_RUN_ENDPOINT
:annotation: = ['POST', 'api/2.0/jobs/runs/cancel']
.. data:: USER_AGENT_HEADER
.. py:class:: DatabricksHook(databricks_conn_id='databricks_default', timeout_seconds=180, retry_limit=3, retry_delay=1.0)
Bases: :class:`airflow.hooks.base_hook.BaseHook`
Interact with Databricks.
.. staticmethod:: _parse_host(host)
This function makes the hook robust to improper connection settings
supplied by users, specifically in the host field.
For example, when users supply ``https://xx.cloud.databricks.com`` as the
host, the protocol must be stripped to obtain the hostname::
h = DatabricksHook()
assert h._parse_host('https://xx.cloud.databricks.com') == 'xx.cloud.databricks.com'
When users supply the correct ``xx.cloud.databricks.com`` as the
host, this function is a no-op::
assert h._parse_host('xx.cloud.databricks.com') == 'xx.cloud.databricks.com'
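The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not necessarily how the hook implements it; the function name ``parse_host`` is hypothetical and stands in for the static method.

```python
from urllib.parse import urlparse


def parse_host(host):
    """Return the bare hostname, stripping a protocol prefix if one is present."""
    # urlparse only populates .hostname when the string carries a scheme
    # such as "https://"; for a bare host name it returns None.
    hostname = urlparse(host).hostname
    # Fall back to the input unchanged, which makes the bare-host case a no-op.
    return hostname if hostname else host
```

Both inputs from the docstring, with and without the ``https://`` prefix, yield ``xx.cloud.databricks.com`` under this sketch.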
.. method:: _do_api_call(self, endpoint_info, json)
Utility function to perform an API call with retries.
:param endpoint_info: Tuple of method and endpoint
:type endpoint_info: tuple[string, string]
:param json: Parameters for this API call.
:type json: dict
:return: If the API call returns an OK status code,
this function returns the response in JSON. Otherwise,
an AirflowException is raised.
:rtype: dict
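The retry behaviour implied by the ``retry_limit`` and ``retry_delay`` constructor arguments can be sketched as a plain loop. Everything named here is an assumption made for illustration: ``call_with_retries``, the ``send`` callable, the ``is_retryable`` predicate (standing in for ``_retryable_error``), and ``ApiCallError`` (standing in for ``AirflowException``). The real method also builds the URL and authentication from the connection, which this sketch leaves out.

```python
import time


class ApiCallError(Exception):
    """Hypothetical stand-in for AirflowException."""


def call_with_retries(send, retry_limit=3, retry_delay=1.0,
                      is_retryable=lambda exc: True):
    """Call send() until it succeeds, hits a non-retryable error, or runs out of attempts."""
    if retry_limit < 1:
        raise ValueError("retry_limit must be at least 1")
    for attempt_num in range(1, retry_limit + 1):
        try:
            # On success the parsed response is returned immediately.
            return send()
        except Exception as exc:
            if not is_retryable(exc):
                raise ApiCallError(
                    "Request failed and is not retryable: {}".format(exc))
            if attempt_num == retry_limit:
                raise ApiCallError(
                    "Request failed {} times. Giving up.".format(retry_limit))
            # Wait before the next attempt.
            time.sleep(retry_delay)
```

Passing the request as a callable keeps the retry policy separate from the HTTP details, so the loop can be exercised without a network connection.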
.. method:: _log_request_error(self, attempt_num, error)
.. method:: run_now(self, json)
Utility function to call the ``api/2.0/jobs/run-now`` endpoint.
:param json: The data used in the body of the request to the ``run-now`` endpoint.
:type json: dict
:return: the run_id as a string
:rtype: str
.. method:: submit_run(self, json)
Utility function to call the ``api/2.0/jobs/runs/submit`` endpoint.
:param json: The data used in the body of the request to the ``submit`` endpoint.
:type json: dict
:return: the run_id as a string
:rtype: str
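As an illustration of the ``json`` argument, the request bodies below follow the field names of the Databricks Jobs API 2.0 for ``run-now`` and ``runs/submit``. All values (job ID, cluster settings, notebook path) are placeholders, and the hook calls are shown only as comments because they require a configured ``databricks_default`` connection.

```python
# Body for run_now: trigger an existing job by its ID.
# The job ID and parameter values are placeholders.
run_now_json = {
    "job_id": 42,
    "notebook_params": {"run_date": "2019-01-01"},
}

# Body for submit_run: a one-off run on a newly created cluster.
# Spark version, node type, and notebook path are placeholders.
submit_run_json = {
    "run_name": "example-run",
    "new_cluster": {
        "spark_version": "5.3.x-scala2.11",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Users/someone@example.com/example"},
}

# With a configured connection, the calls would look like:
#   hook = DatabricksHook()
#   run_id = hook.run_now(run_now_json)
#   run_id = hook.submit_run(submit_run_json)
```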
.. method:: get_run_page_url(self, run_id)
.. method:: get_run_state(self, run_id)
.. method:: cancel_run(self, run_id)
.. method:: restart_cluster(self, json)
.. method:: start_cluster(self, json)
.. method:: terminate_cluster(self, json)
.. function:: _retryable_error(exception)
.. data:: RUN_LIFE_CYCLE_STATES
:annotation: = ['PENDING', 'RUNNING', 'TERMINATING', 'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR']
.. py:class:: RunState(life_cycle_state, result_state, state_message)
Utility class for the run state concept of Databricks runs.
.. attribute:: is_terminal
.. attribute:: is_successful
.. method:: __eq__(self, other)
.. method:: __repr__(self)
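The relationship between the three constructor arguments and the two attributes can be sketched as follows. This is a simplified stand-in named ``RunStateSketch``, not the class itself. It assumes that ``TERMINATED``, ``SKIPPED``, and ``INTERNAL_ERROR`` are the terminal life cycle states and that a successful run carries the result state ``SUCCESS``. It raises ``ValueError`` on unknown states, which is likewise an assumption.

```python
RUN_LIFE_CYCLE_STATES = [
    "PENDING", "RUNNING", "TERMINATING",
    "TERMINATED", "SKIPPED", "INTERNAL_ERROR",
]

# Assumption: life cycle states after which a run no longer changes.
TERMINAL_STATES = ("TERMINATED", "SKIPPED", "INTERNAL_ERROR")


class RunStateSketch:
    """Simplified illustration of a run state value object."""

    def __init__(self, life_cycle_state, result_state, state_message):
        self.life_cycle_state = life_cycle_state
        self.result_state = result_state
        self.state_message = state_message

    @property
    def is_terminal(self):
        # Reject states that are not in the documented list.
        if self.life_cycle_state not in RUN_LIFE_CYCLE_STATES:
            raise ValueError(
                "Unexpected life cycle state: {}".format(self.life_cycle_state))
        return self.life_cycle_state in TERMINAL_STATES

    @property
    def is_successful(self):
        # Assumption: a successful run is reported as result_state "SUCCESS".
        return self.result_state == "SUCCESS"

    def __eq__(self, other):
        # Two states are equal when all three fields match.
        return (
            isinstance(other, RunStateSketch)
            and self.life_cycle_state == other.life_cycle_state
            and self.result_state == other.result_state
            and self.state_message == other.state_message
        )

    def __repr__(self):
        return "RunStateSketch({!r}, {!r}, {!r})".format(
            self.life_cycle_state, self.result_state, self.state_message)
```

A polling loop would typically call ``get_run_state`` repeatedly until ``is_terminal`` is true and then branch on ``is_successful``.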
.. py:class:: _TokenAuth(token)
Bases: :class:`requests.auth.AuthBase`
Helper class for the ``auth`` field of ``requests``. ``AuthBase`` requires
subclasses to implement the ``__call__`` magic method.
.. method:: __call__(self, r)
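The ``__call__`` contract can be illustrated without installing ``requests``. The sketch below assumes the token is sent as a bearer token in the ``Authorization`` header. ``TokenAuthSketch`` and ``FakeRequest`` are hypothetical names, with ``FakeRequest`` standing in for the prepared request object that ``requests`` would pass in. A real implementation would subclass ``requests.auth.AuthBase``.

```python
class TokenAuthSketch:
    """Illustrative auth callable: attaches a bearer token to a request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # Assumption: the token travels as a bearer token in the Authorization header.
        r.headers["Authorization"] = "Bearer " + self.token
        # The request must be returned so the caller can send it.
        return r


class FakeRequest:
    """Hypothetical stand-in for a prepared request; only carries headers."""

    def __init__(self):
        self.headers = {}


# The token value is a placeholder.
prepared = TokenAuthSketch("dapi-example-token")(FakeRequest())
```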