.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Kerberos
--------
Airflow has initial support for Kerberos. This means that Airflow can renew Kerberos
tickets for itself and store them in the ticket cache. Hooks and DAGs can use these tickets
to authenticate against kerberized services.
.. contents::
:depth: 1
:local:
Limitations
'''''''''''
Please note that, at this time, not all hooks have been adjusted to make use of this functionality.
Kerberos is also not integrated into the web interface, so for now you will have to rely on
network-level security to keep your service secure.
Celery integration has not been tested yet. However, if you generate a keytab for every
host and launch a ticket renewer next to every worker, it will most likely work.
Enabling kerberos
'''''''''''''''''
Airflow
^^^^^^^
To enable Kerberos, you will need to generate a (service) keytab.
.. code-block:: bash
# in the kadmin.local or kadmin shell, create the airflow principal
kadmin: addprinc -randkey airflow/fully.qualified.domain.name@YOUR-REALM.COM
# Create the airflow keytab file that will contain the airflow principal
# (note: the -norandkey option is only available in kadmin.local)
kadmin: xst -norandkey -k airflow.keytab airflow/fully.qualified.domain.name
Store this file in a location where the airflow user can read it, and restrict access to that user
(``chmod 600``). Then add the following to your ``airflow.cfg``:
.. code-block:: ini
[core]
security = kerberos
[kerberos]
keytab = /etc/airflow/airflow.keytab
reinit_frequency = 3600
principal = airflow
Launch the ticket renewer with:
.. code-block:: bash
# run ticket renewer
airflow kerberos
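For orientation, the renewer's job can be pictured as periodically obtaining a fresh ticket from the keytab with ``kinit``. The sketch below only builds such a command from the ``[kerberos]`` values shown above. The helper name ``build_kinit_command`` and the ticket cache path are hypothetical, and the exact flags Airflow passes may differ.

```python
from typing import List


def build_kinit_command(keytab: str, principal: str, ccache: str) -> List[str]:
    """Build a kinit invocation that obtains a ticket from a keytab.

    Hypothetical helper for illustration; not Airflow's actual implementation.
    """
    return [
        "kinit",
        "-k",          # authenticate with a keytab instead of a password
        "-t", keytab,  # path to the keytab file
        "-c", ccache,  # credential (ticket) cache to write the ticket to
        principal,
    ]


cmd = build_kinit_command(
    keytab="/etc/airflow/airflow.keytab",  # value of "keytab" in airflow.cfg
    principal="airflow",                   # value of "principal" in airflow.cfg
    ccache="/tmp/airflow_krb5_ccache",     # assumed cache location
)
print(" ".join(cmd))
```

A renewer along these lines would run the command roughly every ``reinit_frequency`` seconds, for example through ``subprocess.run``.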
Hadoop
^^^^^^
If you want to use impersonation, it needs to be enabled in the ``core-site.xml`` of your Hadoop configuration.
.. code-block:: xml
<property>
<name>hadoop.proxyuser.airflow.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.airflow.users</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.airflow.hosts</name>
<value>*</value>
</property>
Of course, if you need to tighten your security, replace the asterisk with something more appropriate.
Using kerberos authentication
'''''''''''''''''''''''''''''
The Hive hook has been updated to take advantage of Kerberos authentication. To allow your DAGs to
use it, update the connection details with, for example:
.. code-block:: json
{ "use_beeline": true, "principal": "hive/_HOST@EXAMPLE.COM"}
Adjust the principal to your settings. The ``_HOST`` part will be replaced by the fully qualified domain name of
the server.
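The ``_HOST`` substitution can be pictured as a plain string replacement. The following is a minimal sketch, assuming the host name is already known. The function ``expand_principal`` and the host ``hive01.example.com`` are hypothetical and are not part of Airflow's API.

```python
def expand_principal(principal: str, host_fqdn: str) -> str:
    """Replace the _HOST placeholder in a Kerberos principal with a host name.

    The host name is lowercased, following the usual convention for
    host-based service principals.
    """
    return principal.replace("_HOST", host_fqdn.lower())


expanded = expand_principal("hive/_HOST@EXAMPLE.COM", "hive01.example.com")
print(expanded)  # hive/hive01.example.com@EXAMPLE.COM
```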
You can choose whether the connection uses the DAG owner or the user specified in the login
section of the connection. For the login user, specify the following as extra:
.. code-block:: json
{ "use_beeline": true, "principal": "hive/_HOST@EXAMPLE.COM", "proxy_user": "login"}
For the DAG owner use:
.. code-block:: json
{ "use_beeline": true, "principal": "hive/_HOST@EXAMPLE.COM", "proxy_user": "owner"}
and in your DAG, when initializing the HiveOperator, specify:
.. code-block:: python
run_as_owner=True
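To make the two ``proxy_user`` modes concrete, here is a small sketch of how the setting could be resolved to an effective user. The function ``resolve_proxy_user`` and its arguments are hypothetical and only mirror the behaviour described above; this is not Airflow's actual code.

```python
from typing import Optional


def resolve_proxy_user(
    extra: dict, login: Optional[str], dag_owner: Optional[str]
) -> Optional[str]:
    """Pick the user to impersonate based on the connection's extra field.

    extra     -- parsed JSON from the connection's "extra" field
    login     -- user from the login section of the connection
    dag_owner -- owner of the DAG (assumed available when run_as_owner=True)
    """
    mode = extra.get("proxy_user")
    if mode == "login" and login:
        return login
    if mode == "owner" and dag_owner:
        return dag_owner
    return None  # no impersonation requested or no user available


extra = {"use_beeline": True, "principal": "hive/_HOST@EXAMPLE.COM", "proxy_user": "owner"}
print(resolve_proxy_user(extra, login="etl_user", dag_owner="alice"))  # alice
```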
To use Kerberos authentication, you must install Airflow with the ``kerberos`` extras group:
.. code-block:: bash
pip install 'apache-airflow[kerberos]'
You can read about some production aspects of Kerberos deployment at :ref:`production-deployment:kerberos`.