blob: 7937797e366b4bb52d08cb1c9123ce7efcb1a116 [file] [log] [blame]
:py:mod:`airflow.providers.google.cloud.utils.field_validator`
==============================================================
.. py:module:: airflow.providers.google.cloud.utils.field_validator
.. autoapi-nested-parse::
Validator for body fields sent via Google Cloud API.
The validator performs validation of the body (being dictionary of fields) that
is sent in the API request to Google Cloud (via ``googleclient`` API usually).
Context
-------
The specification mostly focuses on helping Airflow DAG developers in the development
phase. You can build your own Google Cloud operator (such as GcfDeployOperator for example) which
can have built-in validation specification for the particular API. It's super helpful
when developer plays with different fields and their values at the initial phase of
DAG development. Most of the Google Cloud APIs perform their own validation on the
server side, but most of the requests are asynchronous and you need to wait for result
of the operation. This takes precious times and slows
down iteration over the API. BodyFieldValidator is meant to be used on the client side
and it should therefore provide an instant feedback to the developer on misspelled or
wrong type of parameters.
The validation should be performed in "execute()" method call in order to allow
template parameters to be expanded before validation is performed.
Types of fields
---------------
Specification is an array of dictionaries - each dictionary describes field, its type,
validation, optionality, api_version supported and nested fields (for unions and dicts).
Typically (for clarity and in order to aid syntax highlighting) the array of
dicts should be defined as series of dict() executions. Fragment of example
specification might look as follows::
SPECIFICATION =[
dict(name="an_union", type="union", optional=True, fields=[
dict(name="variant_1", type="dict"),
dict(name="variant_2", regexp=r'^.+$', api_version='v1beta2'),
),
dict(name="an_union", type="dict", fields=[
dict(name="field_1", type="dict"),
dict(name="field_2", regexp=r'^.+$'),
),
...
]
Each field should have key = "name" indicating field name. The field can be of one of the
following types:
* Dict fields: (key = "type", value="dict"):
Field of this type should contain nested fields in form of an array of dicts.
Each of the fields in the array is then expected (unless marked as optional)
and validated recursively. If an extra field is present in the dictionary, warning is
printed in log file (but the validation succeeds - see the Forward-compatibility notes)
* List fields: (key = "type", value="list"):
Field of this type should be a list. Only the type correctness is validated.
The contents of a list are not subject to validation.
* Union fields (key = "type", value="union"): field of this type should contain nested
fields in form of an array of dicts. One of the fields (and only one) should be
present (unless the union is marked as optional). If more than one union field is
present, FieldValidationException is raised. If none of the union fields is
present - warning is printed in the log (see below Forward-compatibility notes).
* Fields validated for non-emptiness: (key = "allow_empty") - this applies only to
fields the value of which is a string, and it allows to check for non-emptiness of
the field (allow_empty=False).
* Regexp-validated fields: (key = "regexp") - fields of this type are assumed to be
strings and they are validated with the regexp specified. Remember that the regexps
should ideally contain ^ at the beginning and $ at the end to make sure that
the whole field content is validated. Typically such regexp
validations should be used carefully and sparingly (see Forward-compatibility
notes below).
* Custom-validated fields: (key = "custom_validation") - fields of this type are validated
using method specified via custom_validation field. Any exception thrown in the custom
validation will be turned into FieldValidationException and will cause validation to
fail. Such custom validations might be used to check numeric fields (including
ranges of values), booleans or any other types of fields.
* API version: (key="api_version") if API version is specified, then the field will only
be validated when api_version used at field validator initialization matches exactly the
version specified. If you want to declare fields that are available in several
versions of the APIs, you should specify the field as many times as many API versions
should be supported (each time with different API version).
* if none of the keys ("type", "regexp", "custom_validation" - the field is not validated
You can see some of the field examples in EXAMPLE_VALIDATION_SPECIFICATION.
Forward-compatibility notes
---------------------------
Certain decisions are crucial to allow the client APIs to work also with future API
versions. Since body attached is passed to the API's call, this is entirely
possible to pass-through any new fields in the body (for future API versions) -
albeit without validation on the client side - they can and will still be validated
on the server side usually.
Here are the guidelines that you should follow to make validation forward-compatible:
* most of the fields are not validated for their content. It's possible to use regexp
in some specific cases that are guaranteed not to change in the future, but for most
fields regexp validation should be r'^.+$' indicating check for non-emptiness
* api_version is not validated - user can pass any future version of the api here. The API
version is only used to filter parameters that are marked as present in this api version
any new (not present in the specification) fields in the body are allowed (not verified)
For dictionaries, new fields can be added to dictionaries by future calls. However if an
unknown field in dictionary is added, a warning is logged by the client (but validation
remains successful). This is very nice feature to protect against typos in names.
* For unions, newly added union variants can be added by future calls and they will
pass validation, however the content or presence of those fields will not be validated.
This means that it's possible to send a new non-validated union field together with an
old validated field and this problem will not be detected by the client. In such case
warning will be printed.
* When you add validator to an operator, you should also add ``validate_body`` parameter
(default = True) to __init__ of such operators - when it is set to False,
no validation should be performed. This is a safeguard for totally unpredicted and
backwards-incompatible changes that might sometimes occur in the APIs.
Module Contents
---------------
Classes
~~~~~~~
.. autoapisummary::
airflow.providers.google.cloud.utils.field_validator.GcpBodyFieldValidator
Attributes
~~~~~~~~~~
.. autoapisummary::
airflow.providers.google.cloud.utils.field_validator.COMPOSITE_FIELD_TYPES
airflow.providers.google.cloud.utils.field_validator.EXAMPLE_VALIDATION_SPECIFICATION
.. py:data:: COMPOSITE_FIELD_TYPES
:annotation: = ['union', 'dict', 'list']
.. py:exception:: GcpFieldValidationException
Bases: :py:obj:`airflow.exceptions.AirflowException`
Thrown when validation finds dictionary field not valid according to specification.
.. py:exception:: GcpValidationSpecificationException
Bases: :py:obj:`airflow.exceptions.AirflowException`
Thrown when validation specification is wrong.
This should only happen during development as ideally
specification itself should not be invalid ;) .
.. py:data:: EXAMPLE_VALIDATION_SPECIFICATION
.. py:class:: GcpBodyFieldValidator(validation_specs, api_version)
Bases: :py:obj:`airflow.utils.log.logging_mixin.LoggingMixin`
Validates correctness of request body according to specification.
The specification can describe various type of
fields including custom validation, and union of fields. This validator is
to be reusable by various operators. See the EXAMPLE_VALIDATION_SPECIFICATION
for some examples and explanations of how to create specification.
:param validation_specs: dictionary describing validation specification
:param api_version: Version of the api used (for example v1)
.. py:method:: validate(body_to_validate)
Validates if the body (dictionary) follows specification that the validator was
instantiated with. Raises ValidationSpecificationException or
ValidationFieldException in case of problems with specification or the
body not conforming to the specification respectively.
:param body_to_validate: body that must follow the specification
:return: None