| :mod:`airflow.contrib.hooks.bigquery_hook` |
| ========================================== |
| |
| .. py:module:: airflow.contrib.hooks.bigquery_hook |
| |
| .. autoapi-nested-parse:: |
| |
| This module contains a BigQuery Hook, as well as a very basic PEP 249 |
| implementation for BigQuery. |
| |
| |
| |
| Module Contents |
| --------------- |
| |
| .. py:class:: BigQueryHook(bigquery_conn_id='bigquery_default', delegate_to=None, use_legacy_sql=True, location=None) |
| |
| Bases: :class:`airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook`, :class:`airflow.hooks.dbapi_hook.DbApiHook` |
| |
| Interact with BigQuery. This hook uses the Google Cloud Platform |
| connection. |
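| |
| **Example** (a minimal usage sketch; ``bigquery_default`` is the hook's default connection |
| ID, and all other names in the examples below are placeholders): :: |
| |
|     from airflow.contrib.hooks.bigquery_hook import BigQueryHook |
| |
|     hook = BigQueryHook(bigquery_conn_id='bigquery_default', use_legacy_sql=False) |
|     conn = hook.get_conn()    # PEP 249 style connection |
|     cursor = conn.cursor()    # BigQueryCursor exposing the helper methods below |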
| |
| .. attribute:: conn_name_attr |
| :annotation: = bigquery_conn_id |
| |
| |
| |
| |
| .. method:: get_conn(self) |
| |
| Returns a BigQuery PEP 249 connection object. |
| |
| |
| |
| |
| .. method:: get_service(self) |
| |
| Returns a BigQuery service object. |
| |
| |
| |
| |
| .. method:: insert_rows(self, table, rows, target_fields=None, commit_every=1000) |
| |
| Insertion is currently unsupported. Theoretically, you could use |
| BigQuery's streaming API to insert rows into a table, but this hasn't |
| been implemented. |
| |
| |
| |
| |
| .. method:: get_pandas_df(self, sql, parameters=None, dialect=None) |
| |
| Returns a Pandas DataFrame for the results produced by a BigQuery |
| query. The DbApiHook method must be overridden because Pandas |
| doesn't support PEP 249 connections, except for SQLite. See: |
| |
| https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L447 |
| https://github.com/pydata/pandas/issues/6900 |
| |
| :param sql: The BigQuery SQL to execute. |
| :type sql: str |
| :param parameters: The parameters to render the SQL query with (not |
| used; present only to match the superclass method signature) |
| :type parameters: mapping or iterable |
| :param dialect: Dialect of BigQuery SQL – legacy SQL or standard SQL; |
| defaults to `self.use_legacy_sql` if not specified |
| :type dialect: str in {'legacy', 'standard'} |
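| |
| **Example** (an illustrative sketch; it assumes the ``hook`` instance created in the |
| class-level example above, and the query is a placeholder): :: |
| |
|     df = hook.get_pandas_df( |
|         sql='SELECT 1 AS x, 2 AS y', |
|         dialect='standard', |
|     ) |
|     print(df.head()) |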
| |
| |
| |
| |
| .. method:: table_exists(self, project_id, dataset_id, table_id) |
| |
| Checks for the existence of a table in Google BigQuery. |
| |
| :param project_id: The Google cloud project in which to look for the |
| table. The connection supplied to the hook must provide access to |
| the specified project. |
| :type project_id: str |
| :param dataset_id: The name of the dataset in which to look for the |
| table. |
| :type dataset_id: str |
| :param table_id: The name of the table to check the existence of. |
| :type table_id: str |
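| |
| **Example** (a sketch; project, dataset and table names are placeholders, and ``hook`` is |
| assumed to be the instance from the class-level example above): :: |
| |
|     if hook.table_exists(project_id='my-project', |
|                          dataset_id='my_dataset', |
|                          table_id='my_table'): |
|         print('table is present') |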
| |
| |
| |
| |
| .. py:class:: BigQueryPandasConnector(project_id, service, reauth=False, verbose=False, dialect='legacy') |
| |
| Bases: :class:`pandas_gbq.gbq.GbqConnector` |
| |
| This connector behaves identically to GbqConnector (from Pandas), except |
| that it allows the service to be injected, and disables a call to |
| self.get_credentials(). This allows Airflow to use BigQuery with Pandas |
| without forcing a three legged OAuth connection. Instead, we can inject |
| service account credentials into the binding. |
| |
| |
| .. py:class:: BigQueryConnection(*args, **kwargs) |
| |
| Bases: :class:`object` |
| |
| BigQuery does not have a notion of a persistent connection. Thus, these |
| objects are small stateless factories for cursors, which do all the real |
| work. |
| |
| |
| .. method:: close(self) |
| |
| BigQueryConnection does not have anything to close. |
| |
| |
| |
| |
| .. method:: commit(self) |
| |
| BigQueryConnection does not support transactions. |
| |
| |
| |
| |
| .. method:: cursor(self) |
| |
| Return a new :py:class:`Cursor` object using the connection. |
| |
| |
| |
| |
| .. method:: rollback(self) |
| |
| |
| |
| |
| .. py:class:: BigQueryBaseCursor(service, project_id, use_legacy_sql=True, api_resource_configs=None, location=None, num_retries=5) |
| |
| Bases: :class:`airflow.utils.log.logging_mixin.LoggingMixin` |
| |
| The BigQuery base cursor contains helper methods to execute queries against |
| BigQuery. The methods can be used directly by operators, in cases where a |
| PEP 249 cursor isn't needed. |
| |
| |
| .. method:: create_empty_table(self, project_id, dataset_id, table_id, schema_fields=None, time_partitioning=None, cluster_fields=None, labels=None, view=None, encryption_configuration=None, num_retries=5) |
| |
| Creates a new, empty table in the dataset. |
| To create a view, which is defined by a SQL query, pass a dictionary to the 'view' kwarg. |
| |
| :param project_id: The project in which to create the table. |
| :type project_id: str |
| :param dataset_id: The dataset in which to create the table. |
| :type dataset_id: str |
| :param table_id: The name of the table to be created. |
| :type table_id: str |
| :param schema_fields: If set, the schema field list as defined here: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema |
| :type schema_fields: list |
| :param labels: a dictionary containing labels for the table, passed to BigQuery |
| :type labels: dict |
| |
| **Example**: :: |
| |
| schema_fields=[{"name": "emp_name", "type": "STRING", "mode": "REQUIRED"}, |
| {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"}] |
| |
| :param time_partitioning: configure optional time partitioning fields i.e. |
| partition by field, type and expiration as per API specifications. |
| |
| .. seealso:: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#timePartitioning |
| :type time_partitioning: dict |
| :param cluster_fields: [Optional] The fields used for clustering. |
| Must be specified together with time_partitioning; data in the table will be first |
| partitioned and subsequently clustered. |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#clustering.fields |
| :type cluster_fields: list |
| :param view: [Optional] A dictionary containing definition for the view. |
| If set, it will create a view instead of a table: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#view |
| :type view: dict |
| |
| **Example**: :: |
| |
| view = { |
| "query": "SELECT * FROM `test-project-id.test_dataset_id.test_table_prefix*` LIMIT 1000", |
| "useLegacySql": False |
| } |
| |
| :param encryption_configuration: [Optional] Custom encryption configuration (e.g., Cloud KMS keys). |
| **Example**: :: |
| |
| encryption_configuration = { |
| "kmsKeyName": "projects/testp/locations/us/keyRings/test-kr/cryptoKeys/test-key" |
| } |
| :type encryption_configuration: dict |
| :return: None |
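| |
| **Example** (a sketch, assuming a ``cursor`` obtained via |
| ``BigQueryHook().get_conn().cursor()``; project, dataset and table names are placeholders): :: |
| |
|     cursor.create_empty_table( |
|         project_id='my-project', |
|         dataset_id='my_dataset', |
|         table_id='employees', |
|         schema_fields=[ |
|             {"name": "emp_name", "type": "STRING", "mode": "REQUIRED"}, |
|             {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"}, |
|         ], |
|         time_partitioning={"type": "DAY"},  # optional daily partitioning |
|     ) |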
| |
| |
| |
| |
| .. method:: create_external_table(self, external_project_dataset_table, schema_fields, source_uris, source_format='CSV', autodetect=False, compression='NONE', ignore_unknown_values=False, max_bad_records=0, skip_leading_rows=0, field_delimiter=',', quote_character=None, allow_quoted_newlines=False, allow_jagged_rows=False, encoding='UTF-8', src_fmt_configs=None, labels=None, encryption_configuration=None) |
| |
| Creates a new external table in the dataset with the data in Google |
| Cloud Storage. See here: |
| |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource |
| |
| for more details about these parameters. |
| |
| :param external_project_dataset_table: |
| The dotted ``(<project>.|<project>:)<dataset>.<table>($<partition>)`` BigQuery |
| table name to create external table. |
| If ``<project>`` is not included, project will be the |
| project defined in the connection json. |
| :type external_project_dataset_table: str |
| :param schema_fields: The schema field list as defined here: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource |
| :type schema_fields: list |
| :param source_uris: The source Google Cloud |
| Storage URI (e.g. gs://some-bucket/some-file.txt). A single wildcard |
| per-object name can be used. |
| :type source_uris: list |
| :param source_format: File format of the source data. |
| :type source_format: str |
| :param autodetect: Try to detect schema and format options automatically. |
| Any option specified explicitly will be honored. |
| :type autodetect: bool |
| :param compression: [Optional] The compression type of the data source. |
| Possible values include GZIP and NONE. |
| The default value is NONE. |
| This setting is ignored for Google Cloud Bigtable, |
| Google Cloud Datastore backups and Avro formats. |
| :type compression: str |
| :param ignore_unknown_values: [Optional] Indicates if BigQuery should allow |
| extra values that are not represented in the table schema. |
| If true, the extra values are ignored. If false, records with extra columns |
| are treated as bad records, and if there are too many bad records, an |
| invalid error is returned in the job result. |
| :type ignore_unknown_values: bool |
| :param max_bad_records: The maximum number of bad records that BigQuery can |
| ignore when running the job. |
| :type max_bad_records: int |
| :param skip_leading_rows: Number of rows to skip when loading from a CSV. |
| :type skip_leading_rows: int |
| :param field_delimiter: The delimiter to use when loading from a CSV. |
| :type field_delimiter: str |
| :param quote_character: The value that is used to quote data sections in a CSV |
| file. |
| :type quote_character: str |
| :param allow_quoted_newlines: Whether to allow quoted newlines (true) or not |
| (false). |
| :type allow_quoted_newlines: bool |
| :param allow_jagged_rows: Accept rows that are missing trailing optional columns. |
| The missing values are treated as nulls. If false, records with missing |
| trailing columns are treated as bad records, and if there are too many bad |
| records, an invalid error is returned in the job result. Only applicable when |
| source_format is CSV. |
| :type allow_jagged_rows: bool |
| :param encoding: The character encoding of the data. See: |
| |
| .. seealso:: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#externalDataConfiguration.csvOptions.encoding |
| :type encoding: str |
| :param src_fmt_configs: configure optional fields specific to the source format |
| :type src_fmt_configs: dict |
| :param labels: a dictionary containing labels for the table, passed to BigQuery |
| :type labels: dict |
| :param encryption_configuration: [Optional] Custom encryption configuration (e.g., Cloud KMS keys). |
| **Example**: :: |
| |
| encryption_configuration = { |
| "kmsKeyName": "projects/testp/locations/us/keyRings/test-kr/cryptoKeys/test-key" |
| } |
| :type encryption_configuration: dict |
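| |
| **Example** (a sketch; the bucket, dataset and table names are placeholders, and ``cursor`` is |
| assumed to be a :class:`BigQueryBaseCursor` created as shown earlier): :: |
| |
|     cursor.create_external_table( |
|         external_project_dataset_table='my-project.my_dataset.external_sales', |
|         schema_fields=[{"name": "amount", "type": "FLOAT", "mode": "NULLABLE"}], |
|         source_uris=['gs://my-bucket/sales/*.csv'], |
|         source_format='CSV', |
|         skip_leading_rows=1, |
|     ) |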
| |
| |
| |
| |
| .. method:: patch_table(self, dataset_id, table_id, project_id=None, description=None, expiration_time=None, external_data_configuration=None, friendly_name=None, labels=None, schema=None, time_partitioning=None, view=None, require_partition_filter=None, encryption_configuration=None) |
| |
| Patch information in an existing table. |
| It only updates fields that are provided in the request object. |
| |
| Reference: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/patch |
| |
| :param dataset_id: The dataset containing the table to be patched. |
| :type dataset_id: str |
| :param table_id: The name of the table to be patched. |
| :type table_id: str |
| :param project_id: The project containing the table to be patched. |
| :type project_id: str |
| :param description: [Optional] A user-friendly description of this table. |
| :type description: str |
| :param expiration_time: [Optional] The time when this table expires, |
| in milliseconds since the epoch. |
| :type expiration_time: int |
| :param external_data_configuration: [Optional] A dictionary containing |
| properties of a table stored outside of BigQuery. |
| :type external_data_configuration: dict |
| :param friendly_name: [Optional] A descriptive name for this table. |
| :type friendly_name: str |
| :param labels: [Optional] A dictionary containing labels associated with this table. |
| :type labels: dict |
| :param schema: [Optional] If set, the schema field list as defined here: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.schema |
| The supported and unsupported schema modifications are listed here: |
| https://cloud.google.com/bigquery/docs/managing-table-schemas |
| **Example**: :: |
| |
| schema=[{"name": "emp_name", "type": "STRING", "mode": "REQUIRED"}, |
| {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"}] |
| |
| :type schema: list |
| :param time_partitioning: [Optional] A dictionary containing time-based partitioning |
| definition for the table. |
| :type time_partitioning: dict |
| :param view: [Optional] A dictionary containing definition for the view. |
| If set, it will patch a view instead of a table: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#view |
| **Example**: :: |
| |
| view = { |
| "query": "SELECT * FROM `test-project-id.test_dataset_id.test_table_prefix*` LIMIT 500", |
| "useLegacySql": False |
| } |
| |
| :type view: dict |
| :param require_partition_filter: [Optional] If true, queries over this table require a |
| partition filter that can be used for partition elimination. If false, queries |
| over the table may be run without a partition filter. |
| :type require_partition_filter: bool |
| :param encryption_configuration: [Optional] Custom encryption configuration (e.g., Cloud KMS keys). |
| **Example**: :: |
| |
| encryption_configuration = { |
| "kmsKeyName": "projects/testp/locations/us/keyRings/test-kr/cryptoKeys/test-key" |
| } |
| :type encryption_configuration: dict |
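| |
| **Example** (a sketch; only the supplied fields are patched, and the identifiers and labels are |
| placeholders): :: |
| |
|     cursor.patch_table( |
|         dataset_id='my_dataset', |
|         table_id='employees', |
|         description='Employee master data', |
|         labels={'team': 'analytics'}, |
|     ) |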
| |
| |
| |
| |
| .. method:: run_query(self, bql=None, sql=None, destination_dataset_table=None, write_disposition='WRITE_EMPTY', allow_large_results=False, flatten_results=None, udf_config=None, use_legacy_sql=None, maximum_billing_tier=None, maximum_bytes_billed=None, create_disposition='CREATE_IF_NEEDED', query_params=None, labels=None, schema_update_options=None, priority='INTERACTIVE', time_partitioning=None, api_resource_configs=None, cluster_fields=None, location=None, encryption_configuration=None) |
| |
| Executes a BigQuery SQL query. Optionally persists results in a BigQuery |
| table. See here: |
| |
| https://cloud.google.com/bigquery/docs/reference/v2/jobs |
| |
| for more details about these parameters. |
| |
| :param bql: (Deprecated. Use `sql` parameter instead) The BigQuery SQL |
| to execute. |
| :type bql: str |
| :param sql: The BigQuery SQL to execute. |
| :type sql: str |
| :param destination_dataset_table: The dotted ``<dataset>.<table>`` |
| BigQuery table to save the query results. |
| :type destination_dataset_table: str |
| :param write_disposition: What to do if the table already exists in |
| BigQuery. |
| :type write_disposition: str |
| :param allow_large_results: Whether to allow large results. |
| :type allow_large_results: bool |
| :param flatten_results: If true and query uses legacy SQL dialect, flattens |
| all nested and repeated fields in the query results. ``allowLargeResults`` |
| must be true if this is set to false. For standard SQL queries, this |
| flag is ignored and results are never flattened. |
| :type flatten_results: bool |
| :param udf_config: The User Defined Function configuration for the query. |
| See https://cloud.google.com/bigquery/user-defined-functions for details. |
| :type udf_config: list |
| :param use_legacy_sql: Whether to use legacy SQL (true) or standard SQL (false). |
| If `None`, defaults to `self.use_legacy_sql`. |
| :type use_legacy_sql: bool |
| :param api_resource_configs: a dictionary containing 'configuration' |
| parameters to be applied to the Google BigQuery Jobs API: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs |
| for example, {'query': {'useQueryCache': False}}. You can use it |
| if you need to provide parameters that are not supported as |
| BigQueryHook arguments. |
| :type api_resource_configs: dict |
| :param maximum_billing_tier: Positive integer that serves as a |
| multiplier of the basic price. |
| :type maximum_billing_tier: int |
| :param maximum_bytes_billed: Limits the bytes billed for this job. |
| Queries that will have bytes billed beyond this limit will fail |
| (without incurring a charge). If unspecified, this will be |
| set to your project default. |
| :type maximum_bytes_billed: float |
| :param create_disposition: Specifies whether the job is allowed to |
| create new tables. |
| :type create_disposition: str |
| :param query_params: a list of dictionary containing query parameter types and |
| values, passed to BigQuery |
| :type query_params: list |
| :param labels: a dictionary containing labels for the job/query, |
| passed to BigQuery |
| :type labels: dict |
| :param schema_update_options: Allows the schema of the destination |
| table to be updated as a side effect of the query job. |
| :type schema_update_options: Union[list, tuple, set] |
| :param priority: Specifies a priority for the query. |
| Possible values include INTERACTIVE and BATCH. |
| The default value is INTERACTIVE. |
| :type priority: str |
| :param time_partitioning: configure optional time partitioning fields i.e. |
| partition by field, type and expiration as per API specifications. |
| :type time_partitioning: dict |
| :param cluster_fields: Request that the result of this query be stored sorted |
| by one or more columns. This is only available in combination with |
| time_partitioning. The order of columns given determines the sort order. |
| :type cluster_fields: list[str] |
| :param location: The geographic location of the job. Required except for |
| US and EU. See details at |
| https://cloud.google.com/bigquery/docs/locations#specifying_your_location |
| :type location: str |
| :param encryption_configuration: [Optional] Custom encryption configuration (e.g., Cloud KMS keys). |
| **Example**: :: |
| |
| encryption_configuration = { |
| "kmsKeyName": "projects/testp/locations/us/keyRings/test-kr/cryptoKeys/test-key" |
| } |
| :type encryption_configuration: dict |
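| |
| **Example** (a sketch; the destination table is a placeholder, the source is a public sample |
| dataset, and ``cursor`` is assumed to be a :class:`BigQueryBaseCursor` created as shown earlier): :: |
| |
|     cursor.run_query( |
|         sql='SELECT corpus, COUNT(*) AS n FROM `bigquery-public-data.samples.shakespeare` GROUP BY corpus', |
|         destination_dataset_table='my_dataset.shakespeare_counts', |
|         write_disposition='WRITE_TRUNCATE', |
|         use_legacy_sql=False, |
|     ) |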
| |
| |
| |
| |
| .. method:: run_extract(self, source_project_dataset_table, destination_cloud_storage_uris, compression='NONE', export_format='CSV', field_delimiter=',', print_header=True, labels=None) |
| |
| Executes a BigQuery extract command to copy data from BigQuery to |
| Google Cloud Storage. See here: |
| |
| https://cloud.google.com/bigquery/docs/reference/v2/jobs |
| |
| for more details about these parameters. |
| |
| :param source_project_dataset_table: The dotted ``<dataset>.<table>`` |
| BigQuery table to use as the source data. |
| :type source_project_dataset_table: str |
| :param destination_cloud_storage_uris: The destination Google Cloud |
| Storage URI (e.g. gs://some-bucket/some-file.txt). Follows |
| convention defined here: |
| https://cloud.google.com/bigquery/exporting-data-from-bigquery#exportingmultiple |
| :type destination_cloud_storage_uris: list |
| :param compression: Type of compression to use. |
| :type compression: str |
| :param export_format: File format to export. |
| :type export_format: str |
| :param field_delimiter: The delimiter to use when extracting to a CSV. |
| :type field_delimiter: str |
| :param print_header: Whether to print a header for a CSV file extract. |
| :type print_header: bool |
| :param labels: a dictionary containing labels for the job/query, |
| passed to BigQuery |
| :type labels: dict |
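| |
| **Example** (a sketch; the source table and destination bucket are placeholders): :: |
| |
|     cursor.run_extract( |
|         source_project_dataset_table='my_dataset.shakespeare_counts', |
|         destination_cloud_storage_uris=['gs://my-bucket/exports/shakespeare_*.csv'], |
|         export_format='CSV', |
|         print_header=True, |
|     ) |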
| |
| |
| |
| |
| .. method:: run_copy(self, source_project_dataset_tables, destination_project_dataset_table, write_disposition='WRITE_EMPTY', create_disposition='CREATE_IF_NEEDED', labels=None, encryption_configuration=None) |
| |
| Executes a BigQuery copy command to copy data from one BigQuery table |
| to another. See here: |
| |
| https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.copy |
| |
| for more details about these parameters. |
| |
| :param source_project_dataset_tables: One or more dotted |
| ``(project:|project.)<dataset>.<table>`` |
| BigQuery tables to use as the source data. Use a list if there are |
| multiple source tables. |
| If ``<project>`` is not included, project will be the project defined |
| in the connection json. |
| :type source_project_dataset_tables: list|string |
| :param destination_project_dataset_table: The destination BigQuery |
| table. Format is: ``(project:|project.)<dataset>.<table>`` |
| :type destination_project_dataset_table: str |
| :param write_disposition: The write disposition if the table already exists. |
| :type write_disposition: str |
| :param create_disposition: The create disposition if the table doesn't exist. |
| :type create_disposition: str |
| :param labels: a dictionary containing labels for the job/query, |
| passed to BigQuery |
| :type labels: dict |
| :param encryption_configuration: [Optional] Custom encryption configuration (e.g., Cloud KMS keys). |
| **Example**: :: |
| |
| encryption_configuration = { |
| "kmsKeyName": "projects/testp/locations/us/keyRings/test-kr/cryptoKeys/test-key" |
| } |
| :type encryption_configuration: dict |
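| |
| **Example** (a sketch; source and destination tables are placeholders): :: |
| |
|     cursor.run_copy( |
|         source_project_dataset_tables='my_dataset.shakespeare_counts', |
|         destination_project_dataset_table='my-project.my_dataset.shakespeare_counts_backup', |
|         write_disposition='WRITE_TRUNCATE', |
|     ) |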
| |
| |
| |
| |
| .. method:: run_load(self, destination_project_dataset_table, source_uris, schema_fields=None, source_format='CSV', create_disposition='CREATE_IF_NEEDED', skip_leading_rows=0, write_disposition='WRITE_EMPTY', field_delimiter=',', max_bad_records=0, quote_character=None, ignore_unknown_values=False, allow_quoted_newlines=False, allow_jagged_rows=False, encoding='UTF-8', schema_update_options=None, src_fmt_configs=None, time_partitioning=None, cluster_fields=None, autodetect=False, encryption_configuration=None) |
| |
| Executes a BigQuery load command to load data from Google Cloud Storage |
| to BigQuery. See here: |
| |
| https://cloud.google.com/bigquery/docs/reference/v2/jobs |
| |
| for more details about these parameters. |
| |
| :param destination_project_dataset_table: |
| The dotted ``(<project>.|<project>:)<dataset>.<table>($<partition>)`` BigQuery |
| table to load data into. If ``<project>`` is not included, project will be the |
| project defined in the connection json. If a partition is specified the |
| operator will automatically append the data, create a new partition or create |
| a new DAY partitioned table. |
| :type destination_project_dataset_table: str |
| :param schema_fields: The schema field list as defined here: |
| https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load |
| Required if autodetect=False; optional if autodetect=True. |
| :type schema_fields: list |
| :param autodetect: Attempt to autodetect the schema for CSV and JSON |
| source files. |
| :type autodetect: bool |
| :param source_uris: The source Google Cloud |
| Storage URI (e.g. gs://some-bucket/some-file.txt). A single wildcard |
| per-object name can be used. |
| :type source_uris: list |
| :param source_format: File format of the data to load. |
| :type source_format: str |
| :param create_disposition: The create disposition if the table doesn't exist. |
| :type create_disposition: str |
| :param skip_leading_rows: Number of rows to skip when loading from a CSV. |
| :type skip_leading_rows: int |
| :param write_disposition: The write disposition if the table already exists. |
| :type write_disposition: str |
| :param field_delimiter: The delimiter to use when loading from a CSV. |
| :type field_delimiter: str |
| :param max_bad_records: The maximum number of bad records that BigQuery can |
| ignore when running the job. |
| :type max_bad_records: int |
| :param quote_character: The value that is used to quote data sections in a CSV |
| file. |
| :type quote_character: str |
| :param ignore_unknown_values: [Optional] Indicates if BigQuery should allow |
| extra values that are not represented in the table schema. |
| If true, the extra values are ignored. If false, records with extra columns |
| are treated as bad records, and if there are too many bad records, an |
| invalid error is returned in the job result. |
| :type ignore_unknown_values: bool |
| :param allow_quoted_newlines: Whether to allow quoted newlines (true) or not |
| (false). |
| :type allow_quoted_newlines: bool |
| :param allow_jagged_rows: Accept rows that are missing trailing optional columns. |
| The missing values are treated as nulls. If false, records with missing |
| trailing columns are treated as bad records, and if there are too many bad |
| records, an invalid error is returned in the job result. Only applicable when |
| source_format is CSV. |
| :type allow_jagged_rows: bool |
| :param encoding: The character encoding of the data. |
| |
| .. seealso:: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#externalDataConfiguration.csvOptions.encoding |
| :type encoding: str |
| :param schema_update_options: Allows the schema of the destination |
| table to be updated as a side effect of the load job. |
| :type schema_update_options: Union[list, tuple, set] |
| :param src_fmt_configs: configure optional fields specific to the source format |
| :type src_fmt_configs: dict |
| :param time_partitioning: configure optional time partitioning fields i.e. |
| partition by field, type and expiration as per API specifications. |
| :type time_partitioning: dict |
| :param cluster_fields: Request that the result of this load be stored sorted |
| by one or more columns. This is only available in combination with |
| time_partitioning. The order of columns given determines the sort order. |
| :type cluster_fields: list[str] |
| :param encryption_configuration: [Optional] Custom encryption configuration (e.g., Cloud KMS keys). |
| **Example**: :: |
| |
| encryption_configuration = { |
| "kmsKeyName": "projects/testp/locations/us/keyRings/test-kr/cryptoKeys/test-key" |
| } |
| :type encryption_configuration: dict |
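| |
| **Example** (a sketch; the bucket, destination table and schema are placeholders): :: |
| |
|     cursor.run_load( |
|         destination_project_dataset_table='my_dataset.sales', |
|         source_uris=['gs://my-bucket/sales/2019-01-01.csv'], |
|         schema_fields=[ |
|             {"name": "sale_id", "type": "STRING", "mode": "REQUIRED"}, |
|             {"name": "amount", "type": "FLOAT", "mode": "NULLABLE"}, |
|         ], |
|         source_format='CSV', |
|         skip_leading_rows=1, |
|         write_disposition='WRITE_APPEND', |
|     ) |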
| |
| |
| |
| |
| .. method:: run_with_configuration(self, configuration) |
| |
| Executes a BigQuery SQL query. See here: |
| |
| https://cloud.google.com/bigquery/docs/reference/v2/jobs |
| |
| for more details about the configuration parameter. |
| |
| :param configuration: The configuration parameter maps directly to |
| BigQuery's configuration field in the job object. See |
| https://cloud.google.com/bigquery/docs/reference/v2/jobs for |
| details. |
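| |
| **Example** (a sketch; the dictionary mirrors the ``configuration`` field of the BigQuery |
| Jobs API, and the query is a placeholder): :: |
| |
|     configuration = { |
|         "query": { |
|             "query": "SELECT 1 AS x", |
|             "useLegacySql": False, |
|         } |
|     } |
|     cursor.run_with_configuration(configuration) |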
| |
| |
| |
| |
| .. method:: poll_job_complete(self, job_id) |
| |
| |
| |
| |
| .. method:: cancel_query(self) |
| |
| Cancel all started queries that have not yet completed |
| |
| |
| |
| |
| .. method:: get_schema(self, dataset_id, table_id) |
| |
| Get the schema for a given dataset.table. |
| see https://cloud.google.com/bigquery/docs/reference/v2/tables#resource |
| |
| :param dataset_id: the dataset ID of the requested table |
| :param table_id: the table ID of the requested table |
| :return: a table schema |
| |
| |
| |
| |
| .. method:: get_tabledata(self, dataset_id, table_id, max_results=None, selected_fields=None, page_token=None, start_index=None) |
| |
| Get the data of a given dataset.table and optionally with selected columns. |
| see https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list |
| |
| :param dataset_id: the dataset ID of the requested table. |
| :param table_id: the table ID of the requested table. |
| :param max_results: the maximum results to return. |
| :param selected_fields: List of fields to return (comma-separated). If |
| unspecified, all fields are returned. |
| :param page_token: page token, returned from a previous call, |
| identifying the result set. |
| :param start_index: zero based index of the starting row to read. |
| :return: map containing the requested rows. |
| |
| |
| |
| |
| .. method:: run_table_delete(self, deletion_dataset_table, ignore_if_missing=False) |
| |
| Delete an existing table from the dataset. |
| If the table does not exist, return an error unless ignore_if_missing |
| is set to True. |
| |
| :param deletion_dataset_table: A dotted |
| ``(<project>.|<project>:)<dataset>.<table>`` that indicates which table |
| will be deleted. |
| :type deletion_dataset_table: str |
| :param ignore_if_missing: if True, then return success even if the |
| requested table does not exist. |
| :type ignore_if_missing: bool |
| :return: |
| |
| |
| |
| |
| .. method:: run_table_upsert(self, dataset_id, table_resource, project_id=None) |
| |
| Creates a new, empty table in the dataset; |
| if the table already exists, updates the existing table. |
| Since BigQuery does not natively allow table upserts, this is not an |
| atomic operation. |
| |
| :param dataset_id: the dataset to upsert the table into. |
| :type dataset_id: str |
| :param table_resource: a table resource. see |
| https://cloud.google.com/bigquery/docs/reference/v2/tables#resource |
| :type table_resource: dict |
| :param project_id: the project to upsert the table into. If None, |
| project will be self.project_id. |
| :return: |
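| |
| **Example** (a sketch; the table resource below is a minimal placeholder following the |
| tables#resource layout linked above): :: |
| |
|     table_resource = { |
|         "tableReference": {"tableId": "employees"}, |
|         "friendlyName": "Employees", |
|     } |
|     cursor.run_table_upsert(dataset_id='my_dataset', table_resource=table_resource) |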
| |
| |
| |
| |
| .. method:: run_grant_dataset_view_access(self, source_dataset, view_dataset, view_table, source_project=None, view_project=None) |
| |
| Grant authorized view access of a dataset to a view table. |
| If this view has already been granted access to the dataset, do nothing. |
| This method is not atomic. Running it may clobber a simultaneous update. |
| |
| :param source_dataset: the source dataset |
| :type source_dataset: str |
| :param view_dataset: the dataset that the view is in |
| :type view_dataset: str |
| :param view_table: the table of the view |
| :type view_table: str |
| :param source_project: the project of the source dataset. If None, |
| self.project_id will be used. |
| :type source_project: str |
| :param view_project: the project that the view is in. If None, |
| self.project_id will be used. |
| :type view_project: str |
| :return: the datasets resource of the source dataset. |
| |
| |
| |
| |
| .. method:: create_empty_dataset(self, dataset_id='', project_id='', dataset_reference=None) |
| |
| Create a new empty dataset: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/insert |
| |
| :param project_id: The name of the project where we want to create |
| an empty dataset. Not needed if projectId is provided in dataset_reference. |
| :type project_id: str |
| :param dataset_id: The ID of the dataset. Not needed |
| if datasetId is provided in dataset_reference. |
| :type dataset_id: str |
| :param dataset_reference: Dataset reference that could be provided |
| with request body. More info: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource |
| :type dataset_reference: dict |
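| |
| **Example** (a sketch; dataset and project IDs are placeholders): :: |
| |
|     cursor.create_empty_dataset( |
|         dataset_id='my_new_dataset', |
|         project_id='my-project', |
|     ) |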
| |
| |
| |
| |
| .. method:: delete_dataset(self, project_id, dataset_id, delete_contents=False) |
| |
| Delete a BigQuery dataset in your project. |
| |
| :param project_id: The name of the project that contains the dataset. |
| :type project_id: str |
| :param dataset_id: The dataset to be deleted. |
| :type dataset_id: str |
| :param delete_contents: [Optional] Whether to force the deletion even if the dataset is not empty. |
| Will delete all tables (if any) in the dataset if set to True. |
| Will raise HttpError 400: "{dataset_id} is still in use" if set to False and dataset is not empty. |
| The default value is False. |
| :type delete_contents: bool |
| :return: |
| |
| |
| |
| |
| .. method:: get_dataset(self, dataset_id, project_id=None) |
| |
| Returns the dataset_resource if the dataset exists, |
| and raises a 404 error if it does not. |
| |
| :param dataset_id: The BigQuery Dataset ID |
| :type dataset_id: str |
| :param project_id: The GCP Project ID |
| :type project_id: str |
| :return: dataset_resource |
| |
| .. seealso:: |
| For more information, see Dataset Resource content: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource |
| |
| |
| |
| |
| .. method:: get_datasets_list(self, project_id=None) |
| |
| Returns the full list of BigQuery datasets in the current project. |
| |
| .. seealso:: |
| For more information, see: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list |
| |
| :param project_id: The Google Cloud project for which |
| to list all datasets |
| :type project_id: str |
| :return: datasets_list |
| |
| Example of returned datasets_list: :: |
| |
| [ |
| { |
| "kind":"bigquery#dataset", |
| "location":"US", |
| "id":"your-project:dataset_2_test", |
| "datasetReference":{ |
| "projectId":"your-project", |
| "datasetId":"dataset_2_test" |
| } |
| }, |
| { |
| "kind":"bigquery#dataset", |
| "location":"US", |
| "id":"your-project:dataset_1_test", |
| "datasetReference":{ |
| "projectId":"your-project", |
| "datasetId":"dataset_1_test" |
| } |
| } |
| ] |
| |
| |
| |
| |
| .. method:: patch_dataset(self, dataset_id, dataset_resource, project_id=None) |
| |
| Patches information in an existing dataset. |
| It only replaces fields that are provided in the submitted dataset resource. |
| More info: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/patch |
| |
| :param dataset_id: The BigQuery Dataset ID |
| :type dataset_id: str |
| :param dataset_resource: Dataset resource that will be provided |
| in request body. |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource |
| :type dataset_resource: dict |
| :param project_id: The GCP Project ID |
| :type project_id: str |
| :rtype: dataset |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource |
| |
| |
| |
| |
| .. method:: update_dataset(self, dataset_id, dataset_resource, project_id=None) |
| |
| Updates information in an existing dataset. The update method replaces the entire |
| dataset resource, whereas the patch method only replaces fields that are provided |
| in the submitted dataset resource. |
| More info: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/update |
| |
| :param dataset_id: The BigQuery Dataset ID |
| :type dataset_id: str |
| :param dataset_resource: Dataset resource that will be provided |
| in request body. |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource |
| :type dataset_resource: dict |
| :param project_id: The GCP Project ID |
| :type project_id: str |
| :rtype: dataset |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource |
| |
| |
| |
| |
| .. method:: insert_all(self, project_id, dataset_id, table_id, rows, ignore_unknown_values=False, skip_invalid_rows=False, fail_on_error=False) |
| |
| Method to stream data into BigQuery one record at a time without needing |
| to run a load job |
| |
| .. seealso:: |
| For more information, see: |
| https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll |
| |
| :param project_id: The name of the project where we have the table |
| :type project_id: str |
| :param dataset_id: The name of the dataset where we have the table |
| :type dataset_id: str |
| :param table_id: The name of the table |
| :type table_id: str |
| :param rows: the rows to insert |
| :type rows: list |
| |
| **Example of rows**: :: |
| rows=[{"json": {"a_key": "a_value_0"}}, {"json": {"a_key": "a_value_1"}}] |
| |
| :param ignore_unknown_values: [Optional] Accept rows that contain values |
| that do not match the schema. The unknown values are ignored. |
| The default value is false, which treats unknown values as errors. |
| :type ignore_unknown_values: bool |
| :param skip_invalid_rows: [Optional] Insert all valid rows of a request, |
| even if invalid rows exist. The default value is false, which causes |
| the entire request to fail if any invalid rows exist. |
| :type skip_invalid_rows: bool |
| :param fail_on_error: [Optional] Force the task to fail if any errors occur. |
| The default value is false, which indicates the task should not fail |
| even if any insertion errors occur. |
| :type fail_on_error: bool |
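| |
| **Example** (a sketch; the rows follow the ``{"json": {...}}`` wrapping shown above, and the |
| project, dataset and table identifiers are placeholders): :: |
| |
|     rows = [ |
|         {"json": {"a_key": "a_value_0"}}, |
|         {"json": {"a_key": "a_value_1"}}, |
|     ] |
|     cursor.insert_all( |
|         project_id='my-project', |
|         dataset_id='my_dataset', |
|         table_id='my_table', |
|         rows=rows, |
|         fail_on_error=True, |
|     ) |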
| |
| |
| |
| |
| .. py:class:: BigQueryCursor(service, project_id, use_legacy_sql=True, location=None, num_retries=5) |
| |
| Bases: :class:`airflow.contrib.hooks.bigquery_hook.BigQueryBaseCursor` |
| |
| A very basic BigQuery PEP 249 cursor implementation. The PyHive PEP 249 |
| implementation was used as a reference: |
| |
| https://github.com/dropbox/PyHive/blob/master/pyhive/presto.py |
| https://github.com/dropbox/PyHive/blob/master/pyhive/common.py |
| |
| .. attribute:: description |
| |
| |
| The schema description method is not currently implemented. |
| |
| |
| .. attribute:: rowcount |
| |
| |
| By default, return -1 to indicate that this is not supported. |
| |
| |
| .. attribute:: arraysize |
| |
| |
| |
| |
| |
| .. method:: close(self) |
| |
| By default, do nothing |
| |
| |
| |
| |
| .. method:: execute(self, operation, parameters=None) |
| |
| Executes a BigQuery query, and returns the job ID. |
| |
| :param operation: The query to execute. |
| :type operation: str |
| :param parameters: Parameters to substitute into the query. |
| :type parameters: dict |
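| |
| **Example** (a sketch, assuming the import from the class-level example above; parameters |
| appear to be bound with pyformat-style ``%(name)s`` substitution via the module's |
| ``_bind_parameters`` helper): :: |
| |
|     conn = BigQueryHook(use_legacy_sql=False).get_conn() |
|     cursor = conn.cursor() |
|     cursor.execute('SELECT %(num)s AS num', parameters={'num': 42}) |
|     print(cursor.fetchone()) |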
| |
| |
| |
| |
| .. method:: executemany(self, operation, seq_of_parameters) |
| |
| Execute a BigQuery query multiple times with different parameters. |
| |
| :param operation: The query to execute. |
| :type operation: str |
| :param seq_of_parameters: List of dictionary parameters to substitute into the |
| query. |
| :type seq_of_parameters: list |
| |
| |
| |
| |
| .. method:: flush_results(self) |
| |
| Flush results related cursor attributes. |
| |
| |
| |
| |
| .. method:: fetchone(self) |
| |
| Fetch the next row of a query result set. |
| |
| |
| |
| |
| .. method:: next(self) |
| |
| Helper method for fetchone, which returns the next row from a buffer. |
| If the buffer is empty, attempts to paginate through the result set for |
| the next page, and load it into the buffer. |
| |
| |
| |
| |
| .. method:: fetchmany(self, size=None) |
| |
| Fetch the next set of rows of a query result, returning a sequence of sequences |
| (e.g. a list of tuples). An empty sequence is returned when no more rows are |
| available. The number of rows to fetch per call is specified by the parameter. |
| If it is not given, the cursor's arraysize determines the number of rows to be |
| fetched. The method should try to fetch as many rows as indicated by the size |
| parameter. If this is not possible due to the specified number of rows not being |
| available, fewer rows may be returned. An :py:class:`~pyhive.exc.Error` |
| (or subclass) exception is raised if the previous call to |
| :py:meth:`execute` did not produce any result set or no call was issued yet. |
| |
| |
| |
| |
| .. method:: fetchall(self) |
| |
| Fetch all (remaining) rows of a query result, returning them as a sequence of |
| sequences (e.g. a list of tuples). |
| |
| |
| |
| |
| .. method:: get_arraysize(self) |
| |
| Specifies the number of rows to fetch at a time with .fetchmany() |
| |
| |
| |
| |
| .. method:: set_arraysize(self, arraysize) |
| |
| Specifies the number of rows to fetch at a time with .fetchmany() |
| |
| |
| |
| |
| .. method:: setinputsizes(self, sizes) |
| |
| Does nothing by default |
| |
| |
| |
| |
| .. method:: setoutputsize(self, size, column=None) |
| |
| Does nothing by default |
| |
| |
| |
| |
| .. function:: _bind_parameters(operation, parameters) |
| Helper method that binds parameters to a SQL query. |
| |
| |
| .. function:: _escape(s) |
| Helper method that escapes parameters to a SQL query. |
| |
| |
| .. function:: _bq_cast(string_field, bq_type) |
| Helper method that casts a BigQuery row to the appropriate data types. |
| This is useful because BigQuery returns all fields as strings. |
| |
| |
| .. function:: _split_tablename(table_input, default_project_id, var_name=None) |
| |
| .. function:: _cleanse_time_partitioning(destination_dataset_table, time_partitioning_in) |
| |
| .. function:: _validate_value(key, value, expected_type) |
| Function to check the expected type and raise an |
| error if the type is not correct |
| |
| |
| .. function:: _api_resource_configs_duplication_check(key, value, config_dict, config_dict_name='api_resource_configs') |
| |
| .. function:: _validate_src_fmt_configs(source_format, src_fmt_configs, valid_configs, backward_compatibility_configs=None) |
| Validates the given src_fmt_configs against a valid configuration for the source format. |
| Adds the backward compatibility config to the src_fmt_configs. |
| |
| :param source_format: File format to export. |
| :type source_format: str |
| :param src_fmt_configs: Configure optional fields specific to the source format. |
| :type src_fmt_configs: dict |
| :param valid_configs: Valid configuration specific to the source format |
| :type valid_configs: List[str] |
| :param backward_compatibility_configs: The top-level params for backward-compatibility |
| :type backward_compatibility_configs: dict |
| |
| |