:mod:`airflow.contrib.operators.mysql_to_gcs`
=============================================

.. py:module:: airflow.contrib.operators.mysql_to_gcs


Module Contents
---------------

.. data:: PY3

.. py:class:: MySqlToGoogleCloudStorageOperator(sql, bucket, filename, schema_filename=None, approx_max_file_size_bytes=1900000000, mysql_conn_id='mysql_default', google_cloud_storage_conn_id='google_cloud_default', schema=None, delegate_to=None, export_format='json', field_delimiter=',', *args, **kwargs)

   Bases: :class:`airflow.models.BaseOperator`

   Copy data from MySQL to Google Cloud Storage in JSON or CSV format.

   The JSON data files generated are newline-delimited so that they can be
   loaded into BigQuery.
   Reference: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#limitations

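   Newline-delimited JSON is simply one JSON object per line. As a minimal
   illustration of the format (not the operator's internal code), each row
   dict is serialized with :func:`json.dumps` and followed by a newline:

   .. code-block:: python

      import json

      rows = [
          {"id": 1, "name": "alice"},
          {"id": 2, "name": "bob"},
      ]

      # One JSON object per line; BigQuery can load this file directly.
      with open("mysql_export.json", "w") as handle:
          for row in rows:
              handle.write(json.dumps(row) + "\n")
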
   :param sql: The SQL to execute on the MySQL table.
   :type sql: str
   :param bucket: The bucket to upload to.
   :type bucket: str
   :param filename: The filename to use as the object name when uploading
       to Google Cloud Storage. A ``{}`` should be specified in the filename
       to allow the operator to inject file numbers in cases where the
       file is split due to size.
   :type filename: str
   :param schema_filename: If set, the filename to use as the object name
       when uploading a .json file containing the BigQuery schema fields
       for the table that was dumped from MySQL.
   :type schema_filename: str
   :param approx_max_file_size_bytes: This operator supports the ability
       to split large table dumps into multiple files (see notes in the
       filename param docs above). Google Cloud Storage allows for files
       to be a maximum of 4GB. This param allows developers to specify the
       approximate file size of the splits.
   :type approx_max_file_size_bytes: long
   :param mysql_conn_id: Reference to a specific MySQL hook.
   :type mysql_conn_id: str
   :param google_cloud_storage_conn_id: Reference to a specific Google
       Cloud Storage hook.
   :type google_cloud_storage_conn_id: str
   :param schema: The schema to use, if any. Should be a list of dicts or
       a str. Pass a string when using a Jinja template; otherwise, pass a
       list of dicts. Examples can be seen at:
       https://cloud.google.com/bigquery/docs/schemas#specifying_a_json_schema_file
   :type schema: str or list
   :param delegate_to: The account to impersonate, if any. For this to
       work, the service account making the request must have domain-wide
       delegation enabled.
   :type delegate_to: str
   :param export_format: Desired format of files to be exported.
   :type export_format: str
   :param field_delimiter: The delimiter to be used for CSV files.
   :type field_delimiter: str

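   A minimal usage sketch, assuming a working ``mysql_default`` MySQL
   connection and a pre-existing GCS bucket named ``my-bucket`` (both
   placeholders):

   .. code-block:: python

      from datetime import datetime

      from airflow import DAG
      from airflow.contrib.operators.mysql_to_gcs import (
          MySqlToGoogleCloudStorageOperator,
      )

      dag = DAG(
          "mysql_to_gcs_example",
          start_date=datetime(2019, 1, 1),
          schedule_interval=None,
      )

      # Export a table as newline-delimited JSON; '{}' in the filename is
      # replaced with a file number if the dump is split by size.
      export_orders = MySqlToGoogleCloudStorageOperator(
          task_id="export_orders",
          sql="SELECT * FROM orders",
          bucket="my-bucket",
          filename="exports/orders_{}.json",
          schema_filename="exports/orders_schema.json",
          dag=dag,
      )
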
   .. attribute:: template_fields
      :annotation: = ['sql', 'bucket', 'filename', 'schema_filename', 'schema']

   .. attribute:: template_ext
      :annotation: = ['.sql']

   .. attribute:: ui_color
      :annotation: = #a0e08c

   .. method:: execute(self, context)

   .. method:: _query_mysql(self)

      Queries MySQL and returns a cursor to the results.

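      A sketch of the query step using
      :class:`~airflow.hooks.mysql_hook.MySqlHook` (the hook that
      ``mysql_conn_id`` refers to); illustrative, not necessarily the
      operator's exact implementation:

      .. code-block:: python

         from airflow.hooks.mysql_hook import MySqlHook

         def query_mysql(sql, mysql_conn_id="mysql_default"):
             # Open a connection via the configured Airflow connection
             # and return a live cursor over the query results.
             hook = MySqlHook(mysql_conn_id=mysql_conn_id)
             conn = hook.get_conn()
             cursor = conn.cursor()
             cursor.execute(sql)
             return cursor
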
   .. method:: _write_local_data_files(self, cursor)

      Takes a cursor, and writes results to a local file.

      :return: A dictionary where keys are filenames to be used as object
          names in GCS, and values are file handles to local files that
          contain the data for the GCS objects.

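      The size-based split works by checking the bytes written so far
      against ``approx_max_file_size_bytes`` and rolling over to a new
      temporary file. A simplified sketch of that idea (names are
      illustrative):

      .. code-block:: python

         import json
         from tempfile import NamedTemporaryFile

         def write_split_files(rows, max_bytes=1900000000):
             file_no = 0
             tmp_file = NamedTemporaryFile(delete=True)
             files = {"data_{}.json".format(file_no): tmp_file}
             for row in rows:
                 tmp_file.write(json.dumps(row).encode("utf-8") + b"\n")
                 # Roll over once the current split exceeds the size budget.
                 if tmp_file.tell() >= max_bytes:
                     file_no += 1
                     tmp_file = NamedTemporaryFile(delete=True)
                     files["data_{}.json".format(file_no)] = tmp_file
             return files
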
   .. method:: _configure_csv_file(self, file_handle, schema)

      Configure a CSV writer with the file_handle and write the schema
      field names as the header row of the new file.

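      A minimal sketch of that configuration with the standard-library
      :mod:`csv` module (``field_delimiter`` defaults to ``,``):

      .. code-block:: python

         import csv

         def configure_csv_file(file_handle, schema, field_delimiter=","):
             # Create the writer and emit the column names as the header row.
             csv_writer = csv.writer(file_handle, delimiter=field_delimiter)
             csv_writer.writerow(schema)
             return csv_writer
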
   .. method:: _write_local_schema_file(self, cursor)

      Takes a cursor, and writes the BigQuery schema for the results to a
      local file in .json format.

      :return: A dictionary where the key is a filename to be used as an
          object name in GCS, and the value is a file handle to a local
          file that contains the BigQuery schema fields in .json format.

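      The schema is derived from ``cursor.description``, which yields each
      column's name and MySQL type code. A simplified sketch, with
      ``type_map`` standing in for the class method documented below:

      .. code-block:: python

         import json

         def build_schema(cursor, type_map):
             fields = []
             for field in cursor.description:
                 # field[0] is the column name, field[1] the MySQL type
                 # code, and field[6] the NULL_OK flag.
                 field_mode = "NULLABLE" if field[6] else "REQUIRED"
                 fields.append({
                     "name": field[0],
                     "type": type_map(field[1]),
                     "mode": field_mode,
                 })
             return json.dumps(fields)
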
   .. method:: _upload_to_gcs(self, files_to_upload)

      Upload all of the file splits (and optionally the schema .json file)
      to Google Cloud Storage.

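      A sketch of the upload step using the contrib
      :class:`~airflow.contrib.hooks.gcs_hook.GoogleCloudStorageHook` that
      ``google_cloud_storage_conn_id`` refers to (illustrative only):

      .. code-block:: python

         from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

         def upload_to_gcs(files_to_upload, bucket,
                           gcp_conn_id="google_cloud_default"):
             hook = GoogleCloudStorageHook(
                 google_cloud_storage_conn_id=gcp_conn_id)
             for object_name, file_handle in files_to_upload.items():
                 file_handle.flush()
                 # Upload each local split under its target object name.
                 hook.upload(bucket, object_name, file_handle.name)
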
   .. staticmethod:: _convert_types(schema, col_type_dict, row)

      Takes a value from MySQLdb, and converts it to a value that's safe
      for JSON/Google Cloud Storage/BigQuery. Dates are converted to UTC
      seconds. Decimals are converted to floats. Binary type fields are
      encoded with base64, since imported BYTES data must be
      base64-encoded according to the BigQuery data types documentation:
      https://cloud.google.com/bigquery/data-types

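      A sketch of those conversions for a single value (a simplification
      of the per-row logic, not the exact method body):

      .. code-block:: python

         import base64
         import calendar
         from datetime import date, datetime, timedelta
         from decimal import Decimal

         def convert_value(value):
             if isinstance(value, (datetime, date)):
                 # Dates/datetimes become UTC epoch seconds.
                 return calendar.timegm(value.timetuple())
             if isinstance(value, timedelta):
                 # MySQL TIME values arrive as timedeltas; use seconds.
                 return value.total_seconds()
             if isinstance(value, Decimal):
                 return float(value)
             if isinstance(value, bytes):
                 # BYTES must be base64-encoded for BigQuery JSON imports.
                 return base64.standard_b64encode(value).decode("ascii")
             return value
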
   .. method:: _get_col_type_dict(self)

      Return a dict mapping column names to column types, based on
      self.schema if it is not None.

   .. classmethod:: type_map(cls, mysql_type)

      Helper function that maps from MySQL fields to BigQuery fields. Used
      when a schema_filename is set.

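      The mapping is keyed on MySQLdb's numeric field-type codes. An
      illustrative subset (the full table may differ in the actual
      source):

      .. code-block:: python

         from MySQLdb.constants import FIELD_TYPE

         # Partial, illustrative mapping of MySQL type codes to BigQuery
         # types; anything unmapped falls back to STRING.
         TYPE_MAP = {
             FIELD_TYPE.TINY: "INTEGER",
             FIELD_TYPE.LONG: "INTEGER",
             FIELD_TYPE.LONGLONG: "INTEGER",
             FIELD_TYPE.FLOAT: "FLOAT",
             FIELD_TYPE.DECIMAL: "FLOAT",
             FIELD_TYPE.DATETIME: "TIMESTAMP",
             FIELD_TYPE.TIMESTAMP: "TIMESTAMP",
         }

         def type_map(mysql_type):
             # VARCHAR, TEXT, etc. are not listed, so they map to STRING.
             return TYPE_MAP.get(mysql_type, "STRING")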