| :py:mod:`airflow.providers.apache.pinot.hooks.pinot` |
| ==================================================== |
| |
| .. py:module:: airflow.providers.apache.pinot.hooks.pinot |
| |
| |
| Module Contents |
| --------------- |
| |
| Classes |
| ~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.apache.pinot.hooks.pinot.PinotAdminHook |
| airflow.providers.apache.pinot.hooks.pinot.PinotDbApiHook |
| |
| |
| |
| |
| .. py:class:: PinotAdminHook(conn_id = 'pinot_admin_default', cmd_path = 'pinot-admin.sh', pinot_admin_system_exit = False) |
| |
| Bases: :py:obj:`airflow.hooks.base.BaseHook` |
| |
| This hook is a wrapper around the pinot-admin.sh script. |
| For now, only small subset of its subcommands are implemented, |
| which are required to ingest offline data into Apache Pinot |
| (i.e., AddSchema, AddTable, CreateSegment, and UploadSegment). |
| Their command options are based on Pinot v0.1.0. |
| |
| Unfortunately, as of v0.1.0, pinot-admin.sh always exits with |
| status code 0. To address this behavior, users can use the |
| pinot_admin_system_exit flag. If its value is set to false, |
| this hook evaluates the result based on the output message |
| instead of the status code. This Pinot's behavior is supposed |
| to be improved in the next release, which will include the |
| following PR: https://github.com/apache/incubator-pinot/pull/4110 |
| |
| :param conn_id: The name of the connection to use. |
| :param cmd_path: The filepath to the pinot-admin.sh executable |
| :param pinot_admin_system_exit: If true, the result is evaluated based on the status code. |
| Otherwise, the result is evaluated as a failure if "Error" or |
| "Exception" is in the output message. |
| |
| .. py:method:: get_conn(self) |
| |
| Returns connection for the hook. |
| |
| |
| .. py:method:: add_schema(self, schema_file, with_exec = True) |
| |
| Add Pinot schema by run AddSchema command |
| |
| :param schema_file: Pinot schema file |
| :param with_exec: bool |
| |
| |
| .. py:method:: add_table(self, file_path, with_exec = True) |
| |
| Add Pinot table with run AddTable command |
| |
| :param file_path: Pinot table configure file |
| :param with_exec: bool |
| |
| |
| .. py:method:: create_segment(self, generator_config_file = None, data_dir = None, segment_format = None, out_dir = None, overwrite = None, table_name = None, segment_name = None, time_column_name = None, schema_file = None, reader_config_file = None, enable_star_tree_index = None, star_tree_index_spec_file = None, hll_size = None, hll_columns = None, hll_suffix = None, num_threads = None, post_creation_verification = None, retry = None) |
| |
| Create Pinot segment by run CreateSegment command |
| |
| |
| .. py:method:: upload_segment(self, segment_dir, table_name = None) |
| |
| Upload Segment with run UploadSegment command |
| |
| :param segment_dir: |
| :param table_name: |
| :return: |
| |
| |
| .. py:method:: run_cli(self, cmd, verbose = True) |
| |
| Run command with pinot-admin.sh |
| |
| :param cmd: List of command going to be run by pinot-admin.sh script |
| :param verbose: |
| |
| |
| |
| .. py:class:: PinotDbApiHook(*args, schema = None, **kwargs) |
| |
| Bases: :py:obj:`airflow.hooks.dbapi.DbApiHook` |
| |
| Interact with Pinot Broker Query API |
| |
| This hook uses standard-SQL endpoint since PQL endpoint is soon to be deprecated. |
| https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql |
| |
| .. py:attribute:: conn_name_attr |
| :annotation: = pinot_broker_conn_id |
| |
| |
| |
| .. py:attribute:: default_conn_name |
| :annotation: = pinot_broker_default |
| |
| |
| |
| .. py:attribute:: supports_autocommit |
| :annotation: = False |
| |
| |
| |
| .. py:method:: get_conn(self) |
| |
| Establish a connection to pinot broker through pinot dbapi. |
| |
| |
| .. py:method:: get_uri(self) |
| |
| Get the connection uri for pinot broker. |
| |
| e.g: http://localhost:9000/query/sql |
| |
| |
| .. py:method:: get_records(self, sql, parameters = None) |
| |
| Executes the sql and returns a set of records. |
| |
| :param sql: the sql statement to be executed (str) or a list of |
| sql statements to execute |
| :param parameters: The parameters to render the SQL query with. |
| |
| |
| .. py:method:: get_first(self, sql, parameters = None) |
| |
| Executes the sql and returns the first resulting row. |
| |
| :param sql: the sql statement to be executed (str) or a list of |
| sql statements to execute |
| :param parameters: The parameters to render the SQL query with. |
| |
| |
| .. py:method:: set_autocommit(self, conn, autocommit) |
| :abstractmethod: |
| |
| Sets the autocommit flag on the connection |
| |
| |
| .. py:method:: insert_rows(self, table, rows, target_fields = None, commit_every = 1000, replace = False, **kwargs) |
| :abstractmethod: |
| |
| A generic way to insert a set of tuples into a table, |
| a new transaction is created every commit_every rows |
| |
| :param table: Name of the target table |
| :param rows: The rows to insert into the table |
| :param target_fields: The names of the columns to fill in the table |
| :param commit_every: The maximum number of rows to insert in one |
| transaction. Set to 0 to insert all rows in one transaction. |
| :param replace: Whether to replace instead of insert |
| |
| |
| |