docs-archive/apache-airflow-providers-apache-sqoop/3.0.0/_sources/_api/airflow/providers/apache/sqoop/hooks/sqoop/index.rst.txt - airflow-site - Git at Google

 :py:mod:`airflow.providers.apache.sqoop.hooks.sqoop`
 ====================================================

 .. py:module:: airflow.providers.apache.sqoop.hooks.sqoop

 .. autoapi-nested-parse::

    This module contains a sqoop 1.x hook


 Module Contents
 ---------------

 Classes
 ~~~~~~~

 .. autoapisummary::

    airflow.providers.apache.sqoop.hooks.sqoop.SqoopHook


 .. py:class:: SqoopHook(conn_id = default_conn_name, verbose = False, num_mappers = None, hcatalog_database = None, hcatalog_table = None, properties = None)

    Bases: :py:obj:`airflow.hooks.base.BaseHook`

    This hook is a wrapper around the sqoop 1 binary. To be able to use the hook
    it is required that "sqoop" is in the PATH.

    Additional arguments that can be passed via the 'extra' JSON field of the
    sqoop connection:

        * ``job_tracker``: Job tracker local|jobtracker:port.
        * ``namenode``: Namenode.
        * ``lib_jars``: Comma separated jar files to include in the classpath.
        * ``files``: Comma separated files to be copied to the map reduce cluster.
        * ``archives``: Comma separated archives to be unarchived on the compute
            machines.
        * ``password_file``: Path to file containing the password.

    :param conn_id: Reference to the sqoop connection.
    :param verbose: Set sqoop to verbose.
    :param num_mappers: Number of map tasks to import in parallel.
    :param properties: Properties to set via the -D argument

    .. py:attribute:: conn_name_attr
       :annotation: = conn_id


    .. py:attribute:: default_conn_name
       :annotation: = sqoop_default


    .. py:attribute:: conn_type
       :annotation: = sqoop


    .. py:attribute:: hook_name
       :annotation: = Sqoop


    .. py:method:: get_conn(self)

       Returns connection for the hook.


    .. py:method:: cmd_mask_password(self, cmd_orig)

       Mask command password for safety


    .. py:method:: popen(self, cmd, **kwargs)

       Remote Popen

       :param cmd: command to remotely execute
       :param kwargs: extra arguments to Popen (see subprocess.Popen)
       :return: handle to subprocess


    .. py:method:: import_table(self, table, target_dir = None, append = False, file_type = 'text', columns = None, split_by = None, where = None, direct = False, driver = None, extra_import_options = None, schema = None)

       Imports table from remote location to target dir. Arguments are
       copies of direct sqoop command line arguments

       :param table: Table to read
       :param schema: Schema name
       :param target_dir: HDFS destination dir
       :param append: Append data to an existing dataset in HDFS
       :param file_type: "avro", "sequence", "text" or "parquet".
           Imports data to into the specified format. Defaults to text.
       :param columns: <col,col,col…> Columns to import from table
       :param split_by: Column of the table used to split work units
       :param where: WHERE clause to use during import
       :param direct: Use direct connector if exists for the database
       :param driver: Manually specify JDBC driver class to use
       :param extra_import_options: Extra import options to pass as dict.
           If a key doesn't have a value, just pass an empty string to it.
           Don't include prefix of -- for sqoop options.


    .. py:method:: import_query(self, query, target_dir = None, append = False, file_type = 'text', split_by = None, direct = None, driver = None, extra_import_options = None)

       Imports a specific query from the rdbms to hdfs

       :param query: Free format query to run
       :param target_dir: HDFS destination dir
       :param append: Append data to an existing dataset in HDFS
       :param file_type: "avro", "sequence", "text" or "parquet"
           Imports data to hdfs into the specified format. Defaults to text.
       :param split_by: Column of the table used to split work units
       :param direct: Use direct import fast path
       :param driver: Manually specify JDBC driver class to use
       :param extra_import_options: Extra import options to pass as dict.
           If a key doesn't have a value, just pass an empty string to it.
           Don't include prefix of -- for sqoop options.


    .. py:method:: export_table(self, table, export_dir = None, input_null_string = None, input_null_non_string = None, staging_table = None, clear_staging_table = False, enclosed_by = None, escaped_by = None, input_fields_terminated_by = None, input_lines_terminated_by = None, input_optionally_enclosed_by = None, batch = False, relaxed_isolation = False, extra_export_options = None, schema = None)

       Exports Hive table to remote location. Arguments are copies of direct
       sqoop command line Arguments

       :param table: Table remote destination
       :param schema: Schema name
       :param export_dir: Hive table to export
       :param input_null_string: The string to be interpreted as null for
           string columns
       :param input_null_non_string: The string to be interpreted as null
           for non-string columns
       :param staging_table: The table in which data will be staged before
           being inserted into the destination table
       :param clear_staging_table: Indicate that any data present in the
           staging table can be deleted
       :param enclosed_by: Sets a required field enclosing character
       :param escaped_by: Sets the escape character
       :param input_fields_terminated_by: Sets the field separator character
       :param input_lines_terminated_by: Sets the end-of-line character
       :param input_optionally_enclosed_by: Sets a field enclosing character
       :param batch: Use batch mode for underlying statement execution
       :param relaxed_isolation: Transaction isolation to read uncommitted
           for the mappers
       :param extra_export_options: Extra export options to pass as dict.
           If a key doesn't have a value, just pass an empty string to it.
           Don't include prefix of -- for sqoop options.
	:py:mod:`airflow.providers.apache.sqoop.hooks.sqoop`
	====================================================

	.. py:module:: airflow.providers.apache.sqoop.hooks.sqoop

	.. autoapi-nested-parse::

	This module contains a sqoop 1.x hook



	Module Contents
	---------------

	Classes
	~~~~~~~

	.. autoapisummary::

	airflow.providers.apache.sqoop.hooks.sqoop.SqoopHook




	.. py:class:: SqoopHook(conn_id = default_conn_name, verbose = False, num_mappers = None, hcatalog_database = None, hcatalog_table = None, properties = None)

	Bases: :py:obj:`airflow.hooks.base.BaseHook`

	This hook is a wrapper around the sqoop 1 binary. To be able to use the hook
	it is required that "sqoop" is in the PATH.

	Additional arguments that can be passed via the 'extra' JSON field of the
	sqoop connection:

	* ``job_tracker``: Job tracker local\|jobtracker:port.
	* ``namenode``: Namenode.
	* ``lib_jars``: Comma separated jar files to include in the classpath.
	* ``files``: Comma separated files to be copied to the map reduce cluster.
	* ``archives``: Comma separated archives to be unarchived on the compute
	machines.
	* ``password_file``: Path to file containing the password.

	:param conn_id: Reference to the sqoop connection.
	:param verbose: Set sqoop to verbose.
	:param num_mappers: Number of map tasks to import in parallel.
	:param properties: Properties to set via the -D argument

	.. py:attribute:: conn_name_attr
	:annotation: = conn_id



	.. py:attribute:: default_conn_name
	:annotation: = sqoop_default



	.. py:attribute:: conn_type
	:annotation: = sqoop



	.. py:attribute:: hook_name
	:annotation: = Sqoop



	.. py:method:: get_conn(self)

	Returns connection for the hook.


	.. py:method:: cmd_mask_password(self, cmd_orig)

	Mask command password for safety


	.. py:method:: popen(self, cmd, **kwargs)

	Remote Popen

	:param cmd: command to remotely execute
	:param kwargs: extra arguments to Popen (see subprocess.Popen)
	:return: handle to subprocess


	.. py:method:: import_table(self, table, target_dir = None, append = False, file_type = 'text', columns = None, split_by = None, where = None, direct = False, driver = None, extra_import_options = None, schema = None)

	Imports table from remote location to target dir. Arguments are
	copies of direct sqoop command line arguments

	:param table: Table to read
	:param schema: Schema name
	:param target_dir: HDFS destination dir
	:param append: Append data to an existing dataset in HDFS
	:param file_type: "avro", "sequence", "text" or "parquet".
	Imports data to into the specified format. Defaults to text.
	:param columns: <col,col,col…> Columns to import from table
	:param split_by: Column of the table used to split work units
	:param where: WHERE clause to use during import
	:param direct: Use direct connector if exists for the database
	:param driver: Manually specify JDBC driver class to use
	:param extra_import_options: Extra import options to pass as dict.
	If a key doesn't have a value, just pass an empty string to it.
	Don't include prefix of -- for sqoop options.


	.. py:method:: import_query(self, query, target_dir = None, append = False, file_type = 'text', split_by = None, direct = None, driver = None, extra_import_options = None)

	Imports a specific query from the rdbms to hdfs

	:param query: Free format query to run
	:param target_dir: HDFS destination dir
	:param append: Append data to an existing dataset in HDFS
	:param file_type: "avro", "sequence", "text" or "parquet"
	Imports data to hdfs into the specified format. Defaults to text.
	:param split_by: Column of the table used to split work units
	:param direct: Use direct import fast path
	:param driver: Manually specify JDBC driver class to use
	:param extra_import_options: Extra import options to pass as dict.
	If a key doesn't have a value, just pass an empty string to it.
	Don't include prefix of -- for sqoop options.


	.. py:method:: export_table(self, table, export_dir = None, input_null_string = None, input_null_non_string = None, staging_table = None, clear_staging_table = False, enclosed_by = None, escaped_by = None, input_fields_terminated_by = None, input_lines_terminated_by = None, input_optionally_enclosed_by = None, batch = False, relaxed_isolation = False, extra_export_options = None, schema = None)

	Exports Hive table to remote location. Arguments are copies of direct
	sqoop command line Arguments

	:param table: Table remote destination
	:param schema: Schema name
	:param export_dir: Hive table to export
	:param input_null_string: The string to be interpreted as null for
	string columns
	:param input_null_non_string: The string to be interpreted as null
	for non-string columns
	:param staging_table: The table in which data will be staged before
	being inserted into the destination table
	:param clear_staging_table: Indicate that any data present in the
	staging table can be deleted
	:param enclosed_by: Sets a required field enclosing character
	:param escaped_by: Sets the escape character
	:param input_fields_terminated_by: Sets the field separator character
	:param input_lines_terminated_by: Sets the end-of-line character
	:param input_optionally_enclosed_by: Sets a field enclosing character
	:param batch: Use batch mode for underlying statement execution
	:param relaxed_isolation: Transaction isolation to read uncommitted
	for the mappers
	:param extra_export_options: Extra export options to pass as dict.
	If a key doesn't have a value, just pass an empty string to it.
	Don't include prefix of -- for sqoop options.