---
layout: post
title: Apache Sqoop - Overview
date: '2011-10-06T01:38:13+00:00'
categories: sqoop
---
<p> </p>
<div style="background-color: transparent; ">
<p> </p>
<div style="background-color: transparent; ">
<h1>Apache Sqoop - Overview</h1>
<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">Using Hadoop for analytics and data processing requires loading data into clusters and processing it in conjunction with other data that often resides in production databases across the enterprise. Loading bulk data into Hadoop from production systems, or accessing it from MapReduce applications running on large clusters, can be a challenging task. Users must consider details such as data consistency, the consumption of production system resources, and the preparation of data for downstream pipelines. Transferring data using scripts is inefficient and time-consuming. Directly accessing data residing on external systems from within MapReduce applications complicates those applications and exposes the production system to the risk of excessive load originating from cluster nodes. </span></p>
<p><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">This is where Apache Sqoop fits in. Apache Sqoop is currently undergoing incubation at the Apache Software Foundation. More information on this project can be found at <a href="http://incubator.apache.org/sqoop" title="Apache Sqoop">http://incubator.apache.org/sqoop</a>.</span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">Sqoop makes it easy to import data from, and export data to, structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision data from an external system onto HDFS, and populate tables in Hive and HBase. Sqoop integrates with Oozie, allowing you to schedule and automate import and export tasks. Sqoop uses a connector-based architecture that supports plugins providing connectivity to new external systems. 
</span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">What happens under the covers when you run Sqoop is very straightforward. The dataset being transferred is sliced into partitions, and a map-only job is launched in which each mapper is responsible for transferring one slice of the dataset. Each record is handled in a type-safe manner, since Sqoop uses the database metadata to infer the data types. </span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">In the rest of this post we will walk through an example that shows the various ways you can use Sqoop. The goal of this post is to give an overview of Sqoop operation without going into much detail or advanced functionality.</span></p>
<p> </p>
<h1>Importing Data</h1>
<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">The following command is used to import all data from a table called ORDERS from a MySQL database:</span></p>
<div style="background-color: transparent; ">
<p><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">---</span><br /><span style="font-size: 11pt; font-family: 'Courier New'; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">$ sqoop import --connect jdbc:mysql://localhost/acmedb \</span><br /><span style="font-size: 11pt; font-family: 'Courier New'; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> &nbsp;&nbsp;--table ORDERS --username test --password ****</span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">---</span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">In this command the various options specified are as follows:</span></p>
<ul style="font-family: 'Times New Roman'; font-size: medium; ">
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: italic; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">import:</span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> This is the sub-command that instructs Sqoop to initiate an import.</span></li>
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: italic; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--connect &lt;connect string&gt;, --username &lt;user name&gt;, --password &lt;password&gt;: </span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">These are the connection parameters used to connect to the database. They are no different from the parameters you would use when connecting to the database via JDBC.</span></li>
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: italic; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--table &lt;table name&gt;:</span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> This parameter specifies the table which will be imported.</span></li>
</ul>
<p><br /><span style="font-family: 'Times New Roman'; font-size: medium; "><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">The import is done in two steps, as depicted in Figure 1 below. In the first step, Sqoop introspects the database to gather the necessary metadata for the data being imported. The second step is a map-only Hadoop job that Sqoop submits to the cluster. It is this job that does the actual data transfer, using the metadata captured in the previous step.</span> </span></p>
<p style="text-align: center; "><img src="https://blogs.apache.org/sqoop/mediaresource/d76fa176-1331-4af3-95cf-ae6a0068c306" alt="Figure 1: Sqoop Import Overview" /></p>
<p style="text-align: center; "><b>Figure 1: Sqoop Import Overview</b></p>
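<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">To make the slicing step concrete, here is a minimal Python sketch of one way a table could be divided among mappers: an inclusive integer key range is cut into contiguous, near-equal slices, and each slice becomes a bounded query. The function name split_ranges, the key column id, and the even range-splitting strategy are assumptions made for illustration; this is not Sqoop's actual splitter code.</span></p>

```python
def split_ranges(lo, hi, num_mappers):
    """Split the inclusive key range [lo, hi] into contiguous slices.

    Returns a list of (first, last) tuples, at most one per mapper.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    total = hi - lo + 1
    if total <= 0:
        return []
    num_slices = min(num_mappers, total)
    base, extra = divmod(total, num_slices)
    ranges = []
    start = lo
    for i in range(num_slices):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Hypothetical ORDERS table whose integer key `id` spans 1..1000, read by 4 mappers.
for first, last in split_ranges(1, 1000, 4):
    print(f"SELECT * FROM ORDERS WHERE id >= {first} AND id <= {last}")
```

<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">Each printed query stands for the work of one mapper. A skewed key distribution would make the slices uneven in row count, which is one reason the choice of split column matters.</span></p>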
<p> </p>
<div style="background-color: transparent; "><span id="internal-source-marker_0.41385090351104736" style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">The imported data is saved in a directory on HDFS named after the table being imported. As is the case with most aspects of Sqoop operation, the user can specify an alternative directory where the files should be written. </span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">By default these files contain comma-delimited fields, with newlines separating records. You can easily override the format in which data is copied over by explicitly specifying the field separator and record terminator characters. </span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">Sqoop also supports different data formats for importing data. 
For example, you can easily import data in Avro data format by simply specifying the option </span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: italic; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--as-avrodatafile</span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> with the import command.</span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /></div>
<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">There are many other options that Sqoop provides which can be used to further tune the import operation to suit your specific requirements.</span></p>
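<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">As a rough picture of the default text layout described above, the following Python sketch renders records as delimited text with a configurable field separator and record terminator. The function format_records and the sample rows are hypothetical, and the sketch ignores escaping and enclosing characters, which a real import may need.</span></p>

```python
def format_records(rows, field_sep=",", record_sep="\n"):
    """Render rows as delimited text.

    Fields are joined by field_sep and every record ends with record_sep.
    """
    return "".join(
        field_sep.join(str(value) for value in row) + record_sep
        for row in rows
    )


# Hypothetical sample rows standing in for the ORDERS table.
rows = [(1, "widget", 9.99), (2, "gadget", 24.5)]

# Default layout: comma-delimited fields, one record per line.
print(format_records(rows), end="")

# Overridden layout: tab-delimited fields.
print(format_records(rows, field_sep="\t"), end="")
```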
<h2>Importing Data into Hive</h2>
<div style="background-color: transparent; "><span id="internal-source-marker_0.41385090351104736" style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">In most cases, importing data into Hive is the same as running the import task and then using Hive to create and load a certain table or partition. Doing this manually requires that you know the correct type mapping between the data and other details like the serialization format and delimiters. Sqoop takes care of populating the Hive metastore with the appropriate metadata for the table and also invokes the necessary commands to load the table or partition as the case may be. All of this is done by simply specifying the option </span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: italic; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--hive-import</span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> with the import command.</span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">----</span><br /><font face="'courier new', courier, monospace"><span style="font-size: 11pt; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: 
baseline; white-space: pre-wrap; ">$ sqoop import --connect jdbc:mysql://localhost/acmedb \</span><br /><span style="font-size: 11pt; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> &nbsp;&nbsp;--table ORDERS --username test --password **** </span><span style="font-size: 11pt; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--hive-import</span></font><span style="font-family: 'Courier New'; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">----</span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">When you run a Hive import, Sqoop converts the data from the native datatypes within the external datastore into the corresponding types within Hive. Sqoop automatically chooses the native delimiter set used by Hive. 
If the data being imported contains newline or other Hive delimiter characters, Sqoop allows you to remove such characters so that the data is populated correctly for consumption in Hive.</span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /></div>
<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">Once the import is complete, you can see and operate on the table just like any other table in Hive.</span> </p>
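<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">The delimiter clean-up mentioned above can be pictured with a small Python sketch. It assumes Hive's default text layout, in which fields are separated by the Ctrl-A character (\x01) and records by a newline, and it drops those characters (plus carriage returns) from string fields before writing a line. The helper names are hypothetical; this is an illustration, not Sqoop's own code.</span></p>

```python
# Characters that would break Hive's default text layout if left inside a field.
HIVE_DELIMS = str.maketrans("", "", "\n\r\x01")


def drop_hive_delims(row):
    """Remove Hive delimiter characters from string fields; leave other types alone."""
    return tuple(
        value.translate(HIVE_DELIMS) if isinstance(value, str) else value
        for value in row
    )


def to_hive_line(row):
    """Format one record using Ctrl-A between fields and a newline at the end."""
    return "\x01".join(str(value) for value in drop_hive_delims(row)) + "\n"


# Hypothetical record whose text fields contain an embedded newline and a Ctrl-A.
row = (42, "line one\nline two", "a\x01b")
print(repr(to_hive_line(row)))
```

<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">Without the clean-up, the embedded newline would split this record into two rows and the embedded Ctrl-A would shift the remaining columns.</span></p>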
<h2>Importing Data into HBase</h2>
<p><span style="font-size: medium; "> </span></p>
<div style="background-color: transparent; "><span id="internal-source-marker_0.41385090351104736" style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">You can use Sqoop to populate data in a particular column family within an HBase table. Much like the Hive import, this can be done by specifying the additional options that relate to the HBase table and column family being populated. All data imported into HBase is converted to its string representation and inserted as UTF-8 bytes. </span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">----</span><br /><font face="'courier new', courier, monospace"><span style="font-size: 11pt; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">$ sqoop import --connect jdbc:mysql://localhost/acmedb \</span><br /><span style="font-size: 11pt; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> &nbsp;--table ORDERS --username test --password **** \</span><br /><span style="font-size: 11pt; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> </span><span style="font-size: 11pt; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; 
font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--hbase-create-table --hbase-table ORDERS --column-family mysql</span></font><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">----</span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">In this command the various options specified are as follows:</span>
<ul style="font-family: 'Times New Roman'; ">
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "><i>--hbase-create-table:</i></span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> This option instructs Sqoop to create the HBase table.</span></li>
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "><i>--hbase-table:</i></span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> This option specifies the table name to use.</span></li>
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "><i>--column-family:</i></span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> This option specifies the column family name to use.</span></li>
</ul>
<p><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">The remaining options are the same as those for a regular import operation.</span></p>
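<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">The string conversion described above can be sketched in Python as follows. Each column value is turned into its string form and encoded as UTF-8, and each cell is addressed as family:qualifier. The function to_hbase_cells, the choice of id as the row key column, and the decision to skip null values are assumptions of this sketch rather than a description of Sqoop's internals.</span></p>

```python
def to_hbase_cells(row, row_key_column, column_family):
    """Convert one record (a dict) into an HBase-style row key and cell map.

    Every value is stringified and encoded as UTF-8 bytes. The row key column
    is used for the key only, and None values are skipped (sketch assumptions).
    """
    row_key = str(row[row_key_column]).encode("utf-8")
    cells = {}
    for name, value in row.items():
        if name == row_key_column or value is None:
            continue
        qualifier = f"{column_family}:{name}".encode("utf-8")
        cells[qualifier] = str(value).encode("utf-8")
    return row_key, cells


# Hypothetical ORDERS record; the column family name matches the command above.
record = {"id": 7, "customer": "Zo\u00eb", "total": 19.5, "coupon": None}
row_key, cells = to_hbase_cells(record, row_key_column="id", column_family="mysql")
print(row_key, cells)
```

<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">Note that the numeric total is stored as the bytes of the text "19.5", not as a binary number, so readers of the table need to parse values back from strings.</span></p>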
<h1>Exporting Data</h1>
</div>
<p><span style="font-family: 'Times New Roman'; font-size: medium; "> </span></p>
<div style="background-color: transparent; "><span id="internal-source-marker_0.41385090351104736" style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">In some cases data processed by Hadoop pipelines may be needed in production systems to help run additional critical business functions. Sqoop can be used to export such data into external datastores as necessary. Continuing our example from above - if data generated by the pipeline on Hadoop corresponded to the ORDERS table in a database somewhere, you could populate it using the following command:</span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">----</span><br /><span style="font-size: 11pt; font-family: 'Courier New'; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">$ sqoop </span><span style="font-size: 11pt; font-family: 'Courier New'; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">export</span><span style="font-size: 11pt; font-family: 'Courier New'; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> --connect jdbc:mysql://localhost/acmedb \</span><br /><span style="font-size: 11pt; font-family: 'Courier New'; 
color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> &nbsp;--table ORDERS --username test --password **** \</span><br /><span style="font-size: 11pt; font-family: 'Courier New'; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> </span><span style="font-size: 11pt; font-family: 'Courier New'; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--export-dir /user/arvind/ORDERS</span><span style="font-size: 11pt; font-family: 'Courier New'; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">----</span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "><br class="kix-line-break" />In this command the various options specified are as follows:</span>
<ul>
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: italic; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">export:</span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> This is the sub-command that instructs Sqoop to initiate an export.</span></li>
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: italic; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--connect &lt;connect string&gt;, --username &lt;user name&gt;, --password &lt;password&gt;: </span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">These are the connection parameters used to connect to the database. They are no different from the parameters you would use when connecting to the database via JDBC.</span></li>
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: italic; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--table &lt;table name&gt;:</span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "> This parameter specifies the table which will be populated.</span></li>
<li style="list-style-type: disc; font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: italic; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">--export-dir &lt;directory path&gt;: </span><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">This is the directory from which data will be exported.</span></li>
</ul>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">Export is done in two steps as depicted in Figure 2. The first step is to introspect the database for metadata, followed by the second step of transferring the data. Sqoop divides the input dataset into splits and then uses individual map tasks to push the splits to the database. Each map task performs this transfer over many transactions in order to ensure optimal throughput and minimal resource utilization.</span></p>
</div><span style="font-family: 'Times New Roman'; font-size: medium; "> </span><span style="font-family: 'Times New Roman'; font-size: medium; ">
<div style="background-color: transparent; ">
<p style="text-align: center; "><img src="https://blogs.apache.org/sqoop/mediaresource/12624986-9e30-430e-a0c7-e12176548f6d" alt="Figure 2: Sqoop Export Overview" /> </p>
</div></span>
<div style="text-align: center; font-size: medium; font-family: 'Times New Roman'; "><span style="font-family: verdana, arial, 'Bitstream Vera Sans', helvetica, sans-serif; font-size: 13px; "><b>Figure 2: Sqoop Export Overview</b></span></div>
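<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">The idea of spreading one map task's work over many transactions can be sketched in Python, with the standard library's sqlite3 module standing in for the target database. The table layout, the export_split helper, and the batch size of two rows are all assumptions chosen to keep the example small.</span></p>

```python
import sqlite3


def export_split(conn, rows, rows_per_txn=2):
    """Insert rows in several small transactions; return how many commits ran."""
    commits = 0
    for i in range(0, len(rows), rows_per_txn):
        batch = rows[i:i + rows_per_txn]
        with conn:  # commits on success, rolls back if this batch fails
            conn.executemany("INSERT INTO orders (id, item) VALUES (?, ?)", batch)
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

# Hypothetical split of five records handled by a single map task.
split = [(1, "widget"), (2, "gadget"), (3, "gizmo"), (4, "doohickey"), (5, "thing")]
commits = export_split(conn, split)
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(commits, count)
```

<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">Because each batch commits on its own, a failure partway through would leave the earlier batches in the table. Small transactions keep locks and undo space modest on the database side, at the cost of the export as a whole not being atomic.</span></p>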
<p> </p>
<p><span style="font-family: 'Times New Roman'; font-size: medium; "> </span></p>
<div style="background-color: transparent; ">
<p> </p>
<div style="background-color: transparent; ">
<p style="font-family: 'Times New Roman'; font-size: medium; "><span id="internal-source-marker_0.41385090351104736" style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">Some connectors support staging tables, which help isolate production tables from possible corruption if a job fails for any reason. Staging tables are first populated by the map tasks and then merged into the target table once all of the data has been delivered.</span></p>
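<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">Here is a minimal Python sketch of the staging pattern, using the standard library's sqlite3 module as a stand-in database. Rows are first written to a staging table; only if every row arrives are they moved into the target table, in a single transaction. The table names, the export_via_staging helper, and the NOT NULL constraint used to simulate a failure are hypothetical.</span></p>

```python
import sqlite3


def export_via_staging(conn, rows):
    """Load rows into orders_stage, then move them to orders in one transaction.

    Returns True if the rows reached the target table, False if loading failed.
    """
    conn.execute("DELETE FROM orders_stage")  # start from an empty staging table
    conn.commit()
    try:
        for row in rows:
            conn.execute("INSERT INTO orders_stage (id, item) VALUES (?, ?)", row)
            conn.commit()  # staging writes may span many transactions
    except sqlite3.Error:
        conn.rollback()
        return False  # the target table was never touched
    with conn:  # single transaction: all rows move, or none do
        conn.execute("INSERT INTO orders (id, item) SELECT id, item FROM orders_stage")
        conn.execute("DELETE FROM orders_stage")
    return True


conn = sqlite3.connect(":memory:")
for table in ("orders", "orders_stage"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")

ok_first = export_via_staging(conn, [(1, "widget"), (2, "gadget")])
ok_second = export_via_staging(conn, [(3, "gizmo"), (4, None)])  # second row is invalid
target_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(ok_first, ok_second, target_count)
```

<p><span style="font-family: Arial; font-size: 15px; white-space: pre-wrap; ">The second call fails while loading the staging table, so the target table still holds only the two rows from the first call. The partially loaded row stays in staging until the next run clears it.</span></p>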
<h1>Sqoop Connectors</h1>
<p> </p>
<div style="background-color: transparent; ">
<p style="font-family: 'Times New Roman'; font-size: medium; "><span id="internal-source-marker_0.41385090351104736" style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">Using specialized connectors, Sqoop can connect with external systems that have optimized import and export facilities, or that do not support native JDBC. Connectors are plugin components based on Sqoop's extension framework and can be added to any existing Sqoop installation. Once a connector is installed, Sqoop can use it to efficiently transfer data between Hadoop and the external store supported by the connector.</span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">By default Sqoop includes connectors for various popular databases such as MySQL, PostgreSQL, Oracle, SQL Server and DB2. It also includes fast-path connectors for MySQL and PostgreSQL databases. Fast-path connectors are specialized connectors that use database-specific batch tools to transfer data with high throughput. 
Sqoop also includes a generic JDBC connector that can be used to connect to any database that is accessible via JDBC.</span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">Apart from the built-in connectors, many companies have developed their own connectors that can be plugged into Sqoop. These range from specialized connectors for enterprise data warehouse systems to NoSQL datastores.</span></p>
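<p style="font-family: 'Times New Roman'; font-size: medium; "><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">To illustrate the difference between a fast-path connector and the generic JDBC connector, here is a minimal sketch of two imports. All hosts, database names, user names, and the driver class com.example.jdbc.Driver are hypothetical; the second command also assumes that the vendor's JDBC driver jar has been made available to Sqoop.</span></p>
<pre><code># Fast-path import from MySQL: --direct asks Sqoop to use the database's
# own bulk tool (mysqldump for MySQL) instead of plain JDBC reads.
sqoop import \
  --connect jdbc:mysql://db.example.com/salesdb \
  --username sqoop_user -P \
  --table ORDERS \
  --direct

# Generic JDBC import: name the driver class explicitly with --driver.
# The JDBC URL scheme and driver class below are placeholders.
sqoop import \
  --connect jdbc:examplevendor://db.example.com/salesdb \
  --driver com.example.jdbc.Driver \
  --username sqoop_user -P \
  --table ORDERS
</code></pre>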
<h1>Wrapping Up</h1>
<div style="background-color: transparent; ">
<p><span id="internal-source-marker_0.41385090351104736" style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">In this post you saw how easy it is to transfer large datasets between Hadoop and external datastores such as relational databases. Beyond this, Sqoop offers many advanced features, such as support for different data formats, compression, and working with queries instead of tables. We encourage you to try out Sqoop and give us your feedback. </span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span><br /><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; ">More information regarding Sqoop can be found at:</span><br /><span style="font-family: Arial; color: #000000; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap; font-size: 11pt; background-color: transparent; "></span></p>
<p> </p>
<p><font size="3">Project Website: <a href="http://incubator.apache.org/sqoop" title="Apache Sqoop">http://incubator.apache.org/sqoop</a></font></p>
<p><font size="3"></font><span style="font-size: medium; ">Wiki: <a href="https://cwiki.apache.org/confluence/display/SQOOP" title="Sqoop Wiki">https://cwiki.apache.org/confluence/display/SQOOP</a></span></p>
<p><font size="3">Project Status: &nbsp;<a href="http://incubator.apache.org/projects/sqoop.html" title="Sqoop Project Status">http://incubator.apache.org/projects/sqoop.html</a></font></p>
<p><font size="3">Mailing Lists: <a href="https://cwiki.apache.org/confluence/display/SQOOP/Mailing+Lists" title="Sqoop Mailing Lists">https://cwiki.apache.org/confluence/display/SQOOP/Mailing+Lists</a></font></p>
<p> </p>
</div>
</div>
<p> </p>
</div>
<p> </p>
</div>
</div>
</div>
<p> </p>
</div>