| --- |
| layout: post |
| status: PUBLISHED |
| published: true |
| title: What's New in Apache Sqoop 1.4.0-incubating |
| excerpt: "<p> <font size=\"2\">Apache Sqoop recently celebrates its first incubator |
| release, version 1.4.0-incubating. This blog post highlights some of the newly added |
| features for this release.</font></p>\r\n <p> </p>" |
| id: cde15fdd-b85c-477c-8974-7428ca4e40f7 |
| date: '2012-01-13 01:17:06 -0500' |
| categories: sqoop |
| tags: [] |
| permalink: sqoop/entry/what_s_new_in_apache |
| --- |
| <p><font size="3"></p> |
| <p><font size="5">What's New in Apache Sqoop 1.4.0-incubating</font></p> |
| <p>Apache Sqoop recently celebrates its first incubator release, version 1.4.0-incubating. There are several new features and improvements added in this release. This post will cover some of those interesting changes. Sqoop is currently undergoing incubation at The Apache Software Foundation. More information on this project can be found at <a href="http://incubator.apache.org/sqoop">http://incubator.apache.org/sqoop</a>.</p> |
| <p><font size="4">Customized Type Mapping (<a href="http://issues.apache.org/jira/browse/SQOOP-342">SQOOP-342</a>)</font></p> |
| <p>Sqoop is equipped with a default mapping from most SQL types to appropriate Java or Hive counterparts during import. Even though, this one-mapping-fits-all approach might not be ideal in all scenarios considering a wide variety of data stores available today, not to mention there are certain vendor-specific SQL types that may not be covered by the default mapping.</p> |
| <p>To allow customized type mapping, two new arguments, <font face="courier new,courier,monospace">map-column-java</font> and <font face="courier new,courier,monospace">map-column-hive</font>, are introduced for changing mapping to Java and Hive, respectively. The list of mapping is expected in the form of <font face="courier new,courier,monospace"><br /> |
| <column name>=<target type></font>, such as</p> |
| <p><font face="courier new,courier,monospace">$ sqoop import ... --map-column-java id=Integer,name=String</font></p> |
| <p>For the above example, the columns <font face="courier new,courier,monospace">id</font> and <font face="courier new,courier,monospace">name</font> will be mapped to Java <font face="courier new,courier,monospace">Integer</font> and <font face="courier new,courier,monospace">String</font>, respectively.</p> |
| <p><font size="4">Boundary Query Support (<a href="http://issues.apache.org/jira/browse/SQOOP-331">SQOOP-331</a>)</font></p> |
| <p>Sqoop uses a canned query (<font face="courier new,courier,monospace">select min(<split column>), max(<split column>) from<br /> |
| <table name></font>) to determine boundaries for creating splits in all cases by default. This query may not always be the most optimal one however. Hence, to provide flexibility for using different queries based on distinct usages, a new <font face="courier new,courier,monospace">boundary-query</font> argument is provided to take any arbitrary query returning two numeric columns for the same purpose of creating splits.</p> |
| <p><font size="4">Date/Time Incremental Append (<a href="http://issues.apache.org/jira/browse/SQOOP-321">SQOOP-321</a>)</font></p> |
| <p>Incremental import in Sqoop can be used to only retrieve those rows with the value of a check column beyond a certain threshold. The threshold needs to be the maximum value of the check column (in <font face="courier new,courier,monospace">append</font> mode) or the timestamp (in <font face="courier new,courier,monospace">lastmodified</font> mode) at the end of last import.</p> |
| <p>Previously, in <font face="courier new,courier,monospace">append</font> mode, the check column has to be in numeric type. If a date/time type is desired, the user has to manually select the maximum value out of the date/time column and then specify that value as the <font face="courier new,courier,monospace">last-value</font> argument in <font face="courier new,courier,monospace">lastmodified</font> mode instead. As part of this release, now the check column can be in date/time type as well.</p> |
| <p><font size="4">Composite Key Update (<a href="http://issues.apache.org/jira/browse/SQOOP-313">SQOOP-313</a>)</font></p> |
| <p>By default, Sqoop export adds new records into a table by <font face="courier new,courier,monospace">INSERT</font> statements. However, if any record is in conflict with an existing one due to table constraints (such as a unique key), the underlying <font face="courier new,courier,monospace">INSERT</font> statement will fail and the export process will fail. If an existing record needs to be modified, the <font face="courier new,courier,monospace">update-key</font> argument can be specified and <font face="courier new,courier,monospace">UPDATE</font> statements will be used instead underneath.</p> |
| <p>Before this release, only a single column name can be specified in the <font face="courier new,courier,monospace">update-key</font> argument. This column name will be used to determine the matching record(s) for update. However, in many real world situations, multiple columns are required to identify the matching record(s). Thus, starting from this release, a comma separated list of column names can be given as the <font face="courier new,courier,monospace">update-key</font> argument.</p> |
| <p><font size="4">Mixed Update/Insert Export (<a href="http://issues.apache.org/jira/browse/SQOOP-327">SQOOP-327</a>)</font></p> |
| <p>As mentioned, Sqoop export can only either insert (by default) or update (with the <font face="courier new,courier,monospace">update-key</font> argument) records into a table. As a result, one issue is that if data are being inserted, they may cause constraint violations when they exist already. Another issue is that if data are being updated, they may be silently ignored when there are no matching update keys found. It lacks the functionality to both update those data with matching update keys and insert those without.</p> |
| <p>A new <font face="courier new,courier,monospace">update-mode</font> argument is introduced to resolve the above issues. Its value can be either <font face="courier new,courier,monospace">updateonly</font> or <font face="courier new,courier,monospace">allowinsert</font>. As the name suggests, the difference is those records without matching update keys are simply dropped when the value is <font face="courier new,courier,monospace">updateonly</font> or are inserted when the value is <font face="courier new,courier,monospace">allowinsert</font>. Note that this feature is currently provided only for built-in Oracle connector.</p> |
| <p><font size="4">IBM DB2 Support (<a href="http://issues.apache.org/jira/browse/SQOOP-329">SQOOP-329</a>)</font></p> |
| <p>The extensible architecture used by Sqoop allows support for a data store to be added as a so-called connector. By default, Sqoop comes with connectors for a variety of databases such as MySQL, PostgreSQL, Oracle, and SQL Server. In addition, there are also third-party connectors available separately from various vendors for several other data stores, such Couchbase, VoltDB, and Netezza. As part of this release, a new connector is provided to import and export data against IBM DB2 database.</p> |
| <p><font size="4">The Final Chapter</font></p> |
| <p>If you are interested in learning more about the changes, a complete list for Sqoop 1.4.0-incubating can be found <a href="http://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311320&version=12317345">here</a>. You are also encouraged to give this new release a try. Any help and feedback is more than welcome. For more information on how to report problems and to get involved, visit the Sqoop project website at <a href="http://incubator.apache.org/sqoop/">http://incubator.apache.org/sqoop/</a>.</p> |
| <p></font></p> |