blob: 048c5ac0b81e91de5018f3fbc82f482a985b54af [file] [log] [blame]
---
layout: post
status: PUBLISHED
published: true
title: What's New in Apache Sqoop 1.4.0-incubating
excerpt: "<p> <font size=\"2\">Apache Sqoop recently celebrates its first incubator
release, version 1.4.0-incubating. This blog post highlights some of the newly added
features for this release.</font></p>\r\n <p> </p>"
id: cde15fdd-b85c-477c-8974-7428ca4e40f7
date: '2012-01-13 01:17:06 -0500'
categories: sqoop
tags: []
permalink: sqoop/entry/what_s_new_in_apache
---
<p><font size="3"></p>
<p><font size="5">What's New in Apache Sqoop 1.4.0-incubating</font></p>
<p>Apache Sqoop recently celebrates its first incubator release, version 1.4.0-incubating.&nbsp; There are several new features and improvements added in this release.&nbsp; This post will cover some of those interesting changes.&nbsp; Sqoop is currently undergoing incubation at The Apache Software Foundation.&nbsp; More information on this project can be found at <a href="http://incubator.apache.org/sqoop">http://incubator.apache.org/sqoop</a>.</p>
<p><font size="4">Customized Type Mapping (<a href="http://issues.apache.org/jira/browse/SQOOP-342">SQOOP-342</a>)</font></p>
<p>Sqoop is equipped with a default mapping from most SQL types to appropriate Java or Hive counterparts during import.&nbsp; Even though, this one-mapping-fits-all approach might not be ideal in all scenarios considering a wide variety of data stores available today, not to mention there are certain vendor-specific SQL types that may not be covered by the default mapping.</p>
<p>To allow customized type mapping, two new arguments, <font face="courier new,courier,monospace">map-column-java</font> and <font face="courier new,courier,monospace">map-column-hive</font>, are introduced for changing mapping to Java and Hive, respectively.&nbsp; The list of mapping is expected in the form of <font face="courier new,courier,monospace"><br />
<column name>=<target type></font>, such as</p>
<p><font face="courier new,courier,monospace">$ sqoop import ... --map-column-java id=Integer,name=String</font></p>
<p>For the above example, the columns <font face="courier new,courier,monospace">id</font> and <font face="courier new,courier,monospace">name</font> will be mapped to Java <font face="courier new,courier,monospace">Integer</font> and <font face="courier new,courier,monospace">String</font>, respectively.</p>
<p><font size="4">Boundary Query Support (<a href="http://issues.apache.org/jira/browse/SQOOP-331">SQOOP-331</a>)</font></p>
<p>Sqoop uses a canned query (<font face="courier new,courier,monospace">select min(<split column>), max(<split column>) from<br />
<table name></font>) to determine boundaries for creating splits in all cases by default.&nbsp; This query may not always be the most optimal one however.&nbsp; Hence, to provide flexibility for using different queries based on distinct usages, a new <font face="courier new,courier,monospace">boundary-query</font> argument is provided to take any arbitrary query returning two numeric columns for the same purpose of creating splits.</p>
<p><font size="4">Date/Time Incremental Append (<a href="http://issues.apache.org/jira/browse/SQOOP-321">SQOOP-321</a>)</font></p>
<p>Incremental import in Sqoop can be used to only retrieve those rows with the value of a check column beyond a certain threshold.&nbsp; The threshold needs to be the maximum value of the check column (in <font face="courier new,courier,monospace">append</font> mode) or the timestamp (in <font face="courier new,courier,monospace">lastmodified</font> mode) at the end of last import.</p>
<p>Previously, in <font face="courier new,courier,monospace">append</font> mode, the check column has to be in numeric type.&nbsp; If a date/time type is desired, the user has to manually select the maximum value out of the date/time column and then specify that value as the <font face="courier new,courier,monospace">last-value</font> argument in <font face="courier new,courier,monospace">lastmodified</font> mode instead.&nbsp; As part of this release, now the check column can be in date/time type as well.</p>
<p><font size="4">Composite Key Update (<a href="http://issues.apache.org/jira/browse/SQOOP-313">SQOOP-313</a>)</font></p>
<p>By default, Sqoop export adds new records into a table by <font face="courier new,courier,monospace">INSERT</font> statements.&nbsp; However, if any record is in conflict with an existing one due to table constraints (such as a unique key), the underlying <font face="courier new,courier,monospace">INSERT</font> statement will fail and the export process will fail.&nbsp; If an existing record needs to be modified, the <font face="courier new,courier,monospace">update-key</font> argument can be specified and <font face="courier new,courier,monospace">UPDATE</font> statements will be used instead underneath.</p>
<p>Before this release, only a single column name can be specified in the <font face="courier new,courier,monospace">update-key</font> argument.&nbsp; This column name will be used to determine the matching record(s) for update.&nbsp; However, in many real world situations, multiple columns are required to identify the matching record(s).&nbsp; Thus, starting from this release, a comma separated list of column names can be given as the <font face="courier new,courier,monospace">update-key</font> argument.</p>
<p><font size="4">Mixed Update/Insert Export (<a href="http://issues.apache.org/jira/browse/SQOOP-327">SQOOP-327</a>)</font></p>
<p>As mentioned, Sqoop export can only either insert (by default) or update (with the <font face="courier new,courier,monospace">update-key</font> argument) records into a table.&nbsp; As a result, one issue is that if data are being inserted, they may cause constraint violations when they exist already.&nbsp; Another issue is that if data are being updated, they may be silently ignored when there are no matching update keys found.&nbsp; It lacks the functionality to both update those data with matching update keys and insert those without.</p>
<p>A new <font face="courier new,courier,monospace">update-mode</font> argument is introduced to resolve the above issues.&nbsp; Its value can be either <font face="courier new,courier,monospace">updateonly</font> or <font face="courier new,courier,monospace">allowinsert</font>.&nbsp; As the name suggests, the difference is those records without matching update keys are simply dropped when the value is <font face="courier new,courier,monospace">updateonly</font> or are inserted when the value is <font face="courier new,courier,monospace">allowinsert</font>.&nbsp; Note that this feature is currently provided only for built-in Oracle connector.</p>
<p><font size="4">IBM DB2 Support (<a href="http://issues.apache.org/jira/browse/SQOOP-329">SQOOP-329</a>)</font></p>
<p>The extensible architecture used by Sqoop allows support for a data store to be added as a so-called connector.&nbsp; By default, Sqoop comes with connectors for a variety of databases such as MySQL, PostgreSQL, Oracle, and SQL Server.&nbsp; In addition, there are also third-party connectors available separately from various vendors for several other data stores, such Couchbase, VoltDB, and Netezza.&nbsp; As part of this release, a new connector is provided to import and export data against IBM DB2 database.</p>
<p><font size="4">The Final Chapter</font></p>
<p>If you are interested in learning more about the changes, a complete list for Sqoop 1.4.0-incubating can be found <a href="http://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311320&amp;version=12317345">here</a>.&nbsp; You are also encouraged to give this new release a try.&nbsp; Any help and feedback is more than welcome. For more information on how to report problems and to get involved, visit the Sqoop project website at <a href="http://incubator.apache.org/sqoop/">http://incubator.apache.org/sqoop/</a>.</p>
<p></font></p>