| //// |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| |
| Troubleshooting |
| --------------- |
| |
| General Troubleshooting Process |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
Follow these steps to troubleshoot any failure that you encounter while
running Sqoop.
| |
| - Turn on verbose output by executing the same command again and specifying |
| the +\--verbose+ option. This produces more debug output on the console |
| which can be inspected to identify any obvious errors. |
| |
- Look at the task logs from Hadoop to see if there are any specific failures
recorded there. It is possible that a failure that occurs during task
execution is not relayed correctly to the console.
| |
- Make sure that the necessary input files or input/output tables are present
and can be accessed by the user that Sqoop is executing as or connecting to
the database as. It is possible that the necessary files or tables are present
but that the specific user that Sqoop connects as does not have the necessary
permissions to access them.
| |
- If you are doing a compound action such as populating a Hive table or
partition, try breaking the job into two separate actions to see where the
problem really occurs. For example, if an import that creates and populates a
Hive table is failing, you can break it down into two steps: first, do the
import alone, and second, create the Hive table without the import using the
+create-hive-table+ tool (see the sketch after this list). While this does not
address the original use case of populating the Hive table, it does help
narrow the problem down to either the regular import or the creation and
population of the Hive table.
| |
- Search the mailing list archives and JIRA for keywords relating to the
problem. It is possible that you may find a solution discussed there that
will help you solve or work around your problem.
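
For illustration, here is one way such a compound Hive import could be split
into two separate steps. This is only a sketch: the connect string, user name,
and table name below are placeholders, not values from any example above.

----
# Step 1: run the import alone, without the Hive step.
$ sqoop import --connect jdbc:mysql://db.example.com/testdb \
    --username someuser --table sometable --verbose

# Step 2: create the Hive table definition separately, without importing data.
$ sqoop create-hive-table --connect jdbc:mysql://db.example.com/testdb \
    --username someuser --table sometable --verbose
----

If the first step fails, the problem lies in the regular import; if only the
second step fails, it lies in the creation of the Hive table.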
| |
| Specific Troubleshooting Tips |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Oracle: Connection Reset Errors |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
Problem: When using the default Sqoop connector for Oracle, some data does
get transferred, but during the map-reduce job many errors are reported, as
shown below:
| |
| ---- |
| 11/05/26 16:23:47 INFO mapred.JobClient: Task Id : attempt_201105261333_0002_m_000002_0, Status : FAILED |
| java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Connection reset |
| at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164) |
| at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) |
| at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) |
| at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:605) |
| at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) |
| at org.apache.hadoop.mapred.Child$4.run(Child.java:268) |
| at java.security.AccessController.doPrivileged(Native Method) |
| at javax.security.auth.Subject.doAs(Subject.java:396) |
| at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) |
| at org.apache.hadoop.mapred.Child.main(Child.java:262) |
| Caused by: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Connection reset |
| at com.cloudera.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:190) |
| at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:159) |
| ... 9 more |
| Caused by: java.sql.SQLRecoverableException: IO Error: Connection reset |
| at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:428) |
| at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:536) |
| at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:228) |
| at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32) |
| at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521) |
| at java.sql.DriverManager.getConnection(DriverManager.java:582) |
| at java.sql.DriverManager.getConnection(DriverManager.java:185) |
| at com.cloudera.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:152) |
| at com.cloudera.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:184) |
| ... 10 more |
| Caused by: java.net.SocketException: Connection reset |
| at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) |
| at java.net.SocketOutputStream.write(SocketOutputStream.java:136) |
| at oracle.net.ns.DataPacket.send(DataPacket.java:199) |
| at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:211) |
| at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:227) |
| at oracle.net.ns.NetInputStream.read(NetInputStream.java:175) |
| at oracle.net.ns.NetInputStream.read(NetInputStream.java:100) |
| at oracle.net.ns.NetInputStream.read(NetInputStream.java:85) |
| at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123) |
| at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79) |
| at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1122) |
| at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1099) |
| at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:288) |
| at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191) |
| at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:366) |
| at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:752) |
| at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:366) |
| ... 18 more |
| |
| ---- |
| |
| Solution: This problem occurs primarily due to the lack of a fast random |
| number generation device on the host where the map tasks execute. On |
| typical Linux systems this can be addressed by setting the following |
| property in the +java.security+ file: |
| |
| ---- |
| securerandom.source=file:/dev/../dev/urandom |
| ---- |
| |
| The +java.security+ file can be found under +$JAVA_HOME/jre/lib/security+ |
| directory. Alternatively, this property can also be specified on the |
| command line via: |
| |
| ---- |
| -D mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom" |
| ---- |
| |
Please note that it is important to specify this unusual path,
+/dev/../dev/urandom+, exactly as shown: due to a Java bug
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6202721[6202721],
the plain path +/dev/urandom+ would be ignored and silently substituted by
+/dev/random+.
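
As an illustration, a complete invocation passing this property could look
like the following. The connect string, user name, and table name are
placeholders; note that generic Hadoop arguments such as +-D+ must appear
before any tool-specific arguments.

----
$ sqoop import \
    -D mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom" \
    --connect jdbc:oracle:thin:@//db.example.com:1521/ORCL \
    --username SOMEUSER --table SOMETABLE
----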
| |
| Oracle: Case-Sensitive Catalog Query Errors |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
Problem: While working with Oracle you may encounter problems when Sqoop
cannot figure out column names. This happens because the catalog queries that
Sqoop uses for Oracle expect the correct case to be specified for the
user name and table name.

One example, using +\--hive-import+ and resulting in a NullPointerException:
| |
| ---- |
| 1/09/21 17:18:49 INFO manager.OracleManager: Time zone has been set to |
| GMT |
| 11/09/21 17:18:49 DEBUG manager.SqlManager: Using fetchSize for next |
| query: 1000 |
| 11/09/21 17:18:49 INFO manager.SqlManager: Executing SQL statement: |
| SELECT t.* FROM addlabel_pris t WHERE 1=0 |
| 11/09/21 17:18:49 DEBUG manager.OracleManager$ConnCache: Caching |
| released connection for jdbc:oracle:thin: |
| 11/09/21 17:18:49 ERROR sqoop.Sqoop: Got exception running Sqoop: |
| java.lang.NullPointerException |
| java.lang.NullPointerException |
| at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:148) |
| at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:187) |
| at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362) |
| at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423) |
| at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144) |
| at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) |
| at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180) |
| at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219) |
| at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228) |
| at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237) |
| |
| ---- |
| |
.Solution:
. Specify the user name that Sqoop connects as in upper case (unless it was
created with mixed/lower case within quotes).
. Specify the table name that you are working with in upper case (unless it
was created with mixed/lower case within quotes).
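
As a sketch, an invocation that follows both rules could look like this,
where the connect string, user name, and table name are placeholders:

----
$ sqoop import --connect jdbc:oracle:thin:@//db.example.com:1521/ORCL \
    --username SCOTT --table ADDLABEL_PRIS --hive-import
----

Because Oracle folds unquoted identifiers to upper case, a table created as
+addlabel_pris+ without quotes is stored as +ADDLABEL_PRIS+ and must be
passed to Sqoop in that form.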
| |
| MySQL: Connection Failure |
| ^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
Problem: While importing a MySQL table with Sqoop, if you do not have
the necessary permissions to access your MySQL database over the network,
you may get the connection failure shown below.
| |
| ---- |
| Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure |
| ---- |
| |
| Solution: First, verify that you can connect to the database from the node where |
| you are running Sqoop: |
| ---- |
| $ mysql --host=<IP Address> --database=test --user=<username> --password=<password> |
| ---- |
| If this works, it rules out any problem with the client network configuration |
| or security/authentication configuration. |
| |
Add the network port for the server to your MySQL configuration file +/etc/my.cnf+:
| ---- |
| [mysqld] |
| port = xxxx |
| ---- |
| |
Set up a user account to connect via Sqoop, and grant that user permission to
access the database over the network:

. Log into MySQL as root: +mysql -u root -p<ThisIsMyPassword>+.
. Issue the following command:

----
mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword';
----
| |
Note that doing this will enable +testuser+ to connect to the
MySQL server from any IP address. While this will work, it is not
advisable for a production environment. We advise consulting with your
DBA to grant the necessary privileges based on the setup topology.
| |
| If the database server's IP address changes, unless it is bound to |
| a static hostname in your server, the connect string passed into Sqoop |
| will also need to be changed. |
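
For reference, a hedged sketch of what the resulting Sqoop invocation could
look like, with the host name, port, and table name as placeholders:

----
$ sqoop import --connect jdbc:mysql://db.example.com:3306/test \
    --username testuser --password testpassword --table mytable
----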
| |
| Oracle: ORA-00933 error (SQL command not properly ended) |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
Problem: While working with Oracle you may encounter the problem below
when the Sqoop command explicitly specifies the +\--driver <driver name>+
option. When the driver option is included in the Sqoop command, the built-in
connection manager selection defaults to the generic connection manager, which
causes this issue with Oracle. If the driver option is not specified, the
built-in connection manager selection mechanism selects the Oracle-specific
connection manager, which generates valid SQL for Oracle and uses the driver
+oracle.jdbc.OracleDriver+.
| |
| ---- |
| ERROR manager.SqlManager: Error executing statement: |
| java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended |
| ---- |
| |
Solution: Omit the option +\--driver oracle.jdbc.driver.OracleDriver+ and
re-run the Sqoop command.
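
For example, a corrected command could look like the one below (connect
string, user name, and table name are placeholders). Sqoop recognizes the
+jdbc:oracle:+ prefix in the connect string and selects the Oracle-specific
connection manager on its own:

----
$ sqoop import --connect jdbc:oracle:thin:@//db.example.com:1521/ORCL \
    --username SOMEUSER --table SOMETABLE
----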
| |
| MySQL: Import of TINYINT(1) from MySQL behaves strangely |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Problem: Sqoop treats TINYINT(1) columns as booleans, which can cause
issues with Hive import, for example. This is because, by default, the MySQL
JDBC connector maps TINYINT(1) to java.sql.Types.BIT, which Sqoop in turn
maps to Boolean.
| |
Solution: The cleanest solution is to force the MySQL JDBC Connector to stop
converting TINYINT(1) to java.sql.Types.BIT by adding +tinyInt1isBit=false+ to
your JDBC connect string (to create something like
+jdbc:mysql://localhost/test?tinyInt1isBit=false+). Another solution is to
explicitly override the column mapping for the TINYINT(1) column. For example,
if the column name is foo, then pass the following option to Sqoop during
import: +\--map-column-hive foo=tinyint+. In the case of non-Hive imports to
HDFS, use +\--map-column-java foo=Integer+.
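
As a sketch of the connect-string approach, with the host, database, user
name, and table name as placeholders:

----
$ sqoop import \
    --connect "jdbc:mysql://db.example.com/test?tinyInt1isBit=false" \
    --username testuser --table mytable --hive-import
----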