| |
| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| |
| |
| Connecting to a Database Server |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Sqoop is designed to import tables from a database into HDFS. As such, |
| it requires a _connect string_ that describes how to connect to the |
| database. The _connect string_ looks like a URL, and is communicated to |
| Sqoop with the +--connect+ argument. This describes the server and |
| database to connect to; it may also specify the port. e.g.: |
| |
| ---- |
| $ sqoop --connect jdbc:mysql://database.example.com/employees |
| ---- |
| |
| This string will connect to a MySQL database named +employees+ on the |
| host +database.example.com+. It's important that you *do not* use the URL |
| +localhost+ if you intend to use Sqoop with a distributed Hadoop |
| cluster. The connect string you supply will be used on TaskTracker nodes |
| throughout your MapReduce cluster; if they're told to connect to the |
| literal name +localhost+, they'll each reach a different |
| database (or more likely, no database at all)! Instead, you should use |
| the full hostname or IP address of the database host that can be seen |
| by all your remote nodes. |
| |
| You may need to authenticate against the database before you can |
| access it. The +--username+ and +--password+ or +-P+ parameters can |
| be used to supply a username and a password to the database. e.g.: |
| |
| ---- |
| $ sqoop --connect jdbc:mysql://database.example.com/employees \ |
| --username aaron --password 12345 |
| ---- |
| |
| .Password security |
| WARNING: The +--password+ parameter is insecure, as other users may |
| be able to read your password from the command-line arguments via |
| the output of programs such as `ps`. The *+-P+* argument will read |
| a password from a console prompt, and is the preferred method of |
| entering credentials. Credentials may still be transferred between |
| nodes of the MapReduce cluster using insecure means. |
| |
| Sqoop automatically supports several databases, including MySQL. Connect strings beginning |
| with +jdbc:mysql://+ are handled automatically Sqoop, though you may need |
| to install the driver yourself. (A full list of databases with |
| built-in support is provided in the "Supported Databases" section, below.) |
| |
| You can use Sqoop with any other |
| JDBC-compliant database as well. First, download the appropriate JDBC |
| driver for the database you want to import from, and install the .jar |
| file in the +/usr/hadoop/lib+ directory on all machines in your Hadoop |
| cluster, or some other directory which is in the classpath |
| on all nodes. Each driver jar also has a specific driver class which defines |
| the entry-point to the driver. For example, MySQL's Connector/J library has |
| a driver class of +com.mysql.jdbc.Driver+. Refer to your database |
| vendor-specific documentation to determine the main driver class. |
| This class must be provided as an argument to Sqoop with +--driver+. |
| |
| For example, to connect to a postgres database, first download the driver from |
| link:http://jdbc.postgresql.org[http://jdbc.postgresql.org] and |
| install it in your Hadoop lib path. |
| Then run Sqoop with something like: |
| |
| ---- |
| $ sqoop --connect jdbc:postgresql://postgres-server.example.com/employees \ |
| --driver org.postgresql.Driver |
| ---- |
| |