| |
| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| |
| |
| Direct-mode Imports |
| ------------------- |
| |
| While the JDBC-based import method used by Sqoop provides it with the |
| ability to read from a variety of databases using a generic driver, it |
| is not the most high-performance method available. Sqoop can read from |
| certain database systems faster by using their built-in export tools. |
| |
| For example, Sqoop can read from a MySQL database by using the +mysqldump+ |
| tool distributed with MySQL. You can take advantage of this faster |
| import method by running Sqoop with the +--direct+ argument. This |
| combined with a connect string that begins with +jdbc:mysql://+ will |
| inform Sqoop that it should select the faster access method. |
| |
| If your delimiters exactly match the delimiters used by +mysqldump+, |
| then Sqoop will use a fast-path that copies the data directly from |
| +mysqldump+'s output into HDFS. Otherwise, Sqoop will parse +mysqldump+'s |
| output into fields and transcode them into the user-specified delimiter set. |
| This incurs additional processing, so performance may suffer. |
| For convenience, the +--mysql-delimiters+ |
| argument will set all the output delimiters to be consistent with |
| +mysqldump+'s format. |
| |
| Sqoop also provides a direct-mode backend for PostgreSQL that uses the |
| +COPY TO STDOUT+ protocol from +psql+. No specific delimiter set provides |
| better performance; Sqoop will forward delimiter control arguments to |
| +psql+. |
| |
| The "Supported Databases" section provides a full list of database vendors |
| which have direct-mode support from Sqoop. |
| |
| When writing to HDFS, direct mode will open a single output file to receive |
| the results of the import. You can instruct Sqoop to use multiple output |
| files by using the +--direct-split-size+ argument which takes a size in |
| bytes. Sqoop will generate files of approximately this size. e.g., |
| +--direct-split-size 1000000+ will generate files of approximately 1 MB |
| each. If compressing the HDFS files with +--compress+, this will allow |
| subsequent MapReduce programs to use multiple mappers across your data |
| in parallel. |
| |
| Tool-specific arguments |
| ~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Sqoop will generate a set of command-line arguments with which it invokes |
| the underlying direct-mode tool (e.g., mysqldump). You can specify additional |
| arguments which should be passed to the tool by passing them to Sqoop |
| after a single '+-+' argument. e.g.: |
| |
| ---- |
| $ sqoop --connect jdbc:mysql://localhost/db --table foo --direct - --lock-tables |
| ---- |
| |
| The +--lock-tables+ argument (and anything else to the right of the +-+ argument) |
| will be passed directly to mysqldump. |
| |
| |
| |
| |