| |
| //// |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| |
| |
| [[validation]] |
| +validation+ |
| -------------- |
| |
| |
| Purpose |
| ~~~~~~~ |
| |
| Validate the data copied, either import or export by comparing the row |
| counts from the source and the target post copy. |
| |
| |
| Introduction |
| ~~~~~~~~~~~~ |
| |
| There are 3 basic interfaces: |
| ValidationThreshold - Determines if the error margin between the source and |
| target are acceptable: Absolute, Percentage Tolerant, etc. |
| Default implementation is AbsoluteValidationThreshold which ensures the row |
| counts from source and targets are the same. |
| |
| ValidationFailureHandler - Responsible for handling failures: log an |
| error/warning, abort, etc. |
| Default implementation is LogOnFailureHandler that logs a warning message to |
| the configured logger. |
| |
| Validator - Drives the validation logic by delegating the decision to |
| ValidationThreshold and delegating failure handling to ValidationFailureHandler. |
| The default implementation is RowCountValidator which validates the row |
| counts from source and the target. |
| |
| |
| Syntax |
| ~~~~~~ |
| |
| ---- |
| $ sqoop import (generic-args) (import-args) |
| $ sqoop export (generic-args) (export-args) |
| ---- |
| |
| Validation arguments are part of import and export arguments. |
| |
| |
| Configuration |
| ~~~~~~~~~~~~~ |
| |
| The validation framework is extensible and pluggable. It comes with default |
| implementations but the interfaces can be extended to allow custom |
| implementations by passing them as part of the command line arguments as |
| described below. |
| |
| |
| .Validator |
| Property: validator |
| Description: Driver for validation, |
| must implement org.apache.sqoop.validation.Validator |
| Supported values: The value has to be a fully qualified class name. |
| Default value: org.apache.sqoop.validation.RowCountValidator |
| |
| .Validation Threshold |
| Property: validation-threshold |
| Description: Drives the decision based on the validation meeting the |
| threshold or not. Must implement |
| org.apache.sqoop.validation.ValidationThreshold |
| Supported values: The value has to be a fully qualified class name. |
| Default value: org.apache.sqoop.validation.AbsoluteValidationThreshold |
| |
| .Validation Failure Handler |
| Property: validation-failurehandler |
| Description: Responsible for handling failures, must implement |
| org.apache.sqoop.validation.ValidationFailureHandler |
| Supported values: The value has to be a fully qualified class name. |
| Default value: org.apache.sqoop.validation.AbortOnFailureHandler |
| |
| |
| Limitations |
| ~~~~~~~~~~~ |
| |
| Validation currently only validates data copied from a single table into HDFS. |
| The following are the limitations in the current implementation: |
| |
| * all-tables option |
| * free-form query option |
| * Data imported into Hive, HBase or Accumulo |
| * table import with --where argument |
| * incremental imports |
| |
| |
| Example Invocations |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| A basic import of a table named +EMPLOYEES+ in the +corp+ database that uses |
| validation to validate the row counts: |
| |
| ---- |
| $ sqoop import --connect jdbc:mysql://db.foo.com/corp \ |
| --table EMPLOYEES --validate |
| ---- |
| |
| A basic export to populate a table named +bar+ with validation enabled: |
| |
| ---- |
| $ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \ |
| --export-dir /results/bar_data --validate |
| ---- |
| |
| Another example that overrides the validation args: |
| |
| ---- |
| $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \ |
| --validate --validator org.apache.sqoop.validation.RowCountValidator \ |
| --validation-threshold \ |
| org.apache.sqoop.validation.AbsoluteValidationThreshold \ |
| --validation-failurehandler \ |
| org.apache.sqoop.validation.AbortOnFailureHandler |
| ---- |