| # Usergrid Data Migrator |
| |
| ## Prerequisites |
| * Python 2 (not Python 3) |
| |
| * Install the Usergrid Python SDK: https://github.com/jwest-apigee/usergrid-python |
| |
| With Pip (requires python-pip to be installed): `pip install usergrid` |
| |
| * Install Usergrid Tools |
| |
| With Pip (requires python-pip to be installed): `pip install usergrid-tools` |
| |
| |
| ## Overview |
| This document gives an overview of the Python script in this directory, which migrates data, connections and users from one Usergrid platform / org / app to another. It can be used when upgrading from Usergrid 1.0 to 2.x, since there is no upgrade path. |
| |
| This script functions by taking source and target endpoint configurations (with credentials) and a set of command-line parameters to read data from one Usergrid instance and write to another. It is written in Python and requires Python 2.7.6+. |
| |
| The migration uses multiple processes to speed things up. A main thread reads entities from the API and publishes them, with metadata, to a Python Queue on which multiple worker processes listen for work. The number of worker processes is configurable with command-line parameters. |
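
The reader/worker arrangement described above can be sketched as a bounded producer/consumer queue. This is a minimal illustration and not the script's own code: it uses threads for brevity where the script uses worker processes, and the names `migrate`, `worker` and `SENTINEL` are hypothetical.

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

SENTINEL = None  # hypothetical marker telling a worker to stop


def worker(work_queue, results, lock):
    """Take entities off the queue until the stop marker arrives."""
    while True:
        entity = work_queue.get()
        if entity is SENTINEL:
            break
        # A real worker would write the entity to the target API here.
        with lock:
            results.append(entity['uuid'])


def migrate(entities, worker_count=4, queue_size_max=100):
    """Publish entities from the reader to a bounded queue of workers."""
    work_queue = queue.Queue(maxsize=queue_size_max)
    results, lock = [], threading.Lock()
    workers = [threading.Thread(target=worker, args=(work_queue, results, lock))
               for _ in range(worker_count)]
    for w in workers:
        w.start()
    for entity in entities:   # the reader: blocks while the queue is full
        work_queue.put(entity)
    for _ in workers:         # one stop marker per worker
        work_queue.put(SENTINEL)
    for w in workers:
        w.join()
    return results


copied = migrate([{'uuid': 'entity-%d' % i} for i in range(10)])
```

Bounding the queue keeps the reader from running far ahead of the workers, which is the same concern the `--queue_size_max` and watermark options address.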
| |
| |
| # Process to Migrate Data and Graph (Connections) |
| Usergrid is a graph database and allows for connections between entities. For a connection to be made, both the source entity and the target entity must exist. Therefore, to migrate connections it is advisable to first migrate all the data and then all the connections associated with that data. |
| |
| # Concepts |
| As with any migration process there is a source and a target. The source and target have the following parameters: |
| |
| * API URL: The HTTP[S] URL where the platform can be reached |
| * Org: You must specify one org at a time to migrate using this script |
| * App: You can optionally specify one or more applications to migrate. If you specify zero applications then all applications will be migrated |
| * Collection: You can optionally specify one or more collections to migrate. If you specify zero collections then all collections will be migrated |
| * QL: You can specify a Query Language predicate to be used. If none is specified, 'select *' will be used which will migrate all data within a given collection |
| * Graph: Instead of running a query, which relies on the index, the script can find entities by traversing graph edges. This only reaches entities for which edges exist. |
| |
| # Graph Loops |
| |
| When iterating a graph it is possible to get stuck in a loop. For example: |
| |
| ``` |
| A --follows--> B |
| B --likes--> C |
| C --loves--> A |
| ``` |
| |
| There are two options to prevent getting stuck in a loop: |
| * `graph_depth` option - this will limit the graph depth which will be traversed from a given entity. |
| * Marking nodes and edges as 'visited', instead of or in addition to `graph_depth`. This requires a place to store this state. See Using Redis in the next section. |
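
Both safeguards can be seen on the three-node loop above. The following is a minimal sketch, not the script's implementation: the `traverse` function and the in-memory `visited` set are illustrative assumptions (the script keeps visited state in Redis).

```python
from collections import deque

# The looping graph from the example above: source -> [(edge name, target)]
EDGES = {
    'A': [('follows', 'B')],
    'B': [('likes', 'C')],
    'C': [('loves', 'A')],
}


def traverse(start, edges, graph_depth=None):
    """Breadth-first walk that stops at visited nodes and at graph_depth."""
    visited = {start}
    walked = []
    pending = deque([(start, 0)])
    while pending:
        node, depth = pending.popleft()
        if graph_depth is not None and depth >= graph_depth:
            continue  # depth limit reached; do not follow edges from here
        for edge_name, target in edges.get(node, []):
            walked.append((node, edge_name, target))
            if target not in visited:  # a visited node is never queued again
                visited.add(target)
                pending.append((target, depth + 1))
    return walked


all_edges = traverse('A', EDGES)                 # ends despite the loop
one_hop = traverse('A', EDGES, graph_depth=1)    # only edges leaving A
```

Without the visited check or a depth limit, the walk would cycle through A, B and C indefinitely.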
| |
| # Using Redis |
| |
| If using Redis, version 2.8+ is needed because TTL is used with the 'ex' parameter. |
| |
| Redis can be used for the following: |
| |
| * Keeping track of the modified date for each entity. On subsequent runs of the script, entities which were not modified will not be copied. |
| * Keeping track of visited nodes when migrating a graph. This is done with a TTL so that a job can be resumed. Since there is no modified date on an edge, there is no way to know whether new edges exist, so when the TTL expires the nodes will be visited again. |
| * Keeping track of the URLs of the connections which are created between entities. This has no TTL. Subsequent runs will not re-create connections which are already recorded in Redis. |
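
The first two uses can be sketched as follows. This is an illustration under stated assumptions, not the script's code: the key names (`modified:<uuid>`, `visited:<uuid>`) and helper functions are hypothetical, and `InMemoryCache` is a stand-in that only mimics the `get` and `set(..., ex=...)` calls of a redis-py client so the example runs without a Redis server.

```python
import time


class InMemoryCache(object):
    """Stand-in with the get/set(ex=...) shape of a redis-py client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ex=None):
        expires_at = time.time() + ex if ex is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and expires_at <= time.time():
            return None  # the entry has outlived its TTL
        return value


def needs_copy(cache, entity):
    """True when the entity is new or was modified since the last run."""
    cached = cache.get('modified:' + entity['uuid'])  # hypothetical key layout
    return cached is None or int(cached) < entity['modified']


def record_copy(cache, entity):
    """Remember the modified date of a copied entity (no TTL)."""
    cache.set('modified:' + entity['uuid'], entity['modified'])


def mark_visited(cache, uuid, visit_cache_ttl):
    """Remember a visited node until the TTL (in seconds) expires."""
    cache.set('visited:' + uuid, 1, ex=visit_cache_ttl)


def was_visited(cache, uuid):
    return cache.get('visited:' + uuid) is not None
```

With a real server, a `redis.StrictRedis` instance could be passed in place of `InMemoryCache`, since both calls used here exist on that client.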
| |
| |
| # Mapping |
| With this script the target does not need to use the same org name, application name and/or collection name as the source. For example, you could migrate from /myOrg/myApp/myCollection to /org123/app456/collections789. |
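
As a rough sketch of how such a mapping resolves, the colon-separated `source:target` pairs accepted by the `--map_org`, `--map_app` and `--map_collection` options can be turned into lookup tables. The functions `parse_mappings` and `target_path` are hypothetical names used for illustration, not part of the script.

```python
def parse_mappings(pairs):
    """Turn ['apples:oranges', ...] into {'apples': 'oranges', ...}."""
    mappings = {}
    for pair in pairs:
        source, sep, target = pair.partition(':')
        if not sep or not source or not target:
            raise ValueError("expected 'source:target', got %r" % pair)
        mappings[source] = target
    return mappings


def target_path(org, app, collection, org_map, app_map, collection_map):
    """Resolve a source path to its target; unmapped names are kept as-is."""
    return '/%s/%s/%s' % (
        org_map.get(org, org),
        app_map.get(app, app),
        collection_map.get(collection, collection),
    )


path = target_path(
    'myOrg', 'myApp', 'myCollection',
    parse_mappings(['myOrg:org123']),
    parse_mappings(['myApp:app456']),
    parse_mappings(['myCollection:collections789']),
)
print(path)  # /org123/app456/collections789
```

Names without a mapping pass through unchanged, so a run with no mapping options keeps the source names at the target.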
| |
| |
| # Configuration Files |
| Example source/target configuration files: |
| |
| ``` |
| { |
| "endpoint": { |
| "api_url": "https://api.usergrid.com" |
| }, |
| "credentials": { |
| "myOrg1": { |
| "client_id": "YXA6lpg9sEaaEeONxG0g3Uz44Q", |
| "client_secret": "ZdF66u2h3Hc7csOcsEtgewmxalB1Ygg" |
| }, |
| "myOrg2": { |
| "client_id": "ZXf63p239sDaaEeONSG0g3Uz44Z", |
| "client_secret": "ZdF66u2h3Hc7csOcsEtgewmxajsadfj32" |
| } |
| } |
| } |
| ``` |
| * api_url: the API URL to access/write data |
| * credentials: |
|   * For each org, with the org name (case-sensitive) as the key: |
|     * client_id - the org-level Client ID. This can be retrieved from the BaaS/Usergrid Portal. |
|     * client_secret - the org-level Client Secret. This can be retrieved from the BaaS/Usergrid Portal. |
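
To check a configuration file before a run, the structure above can be read with the standard `json` module. This is a small sketch, assuming only the layout shown in the example; the helper `org_settings` and the placeholder credential values are hypothetical.

```python
import json

# Same layout as the example file above, with placeholder credentials.
EXAMPLE = '''
{
  "endpoint": {"api_url": "https://api.usergrid.com"},
  "credentials": {
    "myOrg1": {"client_id": "<client id>", "client_secret": "<client secret>"}
  }
}
'''


def org_settings(config, org):
    """Return (api_url, client_id, client_secret) for one org."""
    credentials = config['credentials']
    if org not in credentials:  # org names are case-sensitive
        raise KeyError('no credentials configured for org %r' % org)
    entry = credentials[org]
    return config['endpoint']['api_url'], entry['client_id'], entry['client_secret']


config = json.loads(EXAMPLE)  # for a file on disk, use json.load(file_object)
api_url, client_id, client_secret = org_settings(config, 'myOrg1')
```

Looking up `'myorg1'` instead of `'myOrg1'` raises `KeyError` here, which mirrors the case-sensitivity noted above.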
| |
| # Command Line Parameters |
| |
| ``` |
| Usergrid Org/App Data Migrator |
| |
| optional arguments: |
| -h, --help show this help message and exit |
| --log_dir LOG_DIR path to the place where logs will be written |
| --log_level LOG_LEVEL |
| log level - DEBUG, INFO, WARN, ERROR, CRITICAL |
| -o ORG, --org ORG Name of the org to migrate |
| -a APP, --app APP Name of one or more apps to include, specify none to |
| include all apps |
| -e INCLUDE_EDGE, --include_edge INCLUDE_EDGE |
| Name of one or more edges/connection types to INCLUDE, |
| specify none to include all edges |
| --exclude_edge EXCLUDE_EDGE |
| Name of one or more edges/connection types to EXCLUDE, |
| specify none to include all edges |
| --exclude_collection EXCLUDE_COLLECTION |
| Name of one or more collections to EXCLUDE, specify |
| none to include all collections |
| -c COLLECTION, --collection COLLECTION |
| Name of one or more collections to include, specify |
| none to include all collections |
| --use_name_for_collection USE_NAME_FOR_COLLECTION |
| Name of one or more collections to use [name] instead |
| of [uuid] for creating entities and edges |
| -m {data,none,reput,credentials,graph}, --migrate {data,none,reput,credentials,graph} |
| Specifies what to migrate: data, connections, |
| credentials, audit or none (just iterate the |
| apps/collections) |
| -s SOURCE_CONFIG, --source_config SOURCE_CONFIG |
| The path to the source endpoint/org configuration file |
| -d TARGET_CONFIG, --target_config TARGET_CONFIG |
| The path to the target endpoint/org configuration file |
| --limit LIMIT The number of entities to return per query request |
| -w ENTITY_WORKERS, --entity_workers ENTITY_WORKERS |
| The number of worker processes to do the migration |
| --visit_cache_ttl VISIT_CACHE_TTL |
| The TTL of the cache of visiting nodes in the graph |
| for connections |
| --error_retry_sleep ERROR_RETRY_SLEEP |
| The number of seconds to wait between retrieving after |
| an error |
| --page_sleep_time PAGE_SLEEP_TIME |
| The number of seconds to wait between retrieving pages |
| from the UsergridQueryIterator |
| --entity_sleep_time ENTITY_SLEEP_TIME |
| The number of seconds to wait between retrieving pages |
| from the UsergridQueryIterator |
| --collection_workers COLLECTION_WORKERS |
| The number of worker processes to do the migration |
| --queue_size_max QUEUE_SIZE_MAX |
| The max size of entities to allow in the queue |
| --graph_depth GRAPH_DEPTH |
| The graph depth to traverse to copy |
| --queue_watermark_high QUEUE_WATERMARK_HIGH |
| The point at which publishing to the queue will PAUSE |
| until it is at or below low watermark |
| --min_modified MIN_MODIFIED |
| Break when encountering a modified date before this, |
| per collection |
| --max_modified MAX_MODIFIED |
| Break when encountering a modified date after this, |
| per collection |
| --queue_watermark_low QUEUE_WATERMARK_LOW |
| The point at which publishing to the queue will RESUME |
| after it has reached the high watermark |
| --ql QL The QL to use in the filter for reading data from |
| collections |
| --skip_cache_read Skip reading the cache (modified timestamps and graph |
| edges) |
| --skip_cache_write Skip updating the cache with modified timestamps of |
| entities and graph edges |
| --create_apps Create apps at the target if they do not exist |
| --nohup specifies not to use stdout for logging |
| --graph Use GRAPH instead of Query |
| --su_username SU_USERNAME |
| Superuser username |
| --su_password SU_PASSWORD |
| Superuser Password |
| --inbound_connections |
| Name of the org to migrate |
| --map_app MAP_APP Multiple allowed: A colon-separated string such as |
| 'apples:oranges' which indicates to put data from the |
| app named 'apples' from the source endpoint into app |
| named 'oranges' in the target endpoint |
| --map_collection MAP_COLLECTION |
| One or more colon-separated string such as 'cats:dogs' |
| which indicates to put data from collections named |
| 'cats' from the source endpoint into a collection |
| named 'dogs' in the target endpoint, applicable |
| globally to all apps |
| --map_org MAP_ORG One or more colon-separated strings such as 'red:blue' |
| which indicates to put data from org named 'red' from |
| the source endpoint into a collection named 'blue' in |
| the target endpoint |
| ``` |
| |
| ## Example Command Line |
| |
| Use the following command to migrate DATA AND GRAPH (entities together with the graph edges or connections between them). If there are no graph edges (connections) then using `-m graph` is not necessary. This will copy all data from all apps in the org 'myorg', creating apps in the target org if they do not already exist. Note that `--create_apps` is required if the apps in the target org have not been created. |
| |
| ``` |
| $ usergrid_data_migrator -o myorg -m graph -w 4 -s mySourceConfig.json -d myTargetConfiguration.json --create_apps |
| ``` |
| |
| Use the following command to migrate DATA ONLY (no graph edges or connections between entities). This will copy all data from all apps in the org 'myorg', creating apps in the target org if they do not already exist. Note that `--create_apps` is required if the apps in the target org have not been created. |
| |
| ``` |
| $ usergrid_data_migrator -o myorg -m data -w 4 -s mySourceConfig.json -d myTargetConfiguration.json --create_apps |
| ``` |
| |
| Use the following command to migrate CREDENTIALS for Application-level Users. Note that `usergrid.sysadmin.login.allowed=true` must be set in the `usergrid-deployment.properties` file on the source and target Tomcat nodes. |
| |
| ``` |
| $ usergrid_data_migrator -o myorg -m credentials -w 4 -s mySourceConfig.json -d myTargetConfiguration.json --create_apps --su_username foo --su_password bar |
| ``` |
| |
| This command: |
| |
| ``` |
| $ usergrid_data_migrator -o myorg -a app1 -a app2 -m data -w 4 --map_app app1:application_1 --map_app app2:application_2 --map_collection pets:animals --map_org myorg:my_new_org -s mySourceConfig.json -d myTargetConfiguration.json |
| ``` |
| will do the following: |
| |
| * migrate Apps named 'app1' and 'app2' in org named 'myorg' from the API endpoint defined in 'mySourceConfig.json' to the API endpoint defined in 'myTargetConfiguration.json' |
| * In the process: |
|   * data from 'myorg' will be migrated to the org named 'my_new_org' |
|   * data from 'app1' will be migrated to the app named 'application_1' |
|   * data from 'app2' will be migrated to the app named 'application_2' |
|   * data from all collections named 'pets' will be written to collections named 'animals' at the destination |
| |
| |
| # FAQ |
| |
| ### Does the process keep the same UUIDs? |
| |
| * Yes - with this script the same UUIDs can be kept from the source into the destination. An exception is when you migrate from one collection to another under the same Org hierarchy. |
| |
| ### Does the process keep the ordering of connections by time? |
| |
| * Yes, the ordering of connections is maintained in the process. |