CLI helper tool(s) for Ambari Infra Solr.
Ambari Infra Solr uses Solr 7 as of Ambari 2.7.0, so the Solr 5 index (Ambari Infra 2.6.x) has to be migrated if you want to keep your old data. (Otherwise the backup part can be skipped.)
First, make sure ambari-infra-solr-client is the latest version (upgrade it if it is older than 2.7.x). It will then contain the migrationHelper.py script at the /usr/lib/ambari-infra-solr-client location. Also make sure you do not upgrade ambari-infra-solr until the migration is done. (All of this should happen after the ambari-server upgrade; also make sure not to restart the INFRA_SOLR instances.) You will need to stop the Ranger plugins at this point. (This is not mandatory, but it is recommended before the backup, as you want to back up the same data that you had before.)
Optionally, once you are done with the second step, you can send an upgrade command to the installed ambari-infra-solr-client components on the cluster (it basically runs a package remove and a package install):
CONFIG_INI_LOCATION=ambari_solr_migration.ini
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action upgrade-solr-clients
First, you need to create a proper configuration input for the migration helper script. That can be done with the /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py script. Choose one of the Solr server hosts, ssh there, and run (with the proper ambari-server configuration as flags):
# use a sudoer user for running the script !!
CONFIG_INI_LOCATION=ambari_solr_migration.ini # output of the script with required parameters for migrationHelper.py
# note 1: use -s if ambari-server uses https
# note 2: use --shared-drive if the backup location is shared for different hosts
# note 3: use --hdfs-base-path if the index data is located on hdfs (or --ranger-hdfs-base-path if only ranger collection is located there), e.g.: /user/infra-solr
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host c7401.ambari.apache.org --port 8080 --cluster cl1 --username admin --password admin --backup-base-path=/my/path --java-home /usr/jdk64/jdk1.8.0_112
Some important flags that can be added at this point:
--shared-drive: use this flag if the location of the backup is shared between hosts (it will generate the index location as <index_location>, therefore migration commands can run in parallel on different hosts)
--backup-base-path: base path of the backup. E.g. if you provide /my/path, the backup locations will be /my/path/ranger and /my/path/atlas. If the base path is not the same for these, you can provide Ranger or Atlas specific ones with --backup-ranger-base-path and --backup-atlas-base-path
--hdfs-base-path: use this if the index is stored on hdfs. It is applied to all indices; most of the time only Ranger uses hdfs, and in that case use --ranger-hdfs-base-path instead of this option. The value is mostly /user/infra-solr, which means the collection itself could be at the hdfs:///user/infra-solr/ranger_audits location
The generated config file output could be something like this:
[ambari_server]
host = c7401.ambari.apache.org
port = 8080
cluster = cl1
protocol = http
username = admin
password = admin

[local]
java_home = /usr/jdk64/jdk1.8.0_112/
hostname = c7402.ambari.apache.org
shared_drive = false

[cluster]
kerberos_enabled = true

[infra_solr]
protocol = http
hosts = c7402.ambari.apache.org,c7403.ambari.apache.org
zk_connect_string = c7401.ambari.apache.org:2181
znode = /infra-solr
user = infra-solr
keytab = /etc/security/keytabs/ambari-infra-solr.service.keytab
principal = infra-solr/c7402.ambari.apache.org
zk_principal_user = zookeeper

[ranger_collection]
enabled = true
ranger_config_set_name = ranger_audits
ranger_collection_name = ranger_audits
ranger_collection_shards = 2
ranger_collection_max_shards_per_node = 4
backup_ranger_config_set_name = old_ranger_audits
backup_ranger_collection_name = old_ranger_audits
backup_path = /my/path/ranger

[atlas_collections]
enabled = true
config_set = atlas_configs
fulltext_index_name = fulltext_index
fulltext_index_shards = 2
fulltext_index_max_shards_per_node = 4
edge_index_name = edge_index
edge_index_shards = 2
edge_index_max_shards_per_node = 4
vertex_index_name = vertex_index
vertex_index_shards = 2
vertex_index_max_shards_per_node = 4
backup_fulltext_index_name = old_fulltext_index
backup_edge_index_name = old_edge_index
backup_vertex_index_name = old_vertex_index
backup_path = /my/path/atlas

[logsearch_collections]
enabled = true
hadoop_logs_collection_name = hadoop_logs
audit_logs_collection_name = audit_logs
history_collection_name = history
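Before reviewing the file by hand, a quick sanity check can help. The following is only a sketch, written for Python 3 (the commands in this guide call /usr/bin/python, which may be Python 2 on your hosts). The function name summarize_migration_ini is hypothetical, and treating the four listed sections as required is an assumption based on the sample output above, not a rule taken from migrationHelper.py.

```python
import configparser

# Section names taken from the sample ini above; treating them as required
# is an assumption of this sketch, not a rule of migrationHelper.py.
EXPECTED_SECTIONS = ["ambari_server", "local", "cluster", "infra_solr"]
COLLECTION_SECTIONS = ["ranger_collection", "atlas_collections", "logsearch_collections"]


def summarize_migration_ini(text):
    """Return (missing_sections, solr_hosts, enabled_collection_sections)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    missing = [s for s in EXPECTED_SECTIONS if not parser.has_section(s)]
    hosts = []
    if parser.has_option("infra_solr", "hosts"):
        raw = parser.get("infra_solr", "hosts")
        hosts = [h.strip() for h in raw.split(",") if h.strip()]
    enabled = [
        s for s in COLLECTION_SECTIONS
        if parser.has_section(s) and parser.getboolean(s, "enabled", fallback=False)
    ]
    return missing, hosts, enabled


# Shortened, illustrative version of the generated file.
SAMPLE = """
[ambari_server]
host = c7401.ambari.apache.org
[local]
shared_drive = false
[cluster]
kerberos_enabled = true
[infra_solr]
hosts = c7402.ambari.apache.org,c7403.ambari.apache.org
[ranger_collection]
enabled = true
[atlas_collections]
enabled = false
"""

missing, hosts, enabled = summarize_migration_ini(SAMPLE)
print("missing sections:", missing)
print("solr hosts:", hosts)
print("enabled collection groups:", enabled)
```

To check a real file, read its content (for example with open(path).read()) and pass the text to the function.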
After the file has been created successfully by the script, review the configuration (e.g. if one of the Solr instances is not up yet and you do not want to use its REST API for operations, you can remove its host from the hosts of the infra_solr section, or you can change the backup locations for different collections, etc.). Also, if it is not required to back up e.g. the Atlas collections (so you are ok with dropping those), you can change the enabled config of the collections section to false.
Before you start the upgrade process, check that the Solr instances are running, make sure you have stable shards (at least one core is up and running), and make sure you have enough space on the disks to store the Solr backup data. (You will need at least as much free space as your index size on each host.) The backup process contains a few steps: back up the Ranger configs on the znode, back up the collections, delete the Log Search znodes, then upgrade the managed-schema znode for Ranger. These tasks can be done with 1 migrationHelper.py command:
# use a sudoer user for running the script !!
# first (optionally) you can check that there are any ACTIVE replicas for all the shards
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action check-shards
# then run backup-and-cleanup ... you can run these actions separately with these actions: 'backup', 'delete-collections', 'cleanup-znodes'
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action backup-and-cleanup
If the script finished successfully and everything looks green on the Ambari UI as well, you can go ahead with the Infra Solr package upgrade. Otherwise (or if you want to go step by step instead of using the command above) you have the option to run the tasks step by step (or manually as well). Those tasks are found in the next sections.
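The disk space requirement mentioned before the backup command can be checked up front. The sketch below is not part of the Ambari tooling; it assumes Python 3, a locally stored index, and hypothetical function names. It compares the size of an index directory with the free space on the filesystem that holds the backup directory.

```python
import os
import shutil


def directory_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def has_room_for_backup(index_dir, backup_dir):
    """True if backup_dir's filesystem has at least as much free space as index_dir uses."""
    needed = directory_size_bytes(index_dir)
    free = shutil.disk_usage(backup_dir).free
    return free >= needed
```

A call such as has_room_for_backup(index_dir, "/my/path") would be run on every Solr host, with index_dir set to wherever that host stores its Solr data; that location is installation specific and is not given in this guide.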
The migrationHelper.py script can be used to back up only the Ranger collection (use the -s option to filter on services):
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action backup -s RANGER
You can also do the backup manually on every Solr node by using the backup API of Solr. (Use it against core names, not the collection name; it works as expected only if you have 1 shard on every node.)
Example:
su infra-solr
SOLR_URL=... # actual solr host url, example: http://c6401.ambari.apache.org:8886/solr
# collection parameters
BACKUP_PATH=... # backup location, e.g.: /tmp/ranger-backup
# RUN THIS FOR EVERY CORE ON SPECIFIC HOSTS !!!
BACKUP_CORE=... # specific core on a host
BACKUP_CORE_NAME=... # core names for backup -> <backup_location>/
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
mkdir -p $BACKUP_PATH
curl --negotiate -k -u : "$SOLR_URL/$BACKUP_CORE/replication?command=BACKUP&location=$BACKUP_PATH&name=$BACKUP_CORE_NAME"
(help: get core names)
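If you script the manual backup for many cores, it can help to build the replication URL programmatically so that the location is URL-encoded. This is a minimal Python 3 sketch; the helper name replication_url and the core name used in the example are hypothetical, while the URL layout and the command, location and name parameters are the ones used in the curl call above.

```python
from urllib.parse import urlencode


def replication_url(solr_url, core, command, location, name):
    """Build a core-level replication URL (command is e.g. BACKUP or RESTORE)."""
    query = urlencode({"command": command, "location": location, "name": name})
    return "{0}/{1}/replication?{2}".format(solr_url.rstrip("/"), core, query)


# The core name below is only an example; use the real core names of your hosts.
url = replication_url(
    "http://c6401.ambari.apache.org:8886/solr",
    "ranger_audits_shard1_replica1",
    "BACKUP",
    "/tmp/ranger-backup",
    "ranger_audits_shard1_replica1",
)
print(url)
```

The printed URL can be passed to curl with the same --negotiate options as above. The same helper covers the RESTORE command used later in this guide.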
Next, you can copy the ranger_audits configs to a different znode in order to keep the old schema.
export JAVA_HOME=/usr/jdk64/1.8.0_112 # or other jdk8 location
export ZK_CONN_STR=... # without znode, e.g.: myhost1:2181,myhost2:2181,myhost3:2181
# note 1: --transfer-mode copyToLocal or --transfer-mode copyFromLocal can be used if you want to use the local filesystem
# note 2: use --jaas-file option only if the cluster is kerberized
infra-solr-cloud-cli --transfer-znode -z $ZK_CONN_STR --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --copy-src /infra-solr/configs/ranger_audits --copy-dest /infra-solr/configs/old_ranger_audits
At this point you can delete the actual Ranger collection with this command:
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action delete-collections -s RANGER
Or do it manually with the Solr API:
su infra-solr # infra-solr user - if you have a custom one, use that
SOLR_URL=... # example: http://c6401.ambari.apache.org:8886/solr
COLLECTION_NAME=ranger_audits
# use kinit and --negotiate option for curl only if the cluster is kerberized
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=DELETE&name=$COLLECTION_NAME"
Before creating the new Ranger collection, it is required to upgrade the managed-schema configs.
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action cleanup-znodes -s RANGER
It can be done manually with infra-solr-cloud-cli as well:
sudo -u infra-solr -i
# If kerberos enabled
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
## BACKUP OLD CONFIG
export JAVA_HOME=/usr/jdk64/1.8.0_112 # or other jdk8 location
export ZK_CONN_STR=... # without znode, e.g.: myhost1:2181,myhost2:2181,myhost3:2181
# note: --transfer-mode copyToLocal or --transfer-mode copyFromLocal can be used if you want to use the local filesystem
infra-solr-cloud-cli --transfer-znode -z $ZK_CONN_STR --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --copy-src /infra-solr/configs/ranger_audits --copy-dest /infra-solr/configs/old_ranger_audits
## UPLOAD NEW SCHEMA
# Setup env for zkcli.sh
source /etc/ambari-infra-solr/conf/infra-solr-env.sh
# Run that command only if kerberos is enabled.
export SOLR_ZK_CREDS_AND_ACLS="${SOLR_AUTHENTICATION_OPTS}"
# Upload the new schema
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh --zkhost "${ZK_HOST}" -cmd putfile /configs/ranger_audits/managed-schema /usr/lib/ambari-infra-solr-client/migrate/managed-schema
Atlas has 3 collections: fulltext_index, edge_index, vertex_index. You will need to do steps similar to the ones you did for Ranger; the only difference is that you need to filter on the ATLAS service.
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action backup -s ATLAS
You can also do the backup manually on every Solr node by using the backup API of Solr. (Use it against core names, not the collection name; it works as expected only if you have 1 shard on every node.)
Example:
su infra-solr
SOLR_URL=... # actual solr host url, example: http://c6401.ambari.apache.org:8886/solr
# collection parameters
BACKUP_PATH=... # backup location, e.g.: /tmp/fulltext_index_backup
# RUN THIS FOR EVERY CORE ON SPECIFIC HOSTS !!!
BACKUP_CORE=... # specific core on a host
BACKUP_CORE_NAME=... # core names for backup -> <backup_location>/
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
mkdir -p $BACKUP_PATH
curl --negotiate -k -u : "$SOLR_URL/$BACKUP_CORE/replication?command=BACKUP&location=$BACKUP_PATH&name=$BACKUP_CORE_NAME"
(help: get core names)
The next step for Atlas is to delete all 3 old collections. It can be done with the delete-collections action and the ATLAS filter.
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action delete-collections -s ATLAS
Or manually run the DELETE operation with 3 Solr API calls, one for each of the 3 Atlas collections:
su infra-solr # infra-solr user - if you have a custom one, use that
SOLR_URL=... # example: http://c6401.ambari.apache.org:8886/solr
# use kinit and --negotiate option for curl only if the cluster is kerberized
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
COLLECTION_NAME=fulltext_index
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=DELETE&name=$COLLECTION_NAME"
COLLECTION_NAME=edge_index
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=DELETE&name=$COLLECTION_NAME"
COLLECTION_NAME=vertex_index
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=DELETE&name=$COLLECTION_NAME"
For Log Search, all the old collections must be deleted. This can be done in a similar way as for Ranger or Atlas:
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action delete-collections -s LOGSEARCH
Or manually run Solr API DELETE commands here as well:
su infra-solr # infra-solr user - if you have a custom one, use that
SOLR_URL=... # example: http://c6401.ambari.apache.org:8886/solr
# use kinit and --negotiate option for curl only if the cluster is kerberized
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
COLLECTION_NAME=hadoop_logs
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=DELETE&name=$COLLECTION_NAME"
COLLECTION_NAME=audit_logs
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=DELETE&name=$COLLECTION_NAME"
COLLECTION_NAME=history
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=DELETE&name=$COLLECTION_NAME"
The Log Search configs changed a lot between Ambari 2.6.x and Ambari 2.7.x, so it is required to delete those as well. (The configs will be regenerated during Log Search startup.)
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action cleanup-znodes -s LOGSEARCH
You can delete the znodes with zookeeper-client as well:
su infra-solr # infra-solr user - if you have a custom one, use that
# ZOOKEEPER CONNECTION STRING from zookeeper servers
export ZK_CONN_STR=... # without znode, e.g.: myhost1:2181,myhost2:2181,myhost3:2181
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
zookeeper-client -server $ZK_CONN_STR rmr /infra-solr/configs/hadoop_logs
zookeeper-client -server $ZK_CONN_STR rmr /infra-solr/configs/audit_logs
zookeeper-client -server $ZK_CONN_STR rmr /infra-solr/configs/history
At this step, you will need to upgrade the ambari-infra-solr packages. (Also make sure the ambari-logsearch* packages are upgraded as well.)
Example (for CentOS):
yum upgrade -y ambari-infra-solr
Or, optionally, you can do that through Ambari commands with the migrationHelper.py script (that means you won't need to ssh into every Infra Solr instance host):
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action upgrade-solr-instances
# same can be done for logfeeders and logsearch portals if required:
# just use '--action upgrade-logsearch-portal' or '--action upgrade-logfeeders'
That runs a package remove and a package install.
Restart the Ranger Admin / Atlas / Log Search Ambari services. As the collections were deleted before, new collections will be created during startup (as Solr 7 collections). At this point you can stop and do the migration / restore later (as long as you keep the backup), and go ahead with e.g. the HDP upgrade. (The migration part can take a long time, roughly 1GB/min.)
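To plan around the migration time, the rough 1GB/min figure mentioned above can be turned into an estimate. This is only arithmetic on that approximate rate (assumed here as the default); the real throughput depends on your hardware, and the function names are hypothetical.

```python
def estimate_migration_minutes(index_size_gb, gb_per_minute=1.0):
    """Estimated migration time in minutes, using the approximate rate from this guide."""
    if gb_per_minute <= 0:
        raise ValueError("gb_per_minute must be positive")
    return index_size_gb / gb_per_minute


def as_hours_and_minutes(minutes):
    """Convert a number of minutes into an (hours, minutes) tuple."""
    return divmod(int(round(minutes)), 60)


# Example: a 250 GB backup on one host.
hours, minutes = as_hours_and_minutes(estimate_migration_minutes(250))
print("about {0}h {1}m".format(hours, minutes))
```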
From this point, you can migrate your old index in the background. On every host where a backup is located, you can run the Lucene index migration tool (packaged with ambari-infra-solr-client). For the Lucene index migration, migrationHelper.py can be used, or /usr/lib/ambari-infra-solr-client/solrIndexHelper.sh directly. That script uses IndexMigrationTool. The whole migration can be done by executing 1 command:
# use a sudoer user for running the script !!
# you can use this command with nohup in the background, like: `nohup <command> > nohup2.out&`, as migration can take so much time (~1GB/min)
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action migrate
If the script finished successfully and everything looks green on the Ambari UI as well, you can go ahead with Restore collections. Otherwise (or if you want to go step by step instead of using the command above) you have the option to run the tasks step by step (or manually as well). Those tasks are found in the next sections.
Migration for the ranger_audits collection (cores):
# by default, you will migrate to Lucene 6.6.2; if you want to migrate again to Solr 7 (not required), you can use the --version 7.3.1 flag
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action migrate -s RANGER
Or you can run commands manually on nodes where your backups are located:
export JAVA_HOME=/usr/jdk64/1.8.0_112
# if /tmp/ranger-backup is your backup location
infra-lucene-index-tool upgrade-index -d /tmp/ranger-backup -f -b -g
# with 'infra-lucene-index-tool help' command you can checkout the command line options
By default, the tool will migrate from Lucene version 5 to Lucene version 6.6.2 (that's ok for Solr 7). If you want a Lucene 7 index, you will need to re-run the migration tool command with the -v 7.3.1 option.
As Atlas has 3 collections, you will need steps similar to the ones required for Ranger, just for all 3 collections (fulltext_index, edge_index, vertex_index).
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action migrate -s ATLAS
Or you can run commands manually on nodes where your backups are located:
export JAVA_HOME=/usr/jdk64/1.8.0_112
# if /tmp/fulltext_index_backup is your backup location
infra-lucene-index-tool upgrade-index -d /tmp/fulltext_index_backup -f -b -g
# with 'infra-lucene-index-tool help' command you can checkout the command line options
By default, the tool will migrate from Lucene version 5 to Lucene version 6.6.2 (that's ok for Solr 7). If you want a Lucene 7 index, you will need to re-run the migration tool command with the -v 7.3.1 option.
To restore the old collections, you first need to create them. As those collections may not be listed in the security.json of Infra Solr, you can get 403 errors if you try to access them later. For the time you are doing the restore and transporting the Solr data to other collections, you can turn off the Solr authorization plugin.
The collection creation and restore part can be done with 1 command:
# use a sudoer user for running the script !!
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action restore --keep-backup
If the script finished successfully and everything looks green on the Ambari UI as well, you can go ahead with Restart Solr Instances. Otherwise (or if you want to go step by step instead of using the command above) you have the option to run the tasks step by step (or manually as well). Those tasks are found in the next sections.
After the Lucene data migration is finished, you can restore your replicas on every host where you have the backups. The old data needs to be restored to a new collection, so you first need to create that collection (on a host where you have an installed Infra Solr component). For Ranger, use the old_ranger_audits config set that you backed up during the Solr schema config upgrade step (set this as CONFIG_NAME). To make that collection work with Solr 7, you need to copy your solrconfig.xml as well. That can be done by executing the following command:
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action restore -s RANGER
Or you can manually create a collection for restoring the backup (old_ranger_audits):
su infra-solr # infra-solr user - if you have a custom one, use that
SOLR_URL=... # example: http://c6401.ambari.apache.org:8886/solr
NUM_SHARDS=... # use that number that was used for the old collection - important to use at least that many that you have originally before backup
NUM_REP=1 # can be more, but 1 is recommended for that temp collection
MAX_SHARDS_PER_NODE=... # use that number that was used for the old collection
CONFIG_NAME=old_ranger_audits
OLD_DATA_COLLECTION=old_ranger_audits
# kinit only if kerberos is enabled for the cluster
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
export JAVA_HOME=/usr/jdk64/1.8.0_112 # or other jdk8 location
export ZK_CONN_STR=... # without znode, e.g.: myhost1:2181,myhost2:2181,myhost3:2181
# note 1: jaas-file option required only if kerberos is enabled for the cluster
# note 2: copy new solrconfig.xml as the old one won't be compatible with solr 7
infra-solr-cloud-cli --transfer-znode -z $ZK_CONN_STR --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --copy-src /infra-solr/configs/ranger_audits/solrconfig.xml --copy-dest /infra-solr/configs/old_ranger_audits/solrconfig.xml
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=CREATE&name=$OLD_DATA_COLLECTION&numShards=$NUM_SHARDS&replicationFactor=$NUM_REP&maxShardsPerNode=$MAX_SHARDS_PER_NODE&collection.configName=$CONFIG_NAME"
Then restore the cores with Solr REST API: (get core names)
su infra-solr
SOLR_URL=... # actual solr host url, example: http://c6401.ambari.apache.org:8886/solr
BACKUP_PATH=... # backup location, e.g.: /tmp/ranger-backup
OLD_BACKUP_COLLECTION_CORE=... # choose a core to restore
BACKUP_CORE_NAME=... # choose a core from backup cores - you can find these names as : <backup_location>/snapshot.$BACKUP_CORE_NAME
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
curl --negotiate -k -u : "$SOLR_URL/$OLD_BACKUP_COLLECTION_CORE/replication?command=RESTORE&location=$BACKUP_PATH&name=$BACKUP_CORE_NAME"
Or use simple cp or hdfs dfs -put commands to copy the migrated cores to the right places.
For Atlas, use the old_ prefix for all 3 collections that you need to create, use the atlas_configs config set, and then use those collections to restore the backups:
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action restore -s ATLAS
Or you can create the collections and restore the collections (cores) step by step:
Create a collection for restoring the backup (old_fulltext_index, old_vertex_index, old_edge_index):
su infra-solr # infra-solr user - if you have a custom one, use that
SOLR_URL=... # example: http://c6401.ambari.apache.org:8886/solr
NUM_SHARDS=... # use that number that was used for the old collection - important to use at least that many that you have originally before backup
NUM_REP=1 # use 1!
MAX_SHARDS_PER_NODE=... # use that number that was used for the old collection
CONFIG_NAME=atlas_configs
# kinit only if kerberos is enabled for the cluster
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
OLD_DATA_COLLECTION=old_fulltext_index
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=CREATE&name=$OLD_DATA_COLLECTION&numShards=$NUM_SHARDS&replicationFactor=$NUM_REP&maxShardsPerNode=$MAX_SHARDS_PER_NODE&collection.configName=$CONFIG_NAME"
OLD_DATA_COLLECTION=old_edge_index
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=CREATE&name=$OLD_DATA_COLLECTION&numShards=$NUM_SHARDS&replicationFactor=$NUM_REP&maxShardsPerNode=$MAX_SHARDS_PER_NODE&collection.configName=$CONFIG_NAME"
OLD_DATA_COLLECTION=old_vertex_index
curl --negotiate -k -u : "$SOLR_URL/admin/collections?action=CREATE&name=$OLD_DATA_COLLECTION&numShards=$NUM_SHARDS&replicationFactor=$NUM_REP&maxShardsPerNode=$MAX_SHARDS_PER_NODE&collection.configName=$CONFIG_NAME"
Also you can manually run restore commands: (get core names)
su infra-solr
SOLR_URL=... # actual solr host url, example: http://c6401.ambari.apache.org:8886/solr
BACKUP_PATH=... # backup location, e.g.: /tmp/fulltext_index-backup
OLD_BACKUP_COLLECTION_CORE=... # choose a core to restore
BACKUP_CORE_NAME=... # choose a core from backup cores - you can find these names as : <backup_location>/snapshot.$BACKUP_CORE_NAME
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
curl --negotiate -k -u : "$SOLR_URL/$OLD_BACKUP_COLLECTION_CORE/replication?command=RESTORE&location=$BACKUP_PATH&name=$BACKUP_CORE_NAME"
Or use simple cp or hdfs dfs -put commands to copy the migrated cores to the right places.
Next step is to restart Solr instances. That can be done on the Ambari UI, or optionally you can use the migrationHelper script for that as well (rolling restart)
# --batch-interval -> interval between restart solr tasks
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action rolling-restart-solr --batch-interval 60
Last step (that can be done any time, as you already have your data in Solr) is to transport all data from the backup collections to the live ones.
In the end, you end up with 2 collections (ranger_audits and old_ranger_audits). In order to drop the restored one, you will need to transfer your old data to the new collection. To achieve this, you can use solrDataManager.py, which is located next to the migrationHelper.py script.
# Init values:
SOLR_URL=... # example: http://c6401.ambari.apache.org:8886/solr
INFRA_SOLR_KEYTAB=... # example: /etc/security/keytabs/ambari-infra-solr.service.keytab
INFRA_SOLR_PRINCIPAL=... # example: infra-solr/$(hostname -f)@EXAMPLE.COM
END_DATE=... # example: 2018-02-18T12:00:00.000Z , date until you export data
OLD_COLLECTION=old_ranger_audits
ACTIVE_COLLECTION=ranger_audits
EXCLUDE_FIELDS=_version_ # comma separated exclude fields, at least _version_ is required
DATE_FIELD=evtTime
# infra-solr-data-manager is a symlink that points to /usr/lib/ambari-infra-solr-client/solrDataManager.py
infra-solr-data-manager -m archive -v -c $OLD_COLLECTION -s $SOLR_URL -z none -r 10000 -w 100000 -f $DATE_FIELD -e $END_DATE --solr-output-collection $ACTIVE_COLLECTION -k $INFRA_SOLR_KEYTAB -n $INFRA_SOLR_PRINCIPAL --exclude-fields $EXCLUDE_FIELDS
# Or if you want to run the command in the background (with log and pid file):
nohup infra-solr-data-manager -m archive -v -c $OLD_COLLECTION -s $SOLR_URL -z none -r 10000 -w 100000 -f $DATE_FIELD -e $END_DATE --solr-output-collection $ACTIVE_COLLECTION -k $INFRA_SOLR_KEYTAB -n $INFRA_SOLR_PRINCIPAL --exclude-fields $EXCLUDE_FIELDS > /tmp/solr-data-mgr.log 2>&1 &
echo $! > /tmp/solr-data-mgr.pid
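The END_DATE value above has to follow the timestamp format shown in the example (2018-02-18T12:00:00.000Z). A small Python 3 sketch for producing such a value is shown below; the helper name solr_date is hypothetical, and treating naive datetimes as already being in UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone


def solr_date(dt):
    """Format a datetime as a UTC timestamp like 2018-02-18T12:00:00.000Z.

    Naive datetimes are assumed to be in UTC already (assumption of this sketch).
    """
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + "{0:03d}Z".format(millis)


print(solr_date(datetime(2018, 2, 18, 12, 0, 0)))
```

The printed value can be assigned to END_DATE before running infra-solr-data-manager.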
In the end, you end up with 6 Atlas collections (vertex_index, old_vertex_index, edge_index, old_edge_index, fulltext_index, old_fulltext_index). In order to drop the restored ones, you will need to transfer your old data to the new collections. To achieve this, you can use solrDataManager.py, which is located next to the migrationHelper.py script.
Example (with fulltext_index; do the same with edge_index and vertex_index):
# Init values:
SOLR_URL=... # example: http://c6401.ambari.apache.org:8886/solr
INFRA_SOLR_KEYTAB=... # example: /etc/security/keytabs/ambari-infra-solr.service.keytab
INFRA_SOLR_PRINCIPAL=... # example: infra-solr/$(hostname -f)@EXAMPLE.COM
END_DATE=... # example: 2018-02-18T12:00:00.000Z , date until you export data
OLD_COLLECTION=old_fulltext_index
ACTIVE_COLLECTION=fulltext_index
EXCLUDE_FIELDS=_version_ # comma separated exclude fields, at least _version_ is required
DATE_FIELD=timestamp
# infra-solr-data-manager is a symlink that points to /usr/lib/ambari-infra-solr-client/solrDataManager.py
infra-solr-data-manager -m archive -v -c $OLD_COLLECTION -s $SOLR_URL -z none -r 10000 -w 100000 -f $DATE_FIELD -e $END_DATE --solr-output-collection $ACTIVE_COLLECTION -k $INFRA_SOLR_KEYTAB -n $INFRA_SOLR_PRINCIPAL --exclude-fields $EXCLUDE_FIELDS
# Or if you want to run the command in the background (with log and pid file):
nohup infra-solr-data-manager -m archive -v -c $OLD_COLLECTION -s $SOLR_URL -z none -r 10000 -w 100000 -f $DATE_FIELD -e $END_DATE --solr-output-collection $ACTIVE_COLLECTION -k $INFRA_SOLR_KEYTAB -n $INFRA_SOLR_PRINCIPAL --exclude-fields $EXCLUDE_FIELDS > /tmp/solr-data-mgr.log 2>&1 &
echo $! > /tmp/solr-data-mgr.pid
CONFIG_INI_LOCATION=ambari_migration.ini
BACKUP_BASE_PATH=/tmp
# if backup is required:
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host c7401.ambari.apache.org --port 8080 --cluster cl1 --username admin --password admin --backup-base-path=$BACKUP_BASE_PATH --java-home /usr/jdk64/jdk1.8.0_112
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action backup-and-cleanup
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action migrate
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action restore
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action rolling-restart-solr
# or if backup is not required:
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host c7401.ambari.apache.org --port 8080 --cluster cl1 --username admin --password admin --backup-base-path=$BACKUP_BASE_PATH --java-home /usr/jdk64/jdk1.8.0_112
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action delete-collections
--service-filter or -s: you can filter on services for migration commands (e.g. run against only ATLAS or RANGER), possible values: ATLAS,RANGER,LOGSEARCH
--skip-cores: skip specific cores from migration (can be useful if just one of them failed during restore etc.)
--collection or -c: run migration commands on just a specific collection (like ranger_audits, or old_ranger_audits for restore)
As Solr instances won't start with the new upgraded configs (only if kerberos is enabled), you can apply a small fix to make it work: just add this line to infra-solr-env/content:
SOLR_KERB_NAME_RULES="{{infra_solr_kerberos_name_rules}}"
To find out which hosts are related to your collections, you can check the Solr UI (using SPNEGO), or get the state.json details of the collection using zookeeper-client or the Solr ZooKeeper API (/solr/admin/zookeeper?detail=true&path=/collections/<collection_name>/state.json)
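Once you have the state.json content, the host to core mapping can be extracted with a few lines of code. The Python 3 sketch below assumes the usual SolrCloud state.json layout (collection name, then shards, then replicas with core, node_name and state fields); the sample document, its core names and the function names are illustrative and not taken from a real cluster.

```python
import json


def cores_by_host(state_json_text, collection):
    """Map each host name to a list of (shard, core, state) tuples."""
    state = json.loads(state_json_text)
    result = {}
    for shard_name, shard in state[collection]["shards"].items():
        for replica in shard["replicas"].values():
            # node_name looks like "host:port_solr"; keep only the host part
            host = replica["node_name"].split(":")[0]
            result.setdefault(host, []).append(
                (shard_name, replica["core"], replica["state"]))
    return result


def shards_without_active_replica(state_json_text, collection):
    """Return the names of shards that have no replica in the 'active' state."""
    state = json.loads(state_json_text)
    return [
        name for name, shard in state[collection]["shards"].items()
        if not any(r["state"] == "active" for r in shard["replicas"].values())
    ]


# Illustrative sample only; real state.json files contain more fields.
SAMPLE_STATE = json.dumps({"ranger_audits": {"shards": {
    "shard1": {"replicas": {"core_node1": {
        "core": "ranger_audits_shard1_replica1",
        "node_name": "c7402.ambari.apache.org:8886_solr",
        "state": "active"}}},
    "shard2": {"replicas": {"core_node2": {
        "core": "ranger_audits_shard2_replica1",
        "node_name": "c7403.ambari.apache.org:8886_solr",
        "state": "down"}}},
}}})

print(cores_by_host(SAMPLE_STATE, "ranger_audits"))
print(shards_without_active_replica(SAMPLE_STATE, "ranger_audits"))
```

The second helper is a manual way to look for shards without an ACTIVE replica before running a backup; how the check-shards action performs this check internally is not described in this guide.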
You can turn off the Solr authorization plugin by setting the infra-solr-security-json/content Ambari configuration to {"authentication": {"class": "org.apache.solr.security.KerberosPlugin"}} (with that, authentication will still be enabled). Then you will need to restart Solr, as that config is uploaded to the /infra-solr/security.json znode during startup. Another option is to use the zkcli.sh of an Infra Solr to upload the security.json to the right place:
# Setup env for zkcli.sh
source /etc/ambari-infra-solr/conf/infra-solr-env.sh
# Run that command only if kerberos is enabled.
export SOLR_ZK_CREDS_AND_ACLS="${SOLR_AUTHENTICATION_OPTS}"
ZK_CONN_STRING=... # connection string -> zookeeper server addresses with the znode, e.g.: c7401.ambari.apache.org:2181/infra-solr
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh -zkhost $ZK_CONN_STRING -cmd put /security.json '{"authentication": {"class": "org.apache.solr.security.KerberosPlugin"}}'
Or you can also use the migrationHelper.py script to disable the Solr authorization (to keep this setting, you can disable the management of the security.json in the infra-solr-security-json config type):
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action disable-solr-authorization
/usr/lib/ambari-infra-solr-client/migrationHelper.py --help
Usage: migrationHelper.py [options]

Options:
  -h, --help            show this help message and exit
  -a ACTION, --action=ACTION
                        delete-collections | backup | cleanup-znodes |
                        backup-and-cleanup | migrate | restore |
                        rolling-restart-solr
  -i INI_FILE, --ini-file=INI_FILE
                        Config ini file to parse (required)
  -f, --force           force index upgrade even if it's the right version
  -v, --verbose         use for verbose logging
  -s SERVICE_FILTER, --service-filter=SERVICE_FILTER
                        run commands only selected services (comma separated:
                        LOGSEARCH,ATLAS,RANGER)
  -c COLLECTION, --collection=COLLECTION
                        selected collection to run an operation
  --async               async Ambari operations (backup | restore | migrate)
  --index-location=INDEX_LOCATION
                        location of the index backups. add ranger/atlas prefix
                        after the path. required only if no backup path in the
                        ini file
  --atlas-index-location=ATLAS_INDEX_LOCATION
                        location of the index backups (for atlas). required
                        only if no backup path in the ini file
  --ranger-index-location=RANGER_INDEX_LOCATION
                        location of the index backups (for ranger). required
                        only if no backup path in the ini file
  --version=INDEX_VERSION
                        lucene index version for migration (6.6.2 or 7.3.1)
  --request-tries=REQUEST_TRIES
                        number of tries for BACKUP/RESTORE status api calls in
                        the request
  --request-time-interval=REQUEST_TIME_INTERVAL
                        time interval between BACKUP/RESTORE status api calls
                        in the request
  --request-async       skip BACKUP/RESTORE status api calls from the command
  --include-solr-hosts=INCLUDE_SOLR_HOSTS
                        comma separated list of included solr hosts
  --exclude-solr-hosts=EXCLUDE_SOLR_HOSTS
                        comma separated list of excluded solr hosts
  --disable-solr-host-check
                        Disable to check solr hosts are good for the
                        collection backups
  --core-filter=CORE_FILTER
                        core filter for replica folders
  --skip-cores=SKIP_CORES
                        specific cores to skip (comma separated)
  --skip-generate-restore-host-cores
                        Skip the generation of restore_host_cores.json, just
                        read the file itself, can be useful if command failed
                        at some point.
  --hdfs-base-path=HDFS_BASE_PATH
                        hdfs base path where the collections are located
                        (e.g.: /user/infrasolr). Use if both atlas and ranger
                        collections are on hdfs.
  --ranger-hdfs-base-path=RANGER_HDFS_BASE_PATH
                        hdfs base path where the ranger collection is located
                        (e.g.: /user/infra-solr). Use if only ranger
                        collection is on hdfs.
  --atlas-hdfs-base-path=ATLAS_HDFS_BASE_PATH
                        hdfs base path where the atlas collections are located
                        (e.g.: /user/infra-solr). Use if only atlas
                        collections are on hdfs.
  --keep-backup         If it is turned on, Snapshot Solr data will not be
                        deleted from the filesystem during restore.
  --batch-interval=BATCH_INTERVAL
                        batch time interval (seconds) between requests (for
                        restarting INFRA SOLR, default: 60)
  --batch-fault-tolerance=BATCH_FAULT_TOLERANCE
                        fault tolerance of tasks for batch request (for
                        restarting INFRA SOLR, default: 0)
  --shared-drive        Use if the backup location is shared between hosts.
                        (override config from config ini file)
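As an illustration of how the migrationHelper.py options combine, the sketch below assembles two invocations and prints them instead of running them. The ini file name follows the earlier steps; RANGER as the service filter and the 120 second batch interval are example values only, not recommendations.

```shell
# Sketch only: build the commands as strings and print them for review.
CONFIG_INI_LOCATION=ambari_solr_migration.ini
HELPER=/usr/lib/ambari-infra-solr-client/migrationHelper.py

# Back up only the Ranger collection, with verbose logging (example filter value)
BACKUP_CMD="/usr/bin/python $HELPER --ini-file $CONFIG_INI_LOCATION --action backup --service-filter RANGER -v"

# Rolling restart of Infra Solr with 120 seconds between batches (example value)
RESTART_CMD="/usr/bin/python $HELPER --ini-file $CONFIG_INI_LOCATION --action rolling-restart-solr --batch-interval 120"

echo "$BACKUP_CMD"
echo "$RESTART_CMD"
```

After checking the printed commands, run them as a sudoer user on a Solr server host.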
Usage: migrationConfigGenerator.py [options]

Options:
  -h, --help            show this help message and exit
  -H HOST, --host=HOST  hostname for ambari server
  -P PORT, --port=PORT  port number for ambari server
  -c CLUSTER, --cluster=CLUSTER
                        name cluster
  -f, --force-ranger    force to get Ranger details - can be useful if Ranger
                        is configured to use external Solr (but points to
                        internal Sols)
  -s, --ssl             use if ambari server using https
  -v, --verbose         use for verbose logging
  -u USERNAME, --username=USERNAME
                        username for accessing ambari server
  -p PASSWORD, --password=PASSWORD
                        password for accessing ambari server
  -j JAVA_HOME, --java-home=JAVA_HOME
                        local java_home location
  -i INI_FILE, --ini-file=INI_FILE
                        Filename of the generated ini file for migration
                        (default: ambari_solr_migration.ini)
  --backup-base-path=BACKUP_BASE_PATH
                        base path for backup, e.g.: /tmp/backup, then
                        /tmp/backup/ranger/ and /tmp/backup/atlas/ folders
                        will be generated
  --backup-ranger-base-path=BACKUP_RANGER_BASE_PATH
                        base path for ranger backup (override backup-base-path
                        for ranger), e.g.: /tmp/backup/ranger
  --backup-atlas-base-path=BACKUP_ATLAS_BASE_PATH
                        base path for atlas backup (override backup-base-path
                        for atlas), e.g.: /tmp/backup/atlas
  --hdfs-base-path=HDFS_BASE_PATH
                        hdfs base path where the collections are located
                        (e.g.: /user/infrasolr). Use if both atlas and ranger
                        collections are on hdfs.
  --ranger-hdfs-base-path=RANGER_HDFS_BASE_PATH
                        hdfs base path where the ranger collection is located
                        (e.g.: /user/infra-solr). Use if only ranger
                        collection is on hdfs.
  --atlas-hdfs-base-path=ATLAS_HDFS_BASE_PATH
                        hdfs base path where the atlas collections are located
                        (e.g.: /user/infra-solr). Use if only atlas
                        collections are on hdfs.
  --skip-atlas          skip to gather Atlas service details
  --skip-ranger         skip to gather Ranger service details
  --retry=RETRY         number of retries during accessing random solr urls
  --delay=DELAY         delay (seconds) between retries during accessing
                        random solr urls
  --shared-drive        Use if the backup location is shared between hosts.
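To show how the optional flags fit together, here is a sketch of a migrationConfigGenerator.py invocation for a cluster where ambari-server uses https, the index data is on HDFS, and the backup location is shared between hosts. The host, cluster, credentials and paths reuse the example values from the earlier step; port 8443 is an assumed https port, so substitute the one your ambari-server actually listens on. The command is printed rather than executed.

```shell
# Sketch only: build the command as a string and print it for review.
CONFIG_INI_LOCATION=ambari_solr_migration.ini
GENERATOR=/usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py

# --ssl: ambari-server uses https (port 8443 is an assumption)
# --hdfs-base-path: both atlas and ranger collections are on hdfs
# --shared-drive: the backup location is shared between hosts
GEN_CMD="/usr/bin/python $GENERATOR --ini-file $CONFIG_INI_LOCATION --host c7401.ambari.apache.org --port 8443 --ssl --cluster cl1 --username admin --password admin --backup-base-path /my/path --hdfs-base-path /user/infra-solr --shared-drive --java-home /usr/jdk64/jdk1.8.0_112"

echo "$GEN_CMD"
```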
/usr/lib/ambari-infra-solr-client/solrDataManager.py --help
Usage: solrDataManager.py [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m MODE, --mode=MODE  archive | delete | save
  -s SOLR_URL, --solr-url=SOLR_URL
                        the url of the solr server including the port
  -c COLLECTION, --collection=COLLECTION
                        the name of the solr collection
  -f FILTER_FIELD, --filter-field=FILTER_FIELD
                        the name of the field to filter on
  -r READ_BLOCK_SIZE, --read-block-size=READ_BLOCK_SIZE
                        block size to use for reading from solr
  -w WRITE_BLOCK_SIZE, --write-block-size=WRITE_BLOCK_SIZE
                        number of records in the output files
  -i ID_FIELD, --id-field=ID_FIELD
                        the name of the id field
  -o DATE_FORMAT, --date-format=DATE_FORMAT
                        the date format to use for --days
  -q ADDITIONAL_FILTER, --additional-filter=ADDITIONAL_FILTER
                        additional solr filter
  -j NAME, --name=NAME  name included in result files
  -g, --ignore-unfinished-uploading
  --json-file           create a json file instead of line delimited json
  -z COMPRESSION, --compression=COMPRESSION
                        none | tar.gz | tar.bz2 | zip | gz
  -k SOLR_KEYTAB, --solr-keytab=SOLR_KEYTAB
                        the keytab for a kerberized solr
  -n SOLR_PRINCIPAL, --solr-principal=SOLR_PRINCIPAL
                        the principal for a kerberized solr
  -a HDFS_KEYTAB, --hdfs-keytab=HDFS_KEYTAB
                        the keytab for a kerberized hdfs
  -l HDFS_PRINCIPAL, --hdfs-principal=HDFS_PRINCIPAL
                        the principal for a kerberized hdfs
  -u HDFS_USER, --hdfs-user=HDFS_USER
                        the user for accessing hdfs
  -p HDFS_PATH, --hdfs-path=HDFS_PATH
                        the hdfs path to upload to
  -t KEY_FILE_PATH, --key-file-path=KEY_FILE_PATH
                        the file that contains S3 <accessKey>,<secretKey>
  -b BUCKET, --bucket=BUCKET
                        the bucket name for S3 upload
  -y KEY_PREFIX, --key-prefix=KEY_PREFIX
                        the key prefix for S3 upload
  -x LOCAL_PATH, --local-path=LOCAL_PATH
                        the local path to save the files to
  -v, --verbose
  --solr-output-collection=SOLR_OUTPUT_COLLECTION
                        target output solr collection for archive
  --exclude-fields=EXCLUDE_FIELDS
                        Comma separated list of excluded fields from json
                        response

  specifying the end of the range:
    -e END, --end=END   end of the range
    -d DAYS, --days=DAYS
                        number of days to keep
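A sketch of a solrDataManager.py call in save mode may help tie the options together. The Solr URL, the collection name (hadoop_logs), the filter field (logtime), the block sizes and the local path are all hypothetical values chosen for illustration; which further options are mandatory depends on the script's own validation. The command is printed rather than executed.

```shell
# Sketch only: build the command as a string and print it for review.
DATA_MANAGER=/usr/lib/ambari-infra-solr-client/solrDataManager.py
SOLR_URL=http://c7401.ambari.apache.org:8886/solr   # hypothetical Solr url, including the port

# -m save: one of archive | delete | save
# -f logtime: field to filter on (hypothetical field name)
# -d 30: number of days to keep
# -r / -w: read block size and number of records per output file (example values)
# -z gz: compress the output files; -x: local path to save them to
SAVE_CMD="/usr/bin/python $DATA_MANAGER -m save -s $SOLR_URL -c hadoop_logs -f logtime -d 30 -r 1000 -w 100000 -z gz -x /tmp/solr_save -v"

echo "$SAVE_CMD"
```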