This document is intended for developers and sysadmins who want to migrate their existing OA data from Spot 0.9 to Spot 1.0. In Spot 0.9, OA data was stored in CSV files in a location on the OA server (specified in spot.conf during the original installation). In Spot 1.0, OA data is stored in Impala tables. These scripts migrate each use case (flow, proxy and dns) independently from those CSV files into the new Impala tables.
This migration process is optional. It is only needed by users who want to keep using the OA data generated by the previous Spot version (scores, edges, chords, dendros, ingest summaries, storyboards, threat investigations and timelines).
Flow
CSV File | Impala Table |
---|---|
flow_scores.csv | flow_scores |
chord-*.tsv | flow_chords |
edge-*.tsv | flow_edge |
is_*.csv | flow_ingest_summary |
threats.csv | flow_storyboard |
flow_scores.csv (only scored values) | flow_threat_investigation |
sbdet-*.tsv | flow_timeline |
DNS
CSV File | Impala Table |
---|---|
dns_scores.csv | dns_scores |
edge-*.csv | dns_edge |
dendro-*.csv | dns_dendro |
threat-dendro-*.csv | dns_threat_dendro |
is_*.csv | dns_ingest_summary |
threats.csv | dns_storyboard |
dns_scores.csv (only scored values) | dns_threat_investigation |
Proxy
CSV File | Impala Table |
---|---|
edge-*.csv | proxy_edge |
is_*.csv | proxy_ingest_summary |
proxy_scores.csv | proxy_scores |
threats.csv | proxy_storyboard |
proxy_scores.csv (only scored values) | proxy_threat_investigation |
timeline-*.csv | proxy_timeline |
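The three tables above can be read as one lookup structure. The following Python sketch is a hypothetical helper, not part of the migration script: it transcribes the tables into a dictionary and uses the standard `fnmatch` module to report which Spot 1.0 table(s) a given Spot 0.9 file feeds. It assumes file names follow exactly the patterns listed, and it does not apply the "only scored values" row filter used for the threat investigation tables.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical transcription of the tables above:
# pipeline -> list of (CSV file-name pattern, target Impala table).
# Each scores CSV appears twice because it feeds both the scores table and,
# filtered to scored rows only, the threat investigation table.
CSV_TO_TABLE = {
    "flow": [
        ("flow_scores.csv", "flow_scores"),
        ("chord-*.tsv", "flow_chords"),
        ("edge-*.tsv", "flow_edge"),
        ("is_*.csv", "flow_ingest_summary"),
        ("threats.csv", "flow_storyboard"),
        ("flow_scores.csv", "flow_threat_investigation"),
        ("sbdet-*.tsv", "flow_timeline"),
    ],
    "dns": [
        ("dns_scores.csv", "dns_scores"),
        ("edge-*.csv", "dns_edge"),
        ("dendro-*.csv", "dns_dendro"),
        ("threat-dendro-*.csv", "dns_threat_dendro"),
        ("is_*.csv", "dns_ingest_summary"),
        ("threats.csv", "dns_storyboard"),
        ("dns_scores.csv", "dns_threat_investigation"),
    ],
    "proxy": [
        ("edge-*.csv", "proxy_edge"),
        ("is_*.csv", "proxy_ingest_summary"),
        ("proxy_scores.csv", "proxy_scores"),
        ("threats.csv", "proxy_storyboard"),
        ("proxy_scores.csv", "proxy_threat_investigation"),
        ("timeline-*.csv", "proxy_timeline"),
    ],
}


def target_tables(pipeline: str, file_name: str) -> list[str]:
    """Return every Spot 1.0 table that a Spot 0.9 file feeds (may be empty)."""
    # fnmatchcase matches the whole name, so "threat-dendro-1.csv" does not
    # also match the shorter "dendro-*.csv" pattern.
    return [table for pattern, table in CSV_TO_TABLE[pipeline]
            if fnmatchcase(file_name, pattern)]
```

For example, `target_tables("flow", "flow_scores.csv")` returns both `flow_scores` and `flow_threat_investigation`, while a file that matches no pattern returns an empty list.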
A single launch script migrates all the specified pipelines. For each pipeline, it reads every CSV file from the existing location and imports its data into the corresponding Impala tables: it first creates a staging database and tables to load the CSV records, and then inserts that data into the new Spot 1.0 tables. You must run this migration from the server where the Spot 0.9 CSV files are located. You can migrate one pipeline or all of them (flow, dns and proxy), depending on your needs and your existing data. At the end of each script, the old pipeline data folder is moved from its original location to a backup folder, and the staging tables and their respective HDFS paths are removed.
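The staging flow described above can be sketched as the sequence of Impala statements needed for one CSV file. This Python helper is hypothetical and not taken from the migration script: the column names and types, the comma delimiter, the single header row and the optional `where` filter (standing in for "only scored values") are all assumptions, and the real Spot 1.0 tables may be partitioned and need a `PARTITION` clause that this sketch omits.

```python
from __future__ import annotations


def build_migration_sql(staging_db: str, staging_hdfs_path: str, new_db: str,
                        staging_table: str, target_table: str,
                        columns: list[tuple[str, str]],
                        where: str | None = None) -> list[str]:
    """Return hypothetical Impala statements for one CSV -> Spot 1.0 table.

    columns is a list of (name, Impala type) pairs; both are assumptions.
    """
    base = staging_hdfs_path.rstrip("/")
    col_defs = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    col_list = ", ".join(name for name, _ in columns)
    select = f"SELECT {col_list} FROM {staging_db}.{staging_table}"
    if where:
        # e.g. keep only analyst-scored rows for a *_threat_investigation table
        select += f" WHERE {where}"
    return [
        # 1. Staging database stored under the given HDFS path.
        f"CREATE DATABASE IF NOT EXISTS {staging_db} LOCATION '{base}'",
        # 2. Typed text table over the directory the CSV is copied into;
        #    assumes the CSV has exactly one header row to skip.
        f"CREATE TABLE IF NOT EXISTS {staging_db}.{staging_table} ({col_defs}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE "
        f"LOCATION '{base}/{staging_table}' "
        f"TBLPROPERTIES ('skip.header.line.count'='1')",
        # 3. Copy the staged rows into the new Spot 1.0 table.
        f"INSERT INTO {new_db}.{target_table} ({col_list}) {select}",
        # 4. Clean up: dropping a managed (non-EXTERNAL) table also
        #    deletes its data files in HDFS.
        f"DROP TABLE IF EXISTS {staging_db}.{staging_table}",
    ]
```

Under these assumptions, the CSV would first be copied into the staging table's directory (for example with `hdfs dfs -put`), and each statement could then be sent to the chosen daemon with `impala-shell -i <IMPALA_DAEMON> -q "<statement>"`.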
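The script is invoked as:
./migrate_spot_0_9_to_1_0.py PIPELINES OLD_OA_PATH STAGING_DB_NAME STAGING_DB_HDFS_PATH NEW_SPOT_IMPALA_DB IMPALA_DAEMON
where the variables mean:
- PIPELINES: comma-separated list of the pipelines to migrate (any combination of flow, dns and proxy).
- OLD_OA_PATH: path to the Spot 0.9 spot-oa folder on the server where the CSV files are located.
- STAGING_DB_NAME: name of the temporary staging database to create in Impala.
- STAGING_DB_HDFS_PATH: HDFS path where the staging database and its tables are stored.
- NEW_SPOT_IMPALA_DB: name of the Impala database that holds the Spot 1.0 tables.
- IMPALA_DAEMON: host name of the Impala daemon used to run the queries.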
Example:
./migrate_spot_0_9_to_1_0.py 'flow,dns,proxy' '/home/spotuser/incubator-spot_old/spot-oa' 'spot_migration' '/user/spotuser/spot_migration/' 'migrated' 'node01'
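The six positional arguments could be parsed and validated as in the following sketch. It is hypothetical: it uses only the standard `argparse` module, mirrors the argument order shown above, and may not match how `migrate_spot_0_9_to_1_0.py` actually reads its command line. The helper names `pipelines_arg` and `build_parser` are illustrative.

```python
import argparse

VALID_PIPELINES = {"flow", "dns", "proxy"}


def pipelines_arg(value: str) -> list:
    """Split e.g. 'flow,dns,proxy' into a list, rejecting unknown names."""
    names = [p.strip() for p in value.split(",") if p.strip()]
    unknown = [p for p in names if p not in VALID_PIPELINES]
    if not names or unknown:
        raise argparse.ArgumentTypeError(f"invalid pipeline list: {value!r}")
    return names


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the six positional arguments, in order."""
    parser = argparse.ArgumentParser(
        description="Migrate Spot 0.9 OA CSV data to Spot 1.0 Impala tables")
    parser.add_argument("pipelines", type=pipelines_arg,
                        help="comma-separated subset of flow,dns,proxy")
    parser.add_argument("old_oa_path", help="Spot 0.9 spot-oa folder")
    parser.add_argument("staging_db_name", help="temporary staging database")
    parser.add_argument("staging_db_hdfs_path", help="HDFS path for staging")
    parser.add_argument("new_spot_impala_db", help="Spot 1.0 Impala database")
    parser.add_argument("impala_daemon", help="Impala daemon host")
    return parser
```

Passing the example arguments above to `build_parser().parse_args(...)` yields `pipelines == ["flow", "dns", "proxy"]`, while a value such as `'flow,ftp'` is rejected before any data is touched, which is the main reason to validate the pipeline list up front.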