Before proceeding with the actual process, make sure you are familiar with deploying a server and starting it. Both steps are described on the official VXQuery website. You should also verify that you can ssh into the other nodes without entering a password, since the scripts ssh from the current node to the other nodes.
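One way to check the passwordless login is a short Python sketch like the one below. It relies on OpenSSH's `BatchMode=yes` option, which makes ssh fail instead of prompting for a password, and runs the no-op command `true` on each node. The function names and node names are hypothetical; this is not part of the VXQuery scripts.

```python
import subprocess
from typing import Dict, List, Sequence


def ssh_check_command(host: str, timeout_s: int = 5) -> List[str]:
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt for passwords or passphrases
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly on unreachable nodes
        host,
        "true",                             # run a no-op command on the remote node
    ]


def check_passwordless_ssh(hosts: Sequence[str]) -> Dict[str, bool]:
    """Return, for each host, whether a non-interactive ssh login succeeded."""
    results: Dict[str, bool] = {}
    for host in hosts:
        try:
            proc = subprocess.run(ssh_check_command(host), capture_output=True, timeout=30)
            results[host] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # ssh binary missing, or the connection hung past the local timeout
            results[host] = False
    return results
```

For example, `check_passwordless_ssh(["node1", "node2"])` returns a dictionary mapping each (hypothetical) node name to `True` when the login succeeded without a password prompt.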
NOAA hosts the Daily Global Historical Climatology Network (GHCN-DAILY) data as .dat files. Weather.gov offers an RSS/XML feed with current weather sensor readings. Using that feed as a template, the process turns the GHCN-DAILY historical data into XML documents that mimic past RSS feeds. This yields a large test data set without having to monitor the weather.gov site continuously for years to collect the same weather details.
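As a rough illustration of the conversion idea, the sketch below renders one station's readings for one day as a small XML document using Python's standard `xml.etree.ElementTree`. The function `reading_to_xml` and the element names (`observation`, `station_id`, `date`, `reading`) are hypothetical placeholders; the real scripts follow the weather.gov feed template, whose exact schema is not reproduced here.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Dict


def reading_to_xml(station_id: str, day: date, values: Dict[str, int]) -> str:
    """Render one day of readings as a small XML document.

    The element names are hypothetical placeholders, not the exact
    weather.gov feed schema used as the template.
    """
    root = ET.Element("observation")
    ET.SubElement(root, "station_id").text = station_id
    ET.SubElement(root, "date").text = day.isoformat()
    for element, value in sorted(values.items()):
        # One child per GHCN-Daily element code (e.g. TMAX, PRCP), sorted by code.
        ET.SubElement(root, "reading", {"type": element}).text = str(value)
    return ET.tostring(root, encoding="unicode")
```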
Detailed GHCN-DAILY information: http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt
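The readme above describes a fixed-width layout: each line holds one station, year, month, and element (columns 1-11, 12-15, 16-17, and 18-21), followed by 31 eight-character day groups, each a 5-character value plus one-character measurement, quality, and source flags, with -9999 marking a missing value. A minimal Python parser for one such line might look like this. It is a sketch based on that readme, not the parser used by weather_cli.py, and the type and function names are hypothetical.

```python
from typing import List, NamedTuple, Optional

MISSING = -9999  # missing-value marker defined in the GHCN-Daily readme
LINE_WIDTH = 21 + 31 * 8  # header columns plus 31 day groups = 269 characters


class DailyValue(NamedTuple):
    day: int
    value: Optional[int]  # None when the record holds the missing marker
    mflag: str
    qflag: str
    sflag: str


class DailyRecord(NamedTuple):
    station_id: str
    year: int
    month: int
    element: str
    values: List[DailyValue]


def parse_ghcn_daily_line(line: str) -> DailyRecord:
    """Parse one fixed-width GHCN-Daily record (one station, month, element)."""
    # Drop the line ending and pad, so blank trailing flags are still present.
    line = line.rstrip("\r\n").ljust(LINE_WIDTH)
    values = []
    for day in range(1, 32):
        start = 21 + (day - 1) * 8  # 0-based offset of VALUEn (columns 22-26 for day 1)
        raw = int(line[start:start + 5])
        values.append(DailyValue(
            day=day,
            value=None if raw == MISSING else raw,
            mflag=line[start + 5].strip(),
            qflag=line[start + 6].strip(),
            sflag=line[start + 7].strip(),
        ))
    return DailyRecord(
        station_id=line[0:11],
        year=int(line[11:15]),
        month=int(line[15:17]),
        element=line[17:21],
        values=values,
    )
```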
The process takes a save folder for the data. This folder contains several subfolders:
To convert the weather data to XML, 4 stages have to be completed. The stages are described below:
Downloading: python weather_cli.py -l download -x weather_example.xml
Building: python weather_cli.py -l sensor_build -x weather_example.xml (use -l station_build instead to build the station data)
Partitioning: python weather_cli.py -l partition -x weather_example.xml
Linking: python weather_cli.py -l test_links -x weather_example.xml
After the conversion is complete, the system has to be set up to execute queries that evaluate its performance. The steps for this procedure are described below:
Building queries: python weather_cli.py -l queries -x weather_example.xml
Executing queries: run_group_test.sh cluster_ip path/to/weather_folder
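The full sequence of commands can also be scripted. The Python sketch below only assembles the command lines shown above and runs them in order, stopping at the first failure. The helper names are hypothetical, `cluster_ip` and `path/to/weather_folder` are the same placeholders as above, and it assumes `weather_cli.py` is in the working directory and `run_group_test.sh` is on the PATH.

```python
import subprocess
from typing import List

# Stage names passed to weather_cli.py with -l, in the order listed above.
# Swap "sensor_build" for "station_build" to build the station data instead.
CONVERSION_STAGES = ["download", "sensor_build", "partition", "test_links"]


def pipeline_commands(config_xml: str, cluster_ip: str, weather_folder: str) -> List[List[str]]:
    """Return the command lines for conversion, query building, and query execution."""
    commands = [["python", "weather_cli.py", "-l", stage, "-x", config_xml]
                for stage in CONVERSION_STAGES]
    commands.append(["python", "weather_cli.py", "-l", "queries", "-x", config_xml])
    commands.append(["run_group_test.sh", cluster_ip, weather_folder])
    return commands


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order; check=True raises CalledProcessError on the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print or review the sequence before calling `run_pipeline(pipeline_commands("weather_example.xml", "cluster_ip", "path/to/weather_folder"))` on a real cluster.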