---
layout: docs
title: Air Flight Example
permalink: /docs/flight-example/
---

We can use airline flight data as an example to show some of the basic functionality that this Accumulo and Pig support provides. The American Statistical Association has a nice collection of data sets covering flights from 1987 through 2008. Using the flight data set and the airport data set, we can join records in a few lines of Pig.

Writing the data

Download at least one year of the flight data, decompress it and place it into HDFS. From this point, we can write a few lines of Pig to read the file using PigStorage, create a unique row key and configure AccumuloStorage to use the given column names to write the data into Accumulo. We need to make sure we include the accumulo-pig jar which has the AccumuloStorage implementation we're leveraging, in addition to the Accumulo, Thrift and ZooKeeper dependencies.
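The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive script: the jar paths, the HDFS path, the connection parameters (instance, user, password, zookeepers), and the `org.apache.accumulo.pig.AccumuloStorage` class name are assumptions to adapt to your installation, and the field names are our own labels for the 29 columns of the flight CSV. Only a subset of columns is stored to keep the example short.

```pig
-- Hypothetical jar locations; use the versions from your own installation.
REGISTER /path/to/accumulo-pig.jar;
REGISTER /path/to/accumulo-core.jar;
REGISTER /path/to/libthrift.jar;
REGISTER /path/to/zookeeper.jar;

-- Read one year of flight data. Every field is loaded as a chararray so the
-- row key can be built with CONCAT without casts.
raw_flights = LOAD '/flights/2008.csv' USING PigStorage(',') AS (
    year:chararray, month:chararray, day_of_month:chararray, day_of_week:chararray,
    dep_time:chararray, crs_dep_time:chararray, arr_time:chararray, crs_arr_time:chararray,
    unique_carrier:chararray, flight_num:chararray, tail_num:chararray,
    actual_elapsed_time:chararray, crs_elapsed_time:chararray, air_time:chararray,
    arr_delay:chararray, dep_delay:chararray, origin:chararray, dest:chararray,
    distance:chararray, taxi_in:chararray, taxi_out:chararray, cancelled:chararray,
    cancellation_code:chararray, diverted:chararray, carrier_delay:chararray,
    weather_delay:chararray, nas_delay:chararray, security_delay:chararray,
    late_aircraft_delay:chararray);

-- Drop the CSV header line, whose first field is the literal text 'Year'.
flights = FILTER raw_flights BY year != 'Year';

-- Row key: year_month_day_carrier_flightnumber. The '_' delimiter is our own
-- choice; nested two-argument CONCAT calls work on older Pig versions too.
keyed = FOREACH flights GENERATE
    CONCAT(year, CONCAT('_', CONCAT(month, CONCAT('_', CONCAT(day_of_month,
        CONCAT('_', CONCAT(unique_carrier, CONCAT('_', flight_num)))))))) AS rowkey,
    unique_carrier, flight_num, origin, dest,
    dep_time, arr_time, dep_delay, arr_delay;

-- The first field is used as the row key; the remaining fields are written
-- under the column names given to AccumuloStorage, in the same order.
STORE keyed INTO 'accumulo://flight_data?instance=myInstance&user=root&password=secret&zookeepers=localhost:2181'
    USING org.apache.accumulo.pig.AccumuloStorage(
        'unique_carrier,flight_num,origin,dest,dep_time,arr_time,dep_delay,arr_delay');
```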

This will load a year's worth of data into Accumulo with a row key that is the concatenation of the year, month and day of the flight, the carrier code, and the flight number, which should give us good distribution and parallelism inside of Accumulo.

Next, we want to do the same for the airport information:
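A comparable sketch for the airports file, under the same assumptions as before (placeholder jar paths, connection parameters, and storage class name). It additionally assumes that the airports CSV wraps its text fields in double quotes and starts with a header line whose first field is `iata`. The quotes are stripped with `REPLACE`; this simple approach does not handle airport names that contain commas.

```pig
-- Hypothetical jar locations; use the versions from your own installation.
REGISTER /path/to/accumulo-pig.jar;
REGISTER /path/to/accumulo-core.jar;
REGISTER /path/to/libthrift.jar;
REGISTER /path/to/zookeeper.jar;

-- Field names are our own labels for the seven columns of the airports file.
raw_airports = LOAD '/airports/airports.csv' USING PigStorage(',') AS (
    code:chararray, name:chararray, city:chararray, state:chararray,
    country:chararray, latitude:chararray, longitude:chararray);

-- Strip the surrounding double quotes so that airport codes match the
-- unquoted origin and destination codes in the flight data.
unquoted = FOREACH raw_airports GENERATE
    REPLACE(code, '"', '') AS code,
    REPLACE(name, '"', '') AS name,
    REPLACE(city, '"', '') AS city,
    REPLACE(state, '"', '') AS state,
    REPLACE(country, '"', '') AS country,
    latitude, longitude;

-- Drop the header line.
airports = FILTER unquoted BY code != 'iata';

-- The airport code (first field) becomes the row key.
STORE airports INTO 'accumulo://airports?instance=myInstance&user=root&password=secret&zookeepers=localhost:2181'
    USING org.apache.accumulo.pig.AccumuloStorage(
        'name,city,state,country,latitude,longitude');
```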

At this point, we have flight information in the ‘flight_data’ Accumulo table and airport information in the ‘airports’ Accumulo table. We can project the flight data down to just the departure information and join it, on the origin airport code, with the matching airport records.
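The projection and join might look like the sketch below. It assumes the two tables were written with column names such as `origin`, `dep_time`, `dep_delay`, `name`, `city` and `state`, that the airport code is the row key of the ‘airports’ table, and that AccumuloStorage returns the row key as the first field when loading. The jar paths, connection parameters and storage class name are placeholders.

```pig
-- Hypothetical jar locations; use the versions from your own installation.
REGISTER /path/to/accumulo-pig.jar;
REGISTER /path/to/accumulo-core.jar;
REGISTER /path/to/libthrift.jar;
REGISTER /path/to/zookeeper.jar;

-- Read only the departure-related columns of each flight.
departures = LOAD 'accumulo://flight_data?instance=myInstance&user=root&password=secret&zookeepers=localhost:2181'
    USING org.apache.accumulo.pig.AccumuloStorage('origin,dep_time,dep_delay')
    AS (rowkey:chararray, origin:chararray, dep_time:chararray, dep_delay:chararray);

-- Read the airport records; the row key is assumed to be the airport code.
airports = LOAD 'accumulo://airports?instance=myInstance&user=root&password=secret&zookeepers=localhost:2181'
    USING org.apache.accumulo.pig.AccumuloStorage('name,city,state')
    AS (code:chararray, name:chararray, city:chararray, state:chararray);

-- Join each departure with the record for its origin airport.
joined = JOIN departures BY origin, airports BY code;

-- Keep a flat, readable set of fields from both sides of the join.
flights_with_airports = FOREACH joined GENERATE
    departures::rowkey AS rowkey,
    departures::origin AS origin,
    airports::name AS airport_name,
    airports::city AS city,
    airports::state AS state,
    departures::dep_time AS dep_time,
    departures::dep_delay AS dep_delay;

-- Print a handful of joined records to check the result.
preview = LIMIT flights_with_airports 10;
DUMP preview;
```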

Opening the Accumulo shell to view the data we have just written, we can see the key-value pairs for some sample records.