commit dd28a78f4967bb5706cfc2e2b4b8a2cee15dd351
author:    Mike Walch <mwalch@gmail.com>  Wed Sep 16 11:12:06 2015 -0400
committer: Mike Walch <mwalch@gmail.com>  Wed Sep 16 11:12:06 2015 -0400
tree:      2f12bfe5578e546b768d63caeefd9f135bb14dfa
parent:    9fa46c413a928b337fb019dbad8abf1c9fbb7b1c
parent:    8245084544f9e6b94eb760a1e1b3c974a73838e0

Merge pull request #5 from keith-turner/fix-export-queue

Fixed export queue and generalized Fluo setup
An example Fluo application that creates a web index using CommonCrawl data.
In order to run this application, you need the following installed and running on your machine:

* Hadoop (HDFS)
* Accumulo
* Fluo
Consider using fluo-dev to run these requirements.
First, you must create `data.yml` and `dropwizard.yml` files and edit them for your environment:

```sh
cd conf
cp data.yml.example data.yml
cp dropwizard.yml.example dropwizard.yml
```
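The example files carry the real schema; purely for orientation, here is a hypothetical sketch of the kind of settings `data.yml` might hold. Every key name below is an assumption for illustration, not the project's actual configuration:

```yaml
# Hypothetical example only -- these key names are assumptions,
# not the project's actual configuration schema.
hdfsDataDir: /cc/data                     # assumed HDFS directory for downloaded CommonCrawl files
fluoPropsPath: /path/to/fluo.properties   # assumed pointer to your Fluo configuration
accumuloTable: webindex                   # assumed Accumulo table backing the index
```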
Next, run the following command to download CommonCrawl data files. The data files can have a `fileType` of `wat`, `wet`, or `warc`. The command downloads a file containing the URL paths of thousands of files, and `numFiles` specifies how many of those files will be downloaded from AWS and loaded into your HDFS instance.
```sh
# Command structure
./bin/download.sh <fileType> <numFiles>

# Use command below for this example
./bin/download.sh wat 1
```
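For orientation, here is a minimal sketch of what this download step amounts to, assuming a CommonCrawl paths file on S3 and an HDFS destination directory. The URL, crawl name, and HDFS path below are assumptions for illustration, not the script's actual values:

```sh
#!/usr/bin/env bash
# Rough sketch of the download flow; URLs and paths are assumptions.
FILE_TYPE=$1    # wat, wet, or warc
NUM_FILES=$2    # how many CommonCrawl files to fetch

# 1. Fetch the paths file listing thousands of data files (URL/crawl assumed)
PATHS_URL="https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2015-35/${FILE_TYPE}.paths.gz"

# 2. Take the first NUM_FILES entries, download each from AWS, load into HDFS
curl -s "$PATHS_URL" | gunzip | head -n "$NUM_FILES" | while read -r p; do
  curl -s -O "https://commoncrawl.s3.amazonaws.com/$p"
  hdfs dfs -put -f "$(basename "$p")" /cc/data/   # assumed HDFS destination
done
```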
Next, run the following command to start Fluo and initialize it and Accumulo with data:

```sh
./bin/init.sh
```
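For orientation, a hedged outline of what an init step like this typically does for a Fluo application; the exact commands and the app name are assumptions about `init.sh`, not its actual contents:

```sh
# Assumed outline of init.sh -- not the script's actual contents.
# Initialize the application's Fluo state in ZooKeeper/Accumulo,
# start its oracle and workers, then load the downloaded data.
fluo init webindex    # initialize the app (command form and app name assumed)
fluo start webindex   # launch oracle and workers (assumed)
# ...followed by a load job that pushes the HDFS files into Fluo (assumed).
```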
Finally, run the following command to start the web app:

```sh
./bin/webapp.sh
```
Open your browser to http://localhost:8080/
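A quick way to confirm the app is serving (the port comes from the URL above):

```sh
# Fetch the index page to confirm the web app responds
curl -s http://localhost:8080/ | head
```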