commit | 8b56ffd95784c6f6e08ab8641716bfebe32c457e |
---|---|---
author | Mike Walch <mwalch@gmail.com> | Tue Sep 29 10:21:12 2015 -0400
committer | Mike Walch <mwalch@gmail.com> | Tue Sep 29 10:21:12 2015 -0400
tree | 8957d6a5d7efbc406310bdb4836d8c40b9934295 |
parent | dd28a78f4967bb5706cfc2e2b4b8a2cee15dd351 |
parent | 4ebde3493049d49cc2a303e2a493dc7a00280892 |
Merge pull request #8 from mikewalch/indexing

Closes #4 - Accumulo indexes are now updated using Fluo IndexExporter.
An example Fluo application that creates a web index using CommonCrawl data.
In order to run this application, you need the following installed and running on your machine:

Consider using fluo-dev to run these requirements.
First, you must create `data.yml` and `dropwizard.yml` files and edit them for your environment:

```
cd conf
cp data.yml.example data.yml
cp dropwizard.yml.example dropwizard.yml
```
Next, run the following command to download CommonCrawl data files. The data files can have a `fileType` of `wat`, `wet`, or `warc`. The command downloads a file containing the URL paths of thousands of files, and `numFiles` specifies how many of those files will be downloaded from AWS and loaded into your HDFS instance.
```
# Command structure
./bin/download.sh <fileType> <numFiles>

# Use command below for this example
./bin/download.sh wat 1
```
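As a rough sketch of what a download step like this might do (the function name, listing format, and fetch commands below are illustrative assumptions, not the actual contents of `download.sh`):

```shell
# Illustrative sketch only; download.sh's real logic may differ.
# The downloaded listing names thousands of data files, one path per line;
# only the first <numFiles> entries are kept.
select_paths() {
  paths_file="$1"   # file listing one CommonCrawl object path per line
  num_files="$2"    # how many of those files to download
  head -n "$num_files" "$paths_file"
}

# Each selected path would then be fetched and loaded into HDFS, e.g.:
#   wget "https://commoncrawl.s3.amazonaws.com/$path"   # URL is an assumption
#   hdfs dfs -put "$(basename "$path")" <hdfs-data-dir>
# (network and HDFS steps shown as comments only)

# Demo with a fake listing:
printf 'seg/1/file1.wat.gz\nseg/2/file2.wat.gz\nseg/3/file3.wat.gz\n' > /tmp/wat.paths
select_paths /tmp/wat.paths 1
```

With `numFiles` set to 1, only the first listed file would be fetched, which keeps the example run small.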
Next, run the following command to start Fluo and initialize it and Accumulo with data:

```
./bin/init.sh
```
Finally, run the following command to start the web app:

```
./bin/webapp.sh
```
Open your browser to http://localhost:8080/
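A quick way to confirm the web app is serving is to probe the endpoint from a terminal (a hedged sketch; the port is assumed from the URL above):

```shell
# Hypothetical sanity check; prints the HTTP status code,
# or 000 if nothing answers on the assumed port.
status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/ 2>/dev/null)
status=${status:-000}
echo "webapp HTTP status: $status"
```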