Merge remote-tracking branch 'mike/fluo-707'
tree: e000a255b3cdb8fd115ef688f979b6c08521f6d4

Fluo

Apache Fluo lets users make incremental updates to large data sets stored in Apache Accumulo.

Apache Fluo is an open source implementation of Percolator (which populates Google's search index) for Apache Accumulo. Fluo makes it possible to update the results of a large-scale computation, index, or analytic as new data is discovered. Check out the Fluo project website for news and general information.
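To illustrate what "incremental" means here, the following is a minimal, self-contained Java sketch of a word count that folds each new document into running totals instead of recomputing from scratch. It does not use the Fluo API or Accumulo; the class `IncrementalWordCountSketch` and its methods are hypothetical names chosen for this example, and the in-memory map stands in for the stored data set.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; this is not Fluo code. */
final class IncrementalWordCountSketch {
  // Running totals; an in-memory stand-in for a large stored data set.
  private final Map<String, Long> counts = new HashMap<>();

  /** Folds one new document into the totals without re-reading earlier documents. */
  void addDocument(String text) {
    for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (!word.isEmpty()) {
        counts.merge(word, 1L, Long::sum);
      }
    }
  }

  /** Returns the current count for a word, or 0 if it has never been seen. */
  long count(String word) {
    return counts.getOrDefault(word.toLowerCase(Locale.ROOT), 0L);
  }
}
```

Each call to `addDocument` costs work proportional to the new document only, which is the property an incremental system aims for as the stored data grows.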

Getting Started

There are several ways to run Fluo (listed in order of increasing difficulty):

  • quickstart - Starts a MiniFluo instance that is configured to run a word count application
  • fluo-dev - Command-line tool for running Fluo and its dependencies on a single machine
  • Zetten - Command-line tool that launches an AWS cluster and sets up Fluo/Accumulo on it
  • Production - Sets up Fluo on a cluster where Accumulo, Hadoop, and ZooKeeper are already running

Except for quickstart, all of the options above set up a Fluo application that stays idle until you create client and observer code for it. You can either create your own application or configure Fluo to run one of the example applications below:

  • phrasecount - Computes phrase counts for unique documents
  • fluo-stress - Computes the number of unique integers by building a bitwise trie
  • webindex - Creates a web index using Common Crawl data
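
As a rough illustration of the idea behind counting unique integers with a bitwise trie, here is a minimal in-memory Java sketch. It is not the fluo-stress implementation and does not use the Fluo API; the class `BitwiseTrieSketch` and its methods are hypothetical names, and the one-bit-per-level layout is an assumption made to keep the example short.

```java
/** Hypothetical illustration only; this is not the fluo-stress code. */
final class BitwiseTrieSketch {
  /** One trie node; child[0] follows a 0 bit, child[1] follows a 1 bit. */
  private static final class Node {
    final Node[] child = new Node[2];
  }

  private final Node root = new Node();
  private long uniqueCount = 0;

  /** Inserts a value and returns true if it had not been seen before. */
  boolean add(int value) {
    Node node = root;
    boolean created = false;
    // Walk the 32 bits from most significant to least significant.
    for (int bit = 31; bit >= 0; bit--) {
      int b = (value >>> bit) & 1;
      if (node.child[b] == null) {
        node.child[b] = new Node();
        created = true; // a new node on the path means the value is new
      }
      node = node.child[b];
    }
    if (created) {
      uniqueCount++;
    }
    return created;
  }

  /** Number of distinct values inserted so far. */
  long uniqueCount() {
    return uniqueCount;
  }
}
```

Inserting 7, 3, 7, and 3 leaves `uniqueCount()` at 2, because the second 7 and the second 3 follow paths that already exist and create no new nodes.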

Applications

Below are helpful resources for Fluo application developers:

  • Instructions for creating Fluo applications
  • Fluo API javadocs
  • Fluo Recipes - Project that provides common code, implemented using the Fluo API, for Fluo application developers

Implementation

  • Architecture - Overview of Fluo's architecture
  • Contributing - Documentation for developers who want to contribute to Fluo
  • Metrics - Fluo metrics are visible via JMX by default, and Fluo can be configured to send them to Graphite or Ganglia