Code Guide

The Webindex example has three major code components.

  • Spark component: Generates the initial Fluo and Query tables.
  • Fluo component: Updates the Query table as web pages are added, removed, and updated.
  • Web component: A web application that uses the Query table.

Since all of these components read or write the Query table, you may want to read about the Query table before reading about the code.

Guide to the Fluo Component

The following image shows a high-level view of how data flows through the Fluo Webindex code.

Page Loader

The PageLoader queues updated page content for processing by the PageObserver.
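To make the hand-off concrete, here is a minimal in-memory model of a loader queuing page content for an observer. It is a sketch under stated assumptions, not Fluo's API: the class PageQueueSketch and its methods are hypothetical, deletion is modeled as Optional.empty(), and repeated loads of one URL before processing are assumed to collapse so that only the latest content is seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory model of the PageLoader -> PageObserver hand-off.
// This is NOT the Fluo API; it only illustrates the queuing idea.
class PageQueueSketch {
  // Latest queued content per URL; Optional.empty() means "page removed".
  private final Map<String, Optional<String>> pending = new HashMap<>();
  // URLs awaiting processing, in first-queued order. A set, so several
  // loads of one URL before processing collapse into one notification.
  private final Set<String> notified = new LinkedHashSet<>();

  // Loader side: queue new or updated content for a page.
  void load(String url, String content) {
    pending.put(url, Optional.of(content));
    notified.add(url);
  }

  // Loader side: queue the removal of a page.
  void delete(String url) {
    pending.put(url, Optional.empty());
    notified.add(url);
  }

  // Observer side: take every notified URL with its latest queued content.
  List<Map.Entry<String, Optional<String>>> drain() {
    List<Map.Entry<String, Optional<String>>> batch = new ArrayList<>();
    for (String url : notified) {
      batch.add(Map.entry(url, pending.remove(url)));
    }
    notified.clear();
    return batch;
  }
}
```

Loading the same URL twice and then draining yields a single entry holding the second version, which is the behavior the observer relies on in this model.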

Observer Provider

All observers are set up by WebindexObservers. This class wires up everything discussed below.
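The value of wiring everything in one class is that each trigger maps to exactly one observer in a single place. The sketch below shows that idea with a plain registry; it is not Fluo's ObserverProvider interface, and the class name, method names, and column strings ("page:new", "uriQ:update", "domainQ:update") are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of a single class that wires every observer to the
// column (here just a string) whose changes should trigger it. Names are
// illustrative; this is not Fluo's ObserverProvider interface.
class ObserverWiringSketch {
  private final Map<String, Consumer<String>> observers = new LinkedHashMap<>();

  // Register one observer for a trigger column; a duplicate is a wiring bug.
  void register(String triggerColumn, Consumer<String> observer) {
    if (observers.putIfAbsent(triggerColumn, observer) != null) {
      throw new IllegalStateException("already wired: " + triggerColumn);
    }
  }

  // Route a notification for a row to the observer wired to that column.
  void dispatch(String triggerColumn, String row) {
    Consumer<String> obs = observers.get(triggerColumn);
    if (obs == null) {
      throw new IllegalArgumentException("no observer: " + triggerColumn);
    }
    obs.accept(row);
  }

  // Wire the observers described in this guide (hypothetical column names).
  static ObserverWiringSketch wireAll(List<String> log) {
    ObserverWiringSketch w = new ObserverWiringSketch();
    w.register("page:new", row -> log.add("PageObserver " + row));
    w.register("uriQ:update", row -> log.add("UriUpdateObserver " + row));
    w.register("domainQ:update", row -> log.add("DomainUpdateObserver " + row));
    return w;
  }
}
```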

Page Observer

The PageObserver computes changes to the links in a page. For each newly linked URI it queues a +1 to the uriQ, and for each URI that is no longer linked it queues a -1. It also queues the changes to the page's links to the export queue.
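The core of that computation is a set difference between the old and new outbound links. The sketch below assumes a page has already been reduced to the set of URIs it links to; the class and method names are hypothetical, and parsing and queue I/O are left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Minimal sketch of the PageObserver's core computation, assuming a page is
// reduced to the set of URIs it links to. Names are hypothetical.
class LinkDiffSketch {
  // Returns URI -> +1 for newly added links and URI -> -1 for removed links.
  // URIs linked from both versions are unchanged and produce no update.
  // A TreeMap keeps the output order deterministic.
  static Map<String, Integer> uriDeltas(Set<String> oldLinks, Set<String> newLinks) {
    Map<String, Integer> deltas = new TreeMap<>();
    for (String uri : newLinks) {
      if (!oldLinks.contains(uri)) deltas.put(uri, +1);
    }
    for (String uri : oldLinks) {
      if (!newLinks.contains(uri)) deltas.put(uri, -1);
    }
    return deltas;
  }
}
```

A newly added page is the case where oldLinks is empty, and a deleted page is the case where newLinks is empty, so both fall out of the same function.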

URI Combine Queue

A CombineQueue is set up to track the number of pages linking to a URI. The reduce() function in UriInfo combines multiple updates into a single value. UriCombineQ.UriUpdateObserver is called when a key's value changes. The update observer queues '+1' and '-1' to the domain combine queue. It also queues changes in URI inbound link counts to the export queue.
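The sketch below illustrates both steps: folding many queued updates into one value, and turning a value change into a domain-level +1 or -1. It is a hypothetical stand-in for UriInfo that models only the inbound link count. The rule for domain deltas (a URI counts toward its domain while at least one page links to it) is an assumption inferred from the Domain Combine Queue description, not taken from the Webindex source.

```java
import java.util.List;

// Hypothetical value type standing in for UriInfo: only the inbound link
// count is modeled here. Field and method names are illustrative.
final class UriInfoSketch {
  final long linksTo;

  UriInfoSketch(long linksTo) { this.linksTo = linksTo; }

  // Fold many queued updates (e.g. +1, +1, -1) into a single value, so the
  // update observer sees one change per URI rather than one per update.
  static UriInfoSketch reduce(UriInfoSketch current, List<UriInfoSketch> updates) {
    long sum = current.linksTo;
    for (UriInfoSketch u : updates) sum += u.linksTo;
    return new UriInfoSketch(sum);
  }

  // Assumed rule: a URI counts toward its domain while at least one page
  // links to it, so each change yields a domain delta of +1, -1, or 0.
  static int domainDelta(UriInfoSketch oldVal, UriInfoSketch newVal) {
    boolean before = oldVal.linksTo > 0;
    boolean after = newVal.linksTo > 0;
    if (!before && after) return +1;
    if (before && !after) return -1;
    return 0;
  }
}
```

Because reduce() runs before the observer, three queued updates of +1, +1, and -1 for a previously unseen URI produce one observed change (0 to 1) and a single +1 for its domain.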

Domain Combine Queue

A CombineQueue is set up to track the number of unique URIs observed in each domain. The SummingCombiner from Fluo Recipes combines updates. DomainCombineQ.DomainUpdateObserver is called when a key's value changes; it queues the change on the export queue.
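Summing is a natural combiner for these counts because the +1/-1 deltas can be added in any order. The plain-Java sketch below mirrors that idea; it is not the signature of the Fluo Recipes SummingCombiner, and DomainCountSketch and DomainChange are hypothetical names. It assumes Java 16 or later for the nested record.

```java
import java.util.List;
import java.util.Optional;

// Plain-Java sketch of summing-combiner semantics for per-domain counts. It
// mirrors the idea only, not the Fluo Recipes SummingCombiner signature.
class DomainCountSketch {
  // Hypothetical change record an update observer could export.
  record DomainChange(String domain, long oldCount, long newCount) {}

  // Combine the stored count (absent for a new domain) with queued deltas.
  static long combine(Optional<Long> stored, List<Long> deltas) {
    long sum = stored.orElse(0L);
    for (long d : deltas) sum += d;
    return sum;
  }

  // Apply queued deltas and describe the resulting change for export.
  static DomainChange apply(String domain, Optional<Long> stored, List<Long> deltas) {
    return new DomainChange(domain, stored.orElse(0L), combine(stored, deltas));
  }
}
```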

Export Queue

All of the other observers place IndexUpdate objects on the export queue. IndexUpdateTranslator is a function that translates IndexUpdates to Accumulo Mutations. This function is passed to the Fluo Recipe that exports to Accumulo tables.

IndexUpdate is implemented by the following classes:

  1. DomainUpdate - Updates information related to a domain (like its page count).

  2. PageUpdate - Updates information related to a page (like links being added or deleted).

  3. UriUpdate - Updates information related to a URI.

These objects are translated to mutations using code in the IndexClient.
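The translation step can be pictured as one function that switches on the update type and emits cells to write. In the sketch below, every type is a hypothetical stand-in: CellSketch replaces an Accumulo Mutation, the three record types only loosely mirror DomainUpdate, PageUpdate, and UriUpdate, and the row and column naming scheme is invented. It assumes Java 16 or later for records and instanceof pattern matching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an Accumulo Mutation: one cell to set or delete.
record CellSketch(String row, String column, String value, boolean delete) {}

// Stand-ins for the IndexUpdate hierarchy; fields are illustrative only.
interface IndexUpdateSketch {}
record DomainUpdateSketch(String domain, long pageCount) implements IndexUpdateSketch {}
record PageUpdateSketch(String uri, List<String> addedLinks, List<String> deletedLinks)
    implements IndexUpdateSketch {}
record UriUpdateSketch(String uri, long inboundLinks) implements IndexUpdateSketch {}

class IndexUpdateTranslatorSketch {
  // Translate one queued update into the cells to write. The "d:", "p:",
  // and "u:" row prefixes and the column names are invented for illustration.
  static List<CellSketch> translate(IndexUpdateSketch update) {
    List<CellSketch> out = new ArrayList<>();
    if (update instanceof DomainUpdateSketch d) {
      out.add(new CellSketch("d:" + d.domain(), "pagecount", Long.toString(d.pageCount()), false));
    } else if (update instanceof PageUpdateSketch p) {
      // Added links become new cells; deleted links become deletions.
      for (String link : p.addedLinks()) {
        out.add(new CellSketch("p:" + p.uri(), "link:" + link, "", false));
      }
      for (String link : p.deletedLinks()) {
        out.add(new CellSketch("p:" + p.uri(), "link:" + link, "", true));
      }
    } else if (update instanceof UriUpdateSketch u) {
      out.add(new CellSketch("u:" + u.uri(), "inbound", Long.toString(u.inboundLinks()), false));
    } else {
      throw new IllegalArgumentException("unknown update: " + update);
    }
    return out;
  }
}
```

Keeping the translation in one function means the export side only needs to know how to apply cells, while each update type decides what it writes.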