The Webindex example has three major code components.
Since all of these components either read or write the Query table, you may want to read about the Query Table before reading about the code.
The following image shows a high-level view of how data flows through the Fluo Webindex code.
This loader queues updated page content for processing by the page observer.
Code: PageLoader.java
This observer computes changes to links within a page by comparing new and current pages. It computes links added and deleted and then pushes this information to the URI Map Observer and Page Exporter.
Conceptually, when a page references a new URI, a +1 is queued up for the Uri Map. When a page no longer references a URI, a -1 is queued up for the Uri Map to process.
Code: PageObserver.java
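The link comparison described above can be sketched as a set difference. This is a minimal illustration, not the code in PageObserver.java: the class name `LinkDiffSketch`, the method `diffLinks`, and the use of plain strings for URIs are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of the page link diff; not the actual PageObserver code. */
final class LinkDiffSketch {

  /**
   * Compares the links of the current and new versions of a page. Returns a +1
   * for every URI that was added and a -1 for every URI that was removed.
   * URIs present in both versions produce no update.
   */
  static Map<String, Long> diffLinks(Set<String> currentLinks, Set<String> newLinks) {
    Map<String, Long> updates = new TreeMap<>();
    for (String uri : newLinks) {
      if (!currentLinks.contains(uri)) {
        updates.put(uri, 1L); // newly referenced URI
      }
    }
    for (String uri : currentLinks) {
      if (!newLinks.contains(uri)) {
        updates.put(uri, -1L); // URI no longer referenced
      }
    }
    return updates;
  }

  public static void main(String[] args) {
    Set<String> current = new HashSet<>(Arrays.asList("http://a.com/", "http://b.com/"));
    Set<String> updated = new HashSet<>(Arrays.asList("http://b.com/", "http://c.com/"));
    // prints {http://a.com/=-1, http://c.com/=1}
    System.out.println(diffLinks(current, updated));
  }
}
```

In this sketch the returned map stands in for the updates that would be queued for the Uri Map; an unchanged link such as `http://b.com/` generates no work downstream.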
This observer computes per-URI reference counts. The code for this observer is very simple because it builds on the Collision Free Map Recipe. A Collision Free Map has two extension points and this example implements both. The first extension point is a combiner that processes the +1 and -1 updates queued up by the Page Observer. The second extension point is an update observer that handles changes in reference counts for a URI. It pushes these changes in reference counts to the Domain Map and URI Exporter.
Changes to URI reference counts are aggregated per domain, and +1 and -1 updates are queued for the domain map.
Code: UriMap.java
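The combiner extension point can be pictured as a fold over the queued updates. The sketch below is only an illustration under stated assumptions: `RefCountCombinerSketch` and its `combine` signature are hypothetical and do not mirror the Collision Free Map Recipe's interface, and dropping an entry when its count reaches zero is an assumed policy, not something the text above states.

```java
import java.util.Iterator;
import java.util.Optional;

/** Hypothetical combiner sketch; the real recipe interface may differ. */
final class RefCountCombinerSketch {

  /**
   * Folds the queued +1 and -1 updates into the current reference count.
   * An empty current value is treated as a count of zero. An empty result
   * means no references remain (assumed policy: such entries are dropped).
   */
  static Optional<Long> combine(Optional<Long> current, Iterator<Long> updates) {
    long total = current.orElse(0L);
    while (updates.hasNext()) {
      total += updates.next();
    }
    if (total == 0) {
      return Optional.empty();
    }
    return Optional.of(total);
  }
}
```

For example, a current count of 2 combined with the updates +1, +1, -1 gives 3, while a current count of 1 combined with a single -1 gives an empty result. Because the updates are summed, the order in which they were queued does not affect the outcome.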
This observer computes per-domain reference counts. This is a Collision Free Map that tracks per-domain information. When it is notified that domain counts changed, it pushes updates to the export queue to update the Query table.
Code: DomainMap.java
For each URI, the Query table contains the URIs that reference it. This export code keeps that information in the Query table up to date. One interesting concept this code uses is inversion on export. The complete inverted URI index is never built in Fluo; it is only built in the Query table.
Code: PageExport.java
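Inversion on export can be sketched as follows. Fluo knows which URIs a page links to, and the export step flips each changed link so the Query table is keyed by the referenced URI. Everything named here is an assumption for illustration: the class `InvertOnExportSketch`, the `Change` type, and the row and qualifier layout are hypothetical and are not taken from PageExport.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of invert-on-export; not the actual PageExport code. */
final class InvertOnExportSketch {

  /** One assumed Query table change: row = referenced URI, qualifier = referencing page. */
  static final class Change {
    final boolean delete;
    final String row;
    final String qualifier;

    Change(boolean delete, String row, String qualifier) {
      this.delete = delete;
      this.row = row;
      this.qualifier = qualifier;
    }

    @Override
    public String toString() {
      return (delete ? "DELETE " : "PUT ") + row + " <- " + qualifier;
    }
  }

  /**
   * Turns a page's added and deleted outgoing links into inverted changes.
   * Only links that changed are touched, so the full inverted index never
   * needs to exist on the Fluo side.
   */
  static List<Change> invert(String pageUri, Set<String> addedLinks, Set<String> deletedLinks) {
    List<Change> changes = new ArrayList<>();
    for (String target : new TreeSet<>(addedLinks)) {
      changes.add(new Change(false, target, pageUri));
    }
    for (String target : new TreeSet<>(deletedLinks)) {
      changes.add(new Change(true, target, pageUri));
    }
    return changes;
  }
}
```

The design point this illustrates is that the forward view (page to links) stays in Fluo while the reverse view (URI to referencing pages) is materialized only in the Query table, one changed link at a time.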
Previous observers calculated the total number of URIs that reference a URI. This export code is given the new and old URI reference counts. URI reference counts are indexed three different ways in the Query table. This export code updates all three places in the Query table.
This export code also uses the invert on export concept. The three indexes are never built in the Fluo table. Fluo only tracks the minimal amount of information needed to keep the three indexes current.
Code: UriCountExport.java
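One way to see why the export code receives both the old and the new count: if an index stores the count as part of its row key, updating it means deleting the old row and writing a new one. The sketch below shows this for a single index only. The key layout, the class `CountIndexExportSketch`, and both method names are assumptions for illustration; the actual layouts of the three indexes are not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of updating one count-keyed index; not the actual UriCountExport code. */
final class CountIndexExportSketch {

  /** Assumed row layout: zero-padded count, then the URI, so rows sort by count. */
  static String rowKey(long count, String uri) {
    return String.format("%010d:%s", count, uri);
  }

  /**
   * Produces the changes needed to move a URI from its old count to its new
   * count. An empty old count means the URI is new to the index; an empty new
   * count means the URI should be removed from the index.
   */
  static List<String> exportCountChange(String uri, Optional<Long> oldCount, Optional<Long> newCount) {
    List<String> changes = new ArrayList<>();
    if (oldCount.equals(newCount)) {
      return changes; // nothing changed, nothing to export
    }
    if (oldCount.isPresent()) {
      changes.add("DELETE " + rowKey(oldCount.get(), uri));
    }
    if (newCount.isPresent()) {
      changes.add("PUT " + rowKey(newCount.get(), uri));
    }
    return changes;
  }
}
```

Without the old count, the exporter would have to look up the stale row before deleting it; carrying both values lets it issue the delete and the put directly.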
This export code writes changes in the number of URIs referencing a domain to the Query table.
Code: DomainExport.java