Merge pull request #84 from mikewalch/spark-web

Refactored UI to replace Dropwizard with Spark Web
diff --git a/.gitignore b/.gitignore
index b2ae586..54549dd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,4 +6,4 @@
 .settings
 target/
 /logs/
-/paths/
+/data/
diff --git a/README.md b/README.md
index ef102b5..5de5686 100644
--- a/README.md
+++ b/README.md
@@ -2,141 +2,39 @@
 ---
 [![Build Status][ti]][tl] [![Apache License][li]][ll]
 
-Webindex is an example [Apache Fluo][fluo] application that uses [Common Crawl][cc] web crawl
-data to index links to web pages in multiple ways.  It has a simple UI to view the resulting
-indexes.  If you are new to Fluo, you may want start with the [quickstart][qs] or
-[phrasecount][pc] applications as the webindex application is more complicated.  For more
-information on how the webindex application works, view the [tables](docs/tables.md) and
-[code](docs/code-guide.md) documentation.
+WebIndex is an example [Apache Fluo][fluo] application that uses [Common Crawl][cc] web crawl data
+to index links to web pages in multiple ways. It has a simple UI to view the resulting indexes. If
+you are new to Fluo, you may want to start with the [phrasecount][pc] application as the WebIndex
+application is more complicated. For more information on how the WebIndex application works, view
+the [tables](docs/tables.md) and [code](docs/code-guide.md) documentation.
 
-### Requirements
+## Running WebIndex
 
-In order run this application you need the following installed and running on your
-machine:
+If you are new to WebIndex, the simplest way to try the application is to run the development
+server. First, clone the WebIndex repo:
 
-* Hadoop (HDFS & YARN)
-* Accumulo
-* Fluo
+    git clone https://github.com/astralway/webindex.git
 
-Consider using [Uno] to run these requirements
+Next, on a machine where Java and Maven are installed, run the development server using the 
+`webindex` command:
 
-### Configure your environment
+    cd webindex/
+    ./bin/webindex dev
 
-First, you must create the configuration file `data.yml` in the `conf/` directory and edit it
-for your environment.
+This builds and starts the development server, which logs to the console. When you want to
+terminate the server, press `ctrl-c`.
 
-    cp conf/data.yml.example conf/data.yml
+The development server starts a MiniAccumuloCluster and runs MiniFluo on top of it.  It parses a
+Common Crawl data file and creates a file at `data/1K-pages.txt` with 1000 pages that are loaded
+into MiniFluo. The pages are processed by Fluo, which exports indexes to Accumulo. A web
+application is started at [http://localhost:4567](http://localhost:4567) that queries these indexes.
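+
+Once the server is running, you can sanity-check the web application from another terminal (a
+quick check; fetching the home page is enough to confirm the server is up):
+
+    curl http://localhost:4567/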
 
-There are a few environment variables that need to be set to run these scripts (see 
-`conf/webindex-env.sh.example` for a list).  If you don't want to set them in your `~/.bashrc`, 
-create `webindex-env.sh` in `conf/` and set them.
-
-    cp conf/webindex-env.sh.example conf/webindex-env.sh
-
-### Download the paths file for a crawl
-
-For each crawl of the web, Common Crawl produces a file containing a list of paths to the data 
-files produced by that crawl.  The webindex `copy` and `load-s3` commands use this file to 
-retrieve Common Crawl data stored in S3. The `getpaths` command below downloads this paths 
-file for the April 2015 crawl (identified by `2015-18`) to the `paths/` directory as it will 
-be necessary for future commands.  If you would like to use a different crawl, the 
-[Common Crawl website][cdata] has a list of possible crawls which are identified by the 
-`YEAR-WEEK` (i.e. `2015-18`) of the time the crawl occurred.
-
-    ./bin/webindex getpaths 2015-18
-
-Take a look at the paths file that was just retrieved.
-
-    $ less paths/2015-18.wat.paths 
-
-Each line in the paths file contains a path to a different common crawl data file.  In later 
-commands, you will select paths by specifying a range (in the format of `START-END`).  Ranges
-can start at index 0 and their start/end points are inclusive.  Therefore, a range of `4-6` 
-would select 3 paths from line 4, 5, and 6 of the file. Using the command below, you can 
-find the max endpoint for ranges in a paths file.
-
-    $ wc -l paths/2015-18.wat.paths 
-    38609 paths/2015-18.wat.paths
-
-The 2015-18 paths file has 38609 different paths.  A range of `0-38608` would select all 
-paths in the file.
-
-### Copy Common Crawl data from AWS into HDFS
-
-After retrieving a paths file, the command below runs a Spark job that copies data files from S3 
-to HDFS.  The command below will copy 3 files in the file range of `4-6` of the `2015-18` paths 
-file into the HDFS directory `/cc/data/a`.  Common Crawl data files are large (~330 MB each) so
-be mindful of how many you copy.
-
-    ./bin/webindex copy 2015-18 4-6 /cc/data/a
-
-To create multiple data sets, run the command with different range and HDFS directory.
-
-    ./bin/webindex copy 2015-18 7-8 /cc/data/b
-
-### Initialize and start the webindex Fluo application
-
-After copying data into HDFS, run the following to initialize and start the webindex
-Fluo application.
-
-    ./bin/webindex init
-
-Optionally, add a HDFS directory (with previously copied data) to the end of the command.  
-When a directory is specified, `init` will run a Spark job that initializes the webindex
-Fluo application with data before starting it.
-    
-    ./bin/webindex init /cc/data/a
-
-### Load data into the webindex Fluo application
-
-The `init` command should only be run on an empty cluster.  To add more data, run the 
-`load-hdfs` or `load-s3` commands.  Both start a Spark job that parses Common Crawl data 
-and inserts this data into the Fluo table of the webindex application.  The webindex Fluo 
-observers will incrementally process this data and export indexes to Accumulo.
-
-The `load-hdfs` command below loads data stored in the HDFS directory `/cc/data/b` into 
-Fluo.
-
-    ./bin/webindex load-hdfs /cc/data/b
-
-The `load-s3` command below loads data hosted on S3 into Fluo.  It select files in the 
-`9-10` range of the `2015-18` paths file.
-
-    ./bin/webindex load-s3 2015-18 9-10
-
-### Compact Transient Ranges
-
-For long runs, this example has [transient ranges][transient] that need to be 
-periodically compacted.  This can be accomplished with the following command.
-
-```bash
-nohup fluo exec webindex org.apache.fluo.recipes.accumulo.cmds.CompactTransient 600 &> your_log_file.log &
-```
-
-As long as this command is running, it will initiate a compaction of all transient 
-ranges every 10 minutes.
-
-### Run the webindex UI
-
-Run the following command to run the webindex UI which can be viewed at 
-[http://localhost:8080/](http://localhost:8080/).
-
-    ./bin/webindex ui
-
-The UI queries indexes stored in Accumulo that were exported by Fluo.  The UI is 
-implemented using [dropwizard].  Optionally, you can modify the default dropwizard 
-configuration by creating a `dropwizard.yml` in `conf/`.
-    
-    cp conf/dropwizard.yml.example conf/dropwizard.yml
+If you would like to run WebIndex on a cluster, follow the [install] instructions. 
 
 [fluo]: https://fluo.apache.org/
-[qs]: https://github.com/astralway/quickstart
 [pc]: https://github.com/astralway/phrasecount
-[Uno]: https://github.com/astralway/uno
-[dropwizard]: http://dropwizard.io/
 [cc]: https://commoncrawl.org/
-[cdata]: https://commoncrawl.org/the-data/get-started/
-[transient]: https://github.com/apache/fluo-recipes/blob/master/docs/transient.md
+[install]: docs/install.md
 [ti]: https://travis-ci.org/astralway/webindex.svg?branch=master
 [tl]: https://travis-ci.org/astralway/webindex
 [li]: http://img.shields.io/badge/license-ASL-blue.svg
diff --git a/bin/impl/base.sh b/bin/impl/base.sh
index fd5885e..5117ca5 100755
--- a/bin/impl/base.sh
+++ b/bin/impl/base.sh
@@ -15,13 +15,28 @@
 # limitations under the License.
 
 : ${WI_HOME?"WI_HOME must be set"}
-: ${DATA_CONFIG?"DATA_CONFIG must be set"}
+: ${WI_CONFIG?"WI_CONFIG must be set"}
 : ${SPARK_HOME?"SPARK_HOME must be set"}
 
 function get_prop {
-  echo "`grep $1 $DATA_CONFIG | cut -d ' ' -f 2`"
+  echo "`grep $1 $WI_CONFIG | cut -d ' ' -f 2`"
 }
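+# Example usage, assuming "key: value"-style entries in $WI_CONFIG (as in webindex.yml):
+#   index_table=$(get_prop accumuloIndexTable)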
 
+: ${HADOOP_CONF_DIR?"HADOOP_CONF_DIR must be set in bash env or conf/webindex-env.sh"}
+if [ ! -d $HADOOP_CONF_DIR ]; then
+  echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR does not exist"
+  exit 1
+fi
+: ${FLUO_HOME?"FLUO_HOME must be set in bash env or conf/webindex-env.sh"}
+if [ ! -d $FLUO_HOME ]; then
+  echo "FLUO_HOME=$FLUO_HOME does not exist"
+  exit 1
+fi
+
+: ${WI_EXECUTOR_INSTANCES?"WI_EXECUTOR_INSTANCES must be set in bash env or conf/webindex-env.sh"}
+: ${WI_EXECUTOR_MEMORY?"WI_EXECUTOR_MEMORY must be set in bash env or conf/webindex-env.sh"}
+export COMMON_SPARK_OPTS="--master yarn-client --num-executors $WI_EXECUTOR_INSTANCES --executor-memory $WI_EXECUTOR_MEMORY"
+
 export SPARK_SUBMIT=$SPARK_HOME/bin/spark-submit
 if [ ! -f $SPARK_SUBMIT ]; then
   echo "The spark-submit command cannot be found in SPARK_HOME=$SPARK_HOME.  Please set SPARK_HOME in conf/webindex-env.sh"
@@ -33,13 +48,13 @@
 # Stop if any command after this fails
 set -e
 
-export WI_DATA_JAR=$WI_HOME/modules/data/target/webindex-data-0.0.1-SNAPSHOT.jar
-export WI_DATA_DEP_JAR=$WI_HOME/modules/data/target/webindex-data-0.0.1-SNAPSHOT-shaded.jar
+export WI_DATA_JAR=$WI_HOME/modules/data/target/webindex-data-$WI_VERSION.jar
+export WI_DATA_DEP_JAR=$WI_HOME/modules/data/target/webindex-data-$WI_VERSION-shaded.jar
 if [ ! -f $WI_DATA_DEP_JAR ]; then
   echo "Building $WI_DATA_DEP_JAR"
   cd $WI_HOME
 
   : ${SPARK_VERSION?"SPARK_VERSION must be set in bash env or conf/webindex-env.sh"}
   : ${HADOOP_VERSION?"HADOOP_VERSION must be set in bash env or conf/webindex-env.sh"}
-  mvn clean package -DskipTests -Dspark.version=$SPARK_VERSION -Dhadoop.version=$HADOOP_VERSION
+  mvn clean package -Pcreate-shade-jar -DskipTests -Dspark.version=$SPARK_VERSION -Dhadoop.version=$HADOOP_VERSION
 fi
diff --git a/bin/impl/init.sh b/bin/impl/init.sh
index d1c463c..e9a854b 100755
--- a/bin/impl/init.sh
+++ b/bin/impl/init.sh
@@ -54,10 +54,10 @@
 mvn dependency:get -Dartifact=com.esotericsoftware:reflectasm:1.10.1:jar -Ddest=$FLUO_APP_LIB
 mvn dependency:get -Dartifact=org.objenesis:objenesis:2.1:jar -Ddest=$FLUO_APP_LIB
 # Add webindex core and its dependencies
-cp $WI_HOME/modules/core/target/webindex-core-0.0.1-SNAPSHOT.jar $FLUO_APP_LIB
+cp $WI_HOME/modules/core/target/webindex-core-$WI_VERSION.jar $FLUO_APP_LIB
 mvn dependency:get -Dartifact=commons-validator:commons-validator:1.4.1:jar -Ddest=$FLUO_APP_LIB
 
-java -cp $WI_DATA_DEP_JAR webindex.data.Configure $DATA_CONFIG
+java -cp $WI_DATA_DEP_JAR webindex.data.Configure $WI_CONFIG
 
 $FLUO_CMD init $FLUO_APP --force
 
diff --git a/bin/webindex b/bin/webindex
index 5bc6748..5c298c6 100755
--- a/bin/webindex
+++ b/bin/webindex
@@ -16,6 +16,7 @@
 
 BIN_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
 export WI_HOME=$( cd "$( dirname "$BIN_DIR" )" && pwd )
+export WI_VERSION=0.0.1-SNAPSHOT
 
 if [ -f $WI_HOME/conf/webindex-env.sh ]; then
   . $WI_HOME/conf/webindex-env.sh
@@ -23,57 +24,57 @@
   . $WI_HOME/conf/webindex-env.sh.example
 fi
 
-: ${HADOOP_CONF_DIR?"HADOOP_CONF_DIR must be set in bash env or conf/webindex-env.sh"}
-if [ ! -d $HADOOP_CONF_DIR ]; then
-  echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR does not exist"
-  exit 1
-fi
-: ${FLUO_HOME?"FLUO_HOME must be set in bash env or conf/webindex-env.sh"}
-if [ ! -d $FLUO_HOME ]; then
-  echo "FLUO_HOME=$FLUO_HOME does not exist"
-  exit 1
-fi
-
 mkdir -p $WI_HOME/logs
 
-export DATA_CONFIG=$WI_HOME/conf/data.yml
-if [ ! -f $DATA_CONFIG ]; then
-  export DATA_CONFIG=$WI_HOME/conf/data.yml.example
-  if [ ! -f $DATA_CONFIG ]; then
-    echo "Could not find data.yml or data.yml.example in $WI_HOME/conf"
+export WI_CONFIG=$WI_HOME/conf/webindex.yml
+if [ ! -f $WI_CONFIG ]; then
+  export WI_CONFIG=$WI_HOME/conf/webindex.yml.example
+  if [ ! -f $WI_CONFIG ]; then
+    echo "Could not find webindex.yml or webindex.yml.example in $WI_HOME/conf"
     exit 1
   fi
-  echo "Using default config at $DATA_CONFIG"
+fi
+
+log4j_config=$WI_HOME/conf/log4j.properties
+if [ ! -f $log4j_config ]; then
+  log4j_config=$WI_HOME/conf/log4j.properties.example
+  if [ ! -f $log4j_config ]; then
+    echo "Could not find log4j.properties or log4j.properties.example in $WI_HOME/conf"
+    exit 1
+  fi
 fi
 
 function get_prop {
-  echo "`grep $1 $DATA_CONFIG | cut -d ' ' -f 2`"
+  echo "`grep $1 $WI_CONFIG | cut -d ' ' -f 2`"
 }
 
-: ${WI_EXECUTOR_INSTANCES?"WI_EXECUTOR_INSTANCES must be set in bash env or conf/webindex-env.sh"}
-: ${WI_EXECUTOR_MEMORY?"WI_EXECUTOR_MEMORY must be set in bash env or conf/webindex-env.sh"}
-export COMMON_SPARK_OPTS="--master yarn-client --num-executors $WI_EXECUTOR_INSTANCES --executor-memory $WI_EXECUTOR_MEMORY"
 COMMAND_LOGFILE=$WI_HOME/logs/$1_`date +%s`.log
+DATA_DIR=$WI_HOME/data
+mkdir -p $DATA_DIR
 
 case "$1" in
+dev)
+  pkill -9 -f webindex-dev-server
+  cd $WI_HOME
+  mvn -q compile -P webindex-dev-server -Dlog4j.configuration=file:$log4j_config
+  ;;
 getpaths)
-  PATHS_DIR=$WI_HOME/paths
-  mkdir -p $PATHS_DIR
+  mkdir -p $DATA_DIR
   PATHS_FILE="$2".wat.paths
-  if [ ! -f $PATHS_DIR/$PATHS_FILE ]; then
-    rm -f $PATHS_DIR/wat.paths.gz
+  if [ ! -f $DATA_DIR/$PATHS_FILE ]; then
+    rm -f $DATA_DIR/wat.paths.gz
     PATHS_URL=https://aws-publicdatasets.s3.amazonaws.com/common-crawl/crawl-data/CC-MAIN-$2/wat.paths.gz
     if [[ `wget -S --spider $PATHS_URL 2>&1 | grep 'HTTP/1.1 200 OK'` ]]; then
-      wget -P $PATHS_DIR $PATHS_URL
-      gzip -d $PATHS_DIR/wat.paths.gz
-      mv $PATHS_DIR/wat.paths $PATHS_DIR/$PATHS_FILE
-      echo "Downloaded paths file to $PATHS_DIR/$PATHS_FILE"
+      wget -P $DATA_DIR $PATHS_URL
+      gzip -d $DATA_DIR/wat.paths.gz
+      mv $DATA_DIR/wat.paths $DATA_DIR/$PATHS_FILE
+      echo "Downloaded paths file to $DATA_DIR/$PATHS_FILE"
     else
       echo "Crawl paths file for date $2 does not exist at $PATHS_URL"
       exit 1
     fi
   else
-    echo "Crawl paths file already exists at $PATHS_DIR/$PATHS_FILE"
+    echo "Crawl paths file already exists at $DATA_DIR/$PATHS_FILE"
   fi
   ;;
 copy)
@@ -83,7 +84,7 @@
   fi
   . $BIN_DIR/impl/base.sh
   COMMAND="$SPARK_SUBMIT --class webindex.data.Copy $COMMON_SPARK_OPTS \
-    $WI_DATA_DEP_JAR $WI_HOME/paths/"$2".wat.paths $3 $4"
+    $WI_DATA_DEP_JAR $DATA_DIR/"$2".wat.paths $3 $4"
   if [ "$5" != "-fg" ]; then
     nohup ${COMMAND} &> $COMMAND_LOGFILE &
     echo "Started copy.  Logs are being output to $COMMAND_LOGFILE"
@@ -140,7 +141,7 @@
     exit 1
   fi
   COMMAND="$SPARK_SUBMIT --class webindex.data.LoadS3 $COMMON_SPARK_OPTS \
-    --files $FLUO_PROPS $WI_DATA_DEP_JAR $WI_HOME/paths/"$2".wat.paths $3"
+    --files $FLUO_PROPS $WI_DATA_DEP_JAR $DATA_DIR/"$2".wat.paths $3"
   if [ "$4" != "-fg" ]; then
     nohup ${COMMAND} &> $COMMAND_LOGFILE &
     echo "Started load-s3.  Logs are being output to $COMMAND_LOGFILE"
@@ -155,7 +156,7 @@
   fi
   . $BIN_DIR/impl/base.sh
   COMMAND="$SPARK_SUBMIT --class webindex.data.TestParser $COMMON_SPARK_OPTS \
-    $WI_DATA_DEP_JAR $WI_HOME/paths/"$2".wat.paths $3"
+    $WI_DATA_DEP_JAR $DATA_DIR/"$2".wat.paths $3"
   if [ "$4" != "-fg" ]; then
     nohup ${COMMAND} &> $COMMAND_LOGFILE &
     echo "Started data-verify.  Logs are being output to $COMMAND_LOGFILE"
@@ -164,18 +165,9 @@
   fi
   ;;
 ui)
-  pkill -9 -f webindex-ui
-  WI_UI_JAR=$WI_HOME/modules/ui/target/webindex-ui-0.0.1-SNAPSHOT.jar
-  if [ ! -f $WI_UI_JAR ]; then
-    cd $WI_HOME/modules/ui
-    mvn clean install -DskipTests
-  fi
-  DROPWIZARD_CONFIG=""
-  if [ -f $WI_HOME/conf/dropwizard.yml ]; then
-    DROPWIZARD_CONFIG=$WI_HOME/conf/dropwizard.yml
-    echo "Running with dropwizard config at $DROPWIZARD_CONFIG"
-  fi
-  COMMAND="java -jar $WI_UI_JAR server $DROPWIZARD_CONFIG"
+  pkill -9 -f webindex-web-server
+  cd $WI_HOME
+  COMMAND="mvn -q compile -P webindex-web-server -Dlog4j.configuration=file:$log4j_config"
   if [ "$2" != "-fg" ]; then
     nohup ${COMMAND} &> $COMMAND_LOGFILE &
     echo "Started UI.  Logs are being output to $COMMAND_LOGFILE"
@@ -242,7 +234,7 @@
     exit 1
   fi
   echo "Killing the webindex UI web server..."
-  pkill -9 -f webindex-ui
+  pkill -9 -f webindex-web-server
 
   echo "Stopping the $FLUO_APP Fluo application (if running)..."
   $FLUO_CMD stop $FLUO_APP
@@ -254,7 +246,8 @@
 *)
   echo -e "Usage: webindex <command> (<argument>)\n"
   echo -e "Possible commands:\n"
-  echo "  getpaths <DATE>             Retrieves paths file for given crawl <DATE> (i.e 2015-18) and stores file in the 'paths/' directory"
+  echo "  dev                         Runs the WebIndex development server"
+  echo "  getpaths <DATE>             Retrieves paths file for given crawl <DATE> (e.g. 2015-18) and stores file in the 'data/' directory"
   echo "                              See https://commoncrawl.org/the-data/get-started/ for possible crawl dates"
   echo "  copy <DATE> <RANGE> <DEST>  Copies CommonCrawl data files from S3 given a <DATE> and <RANGE> (i.e 0-8) into HDFS <DEST> directory"
   echo "  init [<SRC>]                Initializes and starts the WebIndex application. Optionally, a <SRC> HDFS directory can be added to"
diff --git a/conf/.gitignore b/conf/.gitignore
index 45f109f..40c8d7c 100644
--- a/conf/.gitignore
+++ b/conf/.gitignore
@@ -1,3 +1,3 @@
-data.yml
-dropwizard.yml
+webindex.yml
 webindex-env.sh
+log4j.properties
diff --git a/conf/dropwizard.yml.example b/conf/dropwizard.yml.example
deleted file mode 100644
index 77f06a8..0000000
--- a/conf/dropwizard.yml.example
+++ /dev/null
@@ -1,21 +0,0 @@
-# Copyright 2015 Webindex authors (see AUTHORS)
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# This optional file configures the dropwizard settings in the UI.
-# See the URL below for possible configuration:
-# http://www.dropwizard.io/0.8.2/docs/manual/configuration.html
-
-# This file is optional.  If you don't need to change the default dropwizard
-# configuration, don't create this file as dropwizard will complain if it is
-# created but empty.
diff --git a/conf/log4j.properties.example b/conf/log4j.properties.example
new file mode 100644
index 0000000..694c884
--- /dev/null
+++ b/conf/log4j.properties.example
@@ -0,0 +1,29 @@
+# Copyright 2016 Webindex authors (see AUTHORS)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+log4j.rootLogger=INFO, CA
+log4j.appender.CA=org.apache.log4j.ConsoleAppender
+log4j.appender.CA.layout=org.apache.log4j.PatternLayout
+log4j.appender.CA.layout.ConversionPattern=%d{ISO8601} [%c] %-5p: %m%n
+
+log4j.logger.org.apache.accumulo=WARN
+log4j.logger.org.apache.curator=ERROR
+log4j.logger.org.apache.fluo=WARN
+log4j.logger.org.apache.hadoop=WARN
+log4j.logger.org.apache.hadoop.mapreduce=ERROR
+log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
+log4j.logger.org.apache.zookeeper=ERROR
+log4j.logger.org.eclipse.jetty=WARN
+log4j.logger.org.spark-project=WARN
+log4j.logger.webindex=INFO
diff --git a/conf/data.yml.example b/conf/webindex.yml.example
similarity index 100%
rename from conf/data.yml.example
rename to conf/webindex.yml.example
diff --git a/docs/install.md b/docs/install.md
new file mode 100644
index 0000000..6f4fab5
--- /dev/null
+++ b/docs/install.md
@@ -0,0 +1,131 @@
+# WebIndex Install
+
+Below are instructions for installing WebIndex on a cluster.
+
+## Requirements
+
+To run WebIndex, you need the following installed on your cluster:
+
+* Java
+* Hadoop (HDFS & YARN)
+* Accumulo
+* Fluo
+* Maven
+
+Hadoop & Accumulo should be running before you start these instructions.  Fluo and Maven only need
+to be installed on the machine where you run the `webindex` command. Consider using [Uno] to set up
+Hadoop, Accumulo & Fluo if you are running on a single node.
+
+## Configure your environment
+
+First, clone the WebIndex repo:
+
+    git clone https://github.com/astralway/webindex.git
+
+Next, create the configuration file `webindex.yml` in the `conf/` directory and edit it
+for your environment.
+
+    cd webindex/
+    cp conf/webindex.yml.example conf/webindex.yml
+
+There are a few environment variables that need to be set to run these scripts (see 
+`conf/webindex-env.sh.example` for a list).  If you don't want to set them in your `~/.bashrc`, 
+create `webindex-env.sh` in `conf/` and set them.
+
+    cp conf/webindex-env.sh.example conf/webindex-env.sh
+
+## Download the paths file for a crawl
+
+For each crawl of the web, Common Crawl produces a file containing a list of paths to the data 
+files produced by that crawl.  The webindex `copy` and `load-s3` commands use this file to 
+retrieve Common Crawl data stored in S3. The `getpaths` command below downloads this paths 
+file for the April 2015 crawl (identified by `2015-18`) to the `data/` directory, as it will 
+be necessary for future commands.  If you would like to use a different crawl, the 
+[Common Crawl website][cdata] has a list of possible crawls, which are identified by the 
+`YEAR-WEEK` (e.g. `2015-18`) of when the crawl occurred.
+
+    ./bin/webindex getpaths 2015-18
+
+Take a look at the paths file that was just retrieved.
+
+    $ less data/2015-18.wat.paths
+
+Each line in the paths file contains a path to a different Common Crawl data file.  In later 
+commands, you will select paths by specifying a range (in the format `START-END`).  Ranges
+start at index 0 and their start/end points are inclusive.  Therefore, a range of `4-6` 
+selects the 3 paths at indices 4, 5, and 6 of the file. Using the command below, you can 
+find the max endpoint for ranges in a paths file.
+
+    $ wc -l data/2015-18.wat.paths 
+    38609 data/2015-18.wat.paths
+
+The 2015-18 paths file has 38609 different paths.  A range of `0-38608` would select all 
+paths in the file.
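+
+To double-check which data files a range such as `4-6` refers to, you can print those paths
+directly (a shell one-liner for illustration; it is not part of the `webindex` script):
+
+    $ awk 'NR-1 >= 4 && NR-1 <= 6' data/2015-18.wat.paths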
+
+## Copy Common Crawl data from AWS into HDFS
+
+After retrieving a paths file, run the `copy` command to start a Spark job that copies data files
+from S3 to HDFS.  The command below copies the 3 files in range `4-6` of the `2015-18` paths 
+file into the HDFS directory `/cc/data/a`.  Common Crawl data files are large (~330 MB each) so
+be mindful of how many you copy.
+
+    ./bin/webindex copy 2015-18 4-6 /cc/data/a
+
+To create multiple data sets, run the command again with a different range and HDFS directory.
+
+    ./bin/webindex copy 2015-18 7-8 /cc/data/b
+
+## Initialize and start the webindex Fluo application
+
+After copying data into HDFS, run the following to initialize and start the webindex
+Fluo application.
+
+    ./bin/webindex init
+
+Optionally, add an HDFS directory (with previously copied data) to the end of the command.  
+When a directory is specified, `init` will run a Spark job that initializes the webindex
+Fluo application with data before starting it.
+    
+    ./bin/webindex init /cc/data/a
+
+## Load data into the webindex Fluo application
+
+The `init` command should only be run on an empty cluster.  To add more data, run the 
+`load-hdfs` or `load-s3` commands.  Both start a Spark job that parses Common Crawl data 
+and inserts this data into the Fluo table of the webindex application.  The webindex Fluo 
+observers will incrementally process this data and export indexes to Accumulo.
+
+The `load-hdfs` command below loads data stored in the HDFS directory `/cc/data/b` into 
+Fluo.
+
+    ./bin/webindex load-hdfs /cc/data/b
+
+The `load-s3` command below loads data hosted on S3 into Fluo.  It selects files in the 
+`9-10` range of the `2015-18` paths file.
+
+    ./bin/webindex load-s3 2015-18 9-10
+
+## Compact Transient Ranges
+
+For long runs, this example has [transient ranges][transient] that need to be 
+periodically compacted.  This can be accomplished with the following command.
+
+```bash
+nohup fluo exec webindex org.apache.fluo.recipes.accumulo.cmds.CompactTransient 600 &> your_log_file.log &
+```
+
+As long as this command is running, it will initiate a compaction of all transient 
+ranges every 10 minutes.
+
+## Run the webindex UI
+
+Run the following command to start the webindex UI, which can be viewed at 
+[http://localhost:4567/](http://localhost:4567/).
+
+    ./bin/webindex ui
+
+The UI queries indexes stored in Accumulo that were exported by Fluo.
+
+[Uno]: https://github.com/astralway/uno
+[transient]: https://github.com/apache/fluo-recipes/blob/master/docs/transient.md
+[cdata]: https://commoncrawl.org/the-data/get-started/
diff --git a/modules/core/pom.xml b/modules/core/pom.xml
index 86221cc..f402de6 100644
--- a/modules/core/pom.xml
+++ b/modules/core/pom.xml
@@ -34,6 +34,10 @@
       <artifactId>gson</artifactId>
     </dependency>
     <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+    </dependency>
+    <dependency>
       <groupId>commons-lang</groupId>
       <artifactId>commons-lang</artifactId>
     </dependency>
@@ -43,6 +47,10 @@
       <version>1.4.1</version>
     </dependency>
     <dependency>
+      <groupId>org.apache.accumulo</groupId>
+      <artifactId>accumulo-core</artifactId>
+    </dependency>
+    <dependency>
       <groupId>org.slf4j</groupId>
       <artifactId>slf4j-api</artifactId>
     </dependency>
diff --git a/modules/core/src/main/java/webindex/core/IndexClient.java b/modules/core/src/main/java/webindex/core/IndexClient.java
new file mode 100644
index 0000000..e28d7f8
--- /dev/null
+++ b/modules/core/src/main/java/webindex/core/IndexClient.java
@@ -0,0 +1,234 @@
+/*
+ * Copyright 2015 Webindex authors (see AUTHORS)
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+ * in compliance with the License. You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package webindex.core;
+
+import java.util.Iterator;
+import java.util.Map;
+
+import com.google.gson.Gson;
+import org.apache.accumulo.core.client.Connector;
+import org.apache.accumulo.core.client.Scanner;
+import org.apache.accumulo.core.client.TableNotFoundException;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Range;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.core.security.Authorizations;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import webindex.core.models.DomainStats;
+import webindex.core.models.Link;
+import webindex.core.models.Links;
+import webindex.core.models.Page;
+import webindex.core.models.Pages;
+import webindex.core.models.TopResults;
+import webindex.core.models.URL;
+import webindex.core.util.Pager;
+
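+/**
+ * Client for the webindex indexes that Fluo exports to Accumulo. A minimal usage sketch,
+ * assuming an existing Accumulo Connector and the index table name configured in
+ * webindex.yml (e.g. "webindex_search"):
+ *
+ * <pre>
+ * IndexClient client = new IndexClient("webindex_search", conn);
+ * Page page = client.getPage("http://example.com/");
+ * TopResults top = client.getTopResults("", 0);
+ * </pre>
+ */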
+public class IndexClient {
+
+  private static final Logger log = LoggerFactory.getLogger(IndexClient.class);
+  private static final int PAGE_SIZE = 25;
+
+  private Connector conn;
+  private String accumuloIndexTable;
+  private Gson gson = new Gson();
+
+  public IndexClient(String accumuloIndexTable, Connector conn) {
+    this.accumuloIndexTable = accumuloIndexTable;
+    this.conn = conn;
+  }
+
+  public TopResults getTopResults(String next, int pageNum) {
+
+    TopResults results = new TopResults();
+
+    results.setPageNum(pageNum);
+    try {
+      Scanner scanner = conn.createScanner(accumuloIndexTable, Authorizations.EMPTY);
+      Pager pager = Pager.build(scanner, Range.prefix("t:"), PAGE_SIZE, entry -> {
+        String row = entry.getKey().getRow().toString();
+        if (entry.isNext()) {
+          results.setNext(row);
+        } else {
+          String url = URL.fromPageID(row.split(":", 3)[2]).toString();
+          Long num = Long.parseLong(entry.getValue().toString());
+          results.addResult(url, num);
+        }
+      });
+      if (next.isEmpty()) {
+        pager.read(pageNum);
+      } else {
+        pager.read(new Key(next));
+      }
+    } catch (TableNotFoundException e) {
+      log.error("Table {} not found", accumuloIndexTable);
+    }
+    return results;
+  }
+
+  private static Long getLongValue(Map.Entry<Key, Value> entry) {
+    return Long.parseLong(entry.getValue().toString());
+  }
+
+  public Page getPage(String rawUrl) {
+    Page page = null;
+    Long incount = (long) 0;
+    URL url;
+    try {
+      url = URL.from(rawUrl);
+    } catch (Exception e) {
+      log.error("Failed to parse URL {}", rawUrl);
+      return null;
+    }
+
+    try {
+      Scanner scanner = conn.createScanner(accumuloIndexTable, Authorizations.EMPTY);
+      scanner.setRange(Range.exact("p:" + url.toPageID(), Constants.PAGE));
+      for (Map.Entry<Key, Value> entry : scanner) {
+        switch (entry.getKey().getColumnQualifier().toString()) {
+          case Constants.INCOUNT:
+            incount = getLongValue(entry);
+            break;
+          case Constants.CUR:
+            page = gson.fromJson(entry.getValue().toString(), Page.class);
+            break;
+          default:
+            log.error("Unknown page stat {}", entry.getKey().getColumnQualifier());
+        }
+      }
+    } catch (TableNotFoundException e) {
+      log.error("Table {} not found", accumuloIndexTable);
+    }
+
+    if (page == null) {
+      page = new Page(url.toPageID());
+    }
+    page.setNumInbound(incount);
+    return page;
+  }
+
+  public DomainStats getDomainStats(String domain) {
+    DomainStats stats = new DomainStats(domain);
+    Scanner scanner;
+    try {
+      scanner = conn.createScanner(accumuloIndexTable, Authorizations.EMPTY);
+      scanner.setRange(Range.exact("d:" + URL.reverseHost(domain), Constants.DOMAIN));
+      for (Map.Entry<Key, Value> entry : scanner) {
+        switch (entry.getKey().getColumnQualifier().toString()) {
+          case Constants.PAGECOUNT:
+            stats.setTotal(getLongValue(entry));
+            break;
+          default:
+            log.error("Unknown page domain {}", entry.getKey().getColumnQualifier());
+        }
+      }
+    } catch (TableNotFoundException e) {
+      log.error("Table {} not found", accumuloIndexTable);
+    }
+    return stats;
+  }
+
+  public Pages getPages(String domain, String next, int pageNum) {
+    DomainStats stats = getDomainStats(domain);
+    Pages pages = new Pages(domain, pageNum);
+    pages.setTotal(stats.getTotal());
+    String row = "d:" + URL.reverseHost(domain);
+    String cf = Constants.RANK;
+    try {
+      Scanner scanner = conn.createScanner(accumuloIndexTable, Authorizations.EMPTY);
+      Pager pager =
+          Pager.build(scanner, Range.prefix(row + ":"), PAGE_SIZE, entry -> {
+            if (entry.isNext()) {
+              pages.setNext(entry.getKey().getRowData().toString().split(":", 3)[2]);
+            } else {
+              String url =
+                  URL.fromPageID(entry.getKey().getRowData().toString().split(":", 4)[3])
+                      .toString();
+              Long count = Long.parseLong(entry.getValue().toString());
+              pages.addPage(url, count);
+            }
+          });
+      if (next.isEmpty()) {
+        pager.read(pageNum);
+      } else {
+        pager.read(new Key(row + ":" + next, cf, ""));
+
+      }
+    } catch (TableNotFoundException e) {
+      log.error("Table {} not found", accumuloIndexTable);
+    }
+    return pages;
+  }
+
+  public Links getLinks(String rawUrl, String linkType, String next, int pageNum) {
+
+    Links links = new Links(rawUrl, linkType, pageNum);
+
+    URL url;
+    try {
+      url = URL.from(rawUrl);
+    } catch (Exception e) {
+      log.error("Failed to parse URL: " + rawUrl);
+      return links;
+    }
+
+    try {
+      Scanner scanner = conn.createScanner(accumuloIndexTable, Authorizations.EMPTY);
+      String row = "p:" + url.toPageID();
+      if (linkType.equals("in")) {
+        Page page = getPage(rawUrl);
+        String cf = Constants.INLINKS;
+        links.setTotal(page.getNumInbound());
+        Pager pager = Pager.build(scanner, Range.exact(row, cf), PAGE_SIZE, entry -> {
+          String pageID = entry.getKey().getColumnQualifier().toString();
+          if (entry.isNext()) {
+            links.setNext(pageID);
+          } else {
+            String anchorText = entry.getValue().toString();
+            links.addLink(Link.of(pageID, anchorText));
+          }
+        });
+        if (next.isEmpty()) {
+          pager.read(pageNum);
+        } else {
+          pager.read(new Key(row, cf, next));
+        }
+      } else {
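+        // Outbound links are stored inline in the page's JSON document, so page through
+        // them in memory rather than scanning additional rows.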
+        scanner.setRange(Range.exact(row, Constants.PAGE, Constants.CUR));
+        Iterator<Map.Entry<Key, Value>> iter = scanner.iterator();
+        if (iter.hasNext()) {
+          Page curPage = gson.fromJson(iter.next().getValue().toString(), Page.class);
+          links.setTotal(curPage.getNumOutbound());
+          int skip = 0;
+          int add = 0;
+          for (Link l : curPage.getOutboundLinks()) {
+            if (skip < (pageNum * PAGE_SIZE)) {
+              skip++;
+            } else if (add < PAGE_SIZE) {
+              links.addLink(l);
+              add++;
+            } else {
+              links.setNext(l.getPageID());
+              break;
+            }
+          }
+        }
+      }
+    } catch (TableNotFoundException e) {
+      log.error("Table {} not found", accumuloIndexTable);
+    }
+    return links;
+  }
+}
diff --git a/modules/core/src/main/java/webindex/core/DataConfig.java b/modules/core/src/main/java/webindex/core/WebIndexConfig.java
similarity index 84%
rename from modules/core/src/main/java/webindex/core/DataConfig.java
rename to modules/core/src/main/java/webindex/core/WebIndexConfig.java
index 36a3bf9..dadc429 100644
--- a/modules/core/src/main/java/webindex/core/DataConfig.java
+++ b/modules/core/src/main/java/webindex/core/WebIndexConfig.java
@@ -21,9 +21,9 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class DataConfig {
+public class WebIndexConfig {
 
-  private static final Logger log = LoggerFactory.getLogger(DataConfig.class);
+  private static final Logger log = LoggerFactory.getLogger(WebIndexConfig.class);
 
   public static String CC_URL_PREFIX = "https://aws-publicdatasets.s3.amazonaws.com/";
   public static final String WI_EXECUTOR_INSTANCES = "WI_EXECUTOR_INSTANCES";
@@ -71,27 +71,27 @@
     return path;
   }
 
-  public static DataConfig load() {
+  public static WebIndexConfig load() {
     final String homePath = getEnvPath("WI_HOME");
-    final String userPath = homePath + "/conf/data.yml";
-    final String defaultPath = homePath + "/conf/data.yml.example";
+    final String userPath = homePath + "/conf/webindex.yml";
+    final String defaultPath = homePath + "/conf/webindex.yml.example";
     if ((new File(userPath).exists())) {
       log.info("Using user config at {}", userPath);
       return load(userPath);
     } else {
-      log.info("Using default config at {}" + defaultPath);
+      log.info("Using default config at {}", defaultPath);
       return load(defaultPath);
     }
   }
 
-  public static DataConfig load(String configPath) {
+  public static WebIndexConfig load(String configPath) {
     return load(configPath, true);
   }
 
-  protected static DataConfig load(String configPath, boolean useEnv) {
+  protected static WebIndexConfig load(String configPath, boolean useEnv) {
     try {
       YamlReader reader = new YamlReader(new FileReader(configPath));
-      DataConfig config = reader.read(DataConfig.class);
+      WebIndexConfig config = reader.read(WebIndexConfig.class);
       if (useEnv) {
         config.hadoopConfDir = getEnvPath("HADOOP_CONF_DIR");
         config.fluoHome = getEnvPath("FLUO_HOME");
diff --git a/modules/core/src/main/java/webindex/core/models/URL.java b/modules/core/src/main/java/webindex/core/models/URL.java
index 04fb3f9..c090083 100644
--- a/modules/core/src/main/java/webindex/core/models/URL.java
+++ b/modules/core/src/main/java/webindex/core/models/URL.java
@@ -18,6 +18,8 @@
 import java.util.Objects;
 import java.util.function.Function;
 
+import com.google.common.net.HostSpecifier;
+import com.google.common.net.InternetDomainName;
 import org.apache.commons.lang.ArrayUtils;
 import org.apache.commons.validator.routines.InetAddressValidator;
 import org.slf4j.Logger;
@@ -63,6 +65,19 @@
     throw new IllegalArgumentException(msg);
   }
 
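+  // Extracts the registrable domain, e.g. domainFromHost("a.b.c.com") returns "c.com"
+  // and domainFromHost("a.b.c.co.uk") returns "c.co.uk" (see URLTest).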
+  public static String domainFromHost(String host) {
+    return InternetDomainName.from(host).topPrivateDomain().name();
+  }
+
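+  // Valid hosts must parse and fall under a public suffix: "a.b.example.com" passes,
+  // while bare names like "test" or bare suffixes like "co.uk" do not (see URLTest).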
+  public static boolean isValidHost(String host) {
+    return HostSpecifier.isValid(host) && InternetDomainName.isValid(host)
+        && InternetDomainName.from(host).isUnderPublicSuffix();
+  }
+
+  public static URL from(String rawUrl) {
+    return URL.from(rawUrl, URL::domainFromHost, URL::isValidHost);
+  }
+
   public static URL from(String rawUrl, Function<String, String> domainFromHost,
       Function<String, Boolean> isValidHost) {
 
@@ -131,6 +146,10 @@
     return new URL(domain, host, path, port, secure, ipHost);
   }
 
+  public static boolean isValid(String rawUrl) {
+    return URL.isValid(rawUrl, URL::domainFromHost, URL::isValidHost);
+  }
+
   public static boolean isValid(String rawUrl, Function<String, String> domainFromHost,
       Function<String, Boolean> isValidHost) {
     try {
diff --git a/modules/core/src/main/java/webindex/core/util/Pager.java b/modules/core/src/main/java/webindex/core/util/Pager.java
new file mode 100644
index 0000000..27b4cf8
--- /dev/null
+++ b/modules/core/src/main/java/webindex/core/util/Pager.java
@@ -0,0 +1,104 @@
+/*
+ * Copyright 2015 Webindex authors (see AUTHORS)
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+ * in compliance with the License. You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package webindex.core.util;
+
+import java.util.Iterator;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.function.Consumer;
+
+import org.apache.accumulo.core.client.Scanner;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Range;
+import org.apache.accumulo.core.data.Value;
+
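+/**
+ * Pages through Accumulo entries within a Range, handing each entry to a callback.
+ * A single read() call delivers up to pageSize regular entries and, when more data
+ * remains, one extra entry flagged isNext() whose key can be recorded to resume
+ * paging later via read(Key).
+ */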
+public class Pager {
+
+  private Scanner scanner;
+  private int pageSize;
+  private Range pageRange;
+  private Consumer<PageEntry> entryHandler;
+  private AtomicBoolean pageRead = new AtomicBoolean(false);
+
+  public class PageEntry {
+
+    private Key key;
+    private Value value;
+    private boolean isNext;
+
+    public PageEntry(Key key, Value value, boolean isNext) {
+      this.key = key;
+      this.value = value;
+      this.isNext = isNext;
+    }
+
+    public Key getKey() {
+      return key;
+    }
+
+    public Value getValue() {
+      return value;
+    }
+
+    public boolean isNext() {
+      return isNext;
+    }
+  }
+
+  private Pager(Scanner scanner, Range pageRange, int pageSize, Consumer<PageEntry> entryHandler) {
+    this.scanner = scanner;
+    this.pageRange = pageRange;
+    this.pageSize = pageSize;
+    this.entryHandler = entryHandler;
+  }
+
+  public void read(Key startKey) {
+    // Mark the pager as consumed so a second read() call fails fast.
+    if (!pageRead.compareAndSet(false, true)) {
+      throw new IllegalStateException("Pager.read() cannot be called twice");
+    }
+    scanner.setRange(new Range(startKey, pageRange.getEndKey()));
+    handleStart(scanner.iterator());
+  }
+
+  public void read(int pageNum) {
+    if (!pageRead.compareAndSet(false, true)) {
+      throw new IllegalStateException("Pager.read() cannot be called twice");
+    }
+    scanner.setRange(pageRange);
+    Iterator<Map.Entry<Key, Value>> iterator = scanner.iterator();
+    if (pageNum > 0) {
+      long skip = 0;
+      // Guard with hasNext() so a page number past the end yields an empty page
+      // rather than a NoSuchElementException.
+      while (skip < (pageNum * pageSize) && iterator.hasNext()) {
+        iterator.next();
+        skip++;
+      }
+    }
+    handleStart(iterator);
+  }
+
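+  // Emits up to pageSize entries, plus one look-ahead entry flagged as "next" when
+  // more data remains past this page.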
+  private void handleStart(Iterator<Map.Entry<Key, Value>> iterator) {
+    long num = 0;
+    while (iterator.hasNext() && (num < (pageSize + 1))) {
+      Map.Entry<Key, Value> entry = iterator.next();
+      entryHandler.accept(new PageEntry(entry.getKey(), entry.getValue(), num == pageSize));
+      num++;
+    }
+  }
+
+  public static Pager build(Scanner scanner, Range pageRange, int pageSize,
+      Consumer<PageEntry> entryHandler) {
+    return new Pager(scanner, pageRange, pageSize, entryHandler);
+  }
+}
diff --git a/modules/core/src/test/java/webindex/core/DataConfigTest.java b/modules/core/src/test/java/webindex/core/WebIndexConfigTest.java
similarity index 87%
rename from modules/core/src/test/java/webindex/core/DataConfigTest.java
rename to modules/core/src/test/java/webindex/core/WebIndexConfigTest.java
index a92bcbd..2bb125e 100644
--- a/modules/core/src/test/java/webindex/core/DataConfigTest.java
+++ b/modules/core/src/test/java/webindex/core/WebIndexConfigTest.java
@@ -17,11 +17,11 @@
 import org.junit.Assert;
 import org.junit.Test;
 
-public class DataConfigTest {
+public class WebIndexConfigTest {
 
   @Test
   public void testBasic() throws Exception {
-    DataConfig config = DataConfig.load("../../conf/data.yml.example", false);
+    WebIndexConfig config = WebIndexConfig.load("../../conf/webindex.yml.example", false);
     Assert.assertEquals("webindex_search", config.accumuloIndexTable);
     Assert.assertEquals("webindex", config.fluoApp);
     Assert.assertEquals("/cc/temp", config.hdfsTempDir);
diff --git a/modules/core/src/test/java/webindex/core/models/URLTest.java b/modules/core/src/test/java/webindex/core/models/URLTest.java
index 89f5b61..e0eb45d 100644
--- a/modules/core/src/test/java/webindex/core/models/URLTest.java
+++ b/modules/core/src/test/java/webindex/core/models/URLTest.java
@@ -22,31 +22,28 @@
 public class URLTest {
 
   public static URL from(String rawUrl) {
-    return URL.from(rawUrl, host -> host, host -> true);
+    return URL.from(rawUrl);
   }
 
   public static String toID(String rawUrl) {
     return from(rawUrl).toPageID();
   }
 
-  public static boolean isValid(String rawUrl) {
-    return URL.isValid(rawUrl, host -> host, host -> true);
-  }
 
   public static URL url80(String host, String path) {
-    return new URL(host, host, path, 80, false, URL.isValidIP(host));
+    return new URL(URL.domainFromHost(host), host, path, 80, false, URL.isValidIP(host));
   }
 
   public static URL url443(String host, String path) {
-    return new URL(host, host, path, 443, true, URL.isValidIP(host));
+    return new URL(URL.domainFromHost(host), host, path, 443, true, URL.isValidIP(host));
   }
 
   public static URL urlOpen(String host, String path, int port) {
-    return new URL(host, host, path, port, false, URL.isValidIP(host));
+    return new URL(URL.domainFromHost(host), host, path, port, false, URL.isValidIP(host));
   }
 
   public static URL urlSecure(String host, String path, int port) {
-    return new URL(host, host, path, port, true, URL.isValidIP(host));
+    return new URL(URL.domainFromHost(host), host, path, port, true, URL.isValidIP(host));
   }
 
   @Test
@@ -57,7 +54,7 @@
             "http://ab.com#1/2/3", "https://ab.com/", "https://h.d.ab.com/1/2/3"};
 
     for (String rawUrl : validUrls) {
-      Assert.assertTrue(isValid(rawUrl));
+      Assert.assertTrue(URL.isValid(rawUrl));
       Assert.assertEquals(rawUrl, from(rawUrl).toString());
     }
 
@@ -67,7 +64,7 @@
             "http://a.com:"};
 
     for (String rawUrl : failureUrls) {
-      Assert.assertFalse(isValid(rawUrl));
+      Assert.assertFalse(URL.isValid(rawUrl));
     }
   }
 
@@ -100,8 +97,8 @@
     URL u = from("http://a.b.c.d.com/1/2/3");
     Assert.assertEquals("a.b.c.d.com", u.getHost());
     Assert.assertEquals("com.d.c.b.a", u.getReverseHost());
-    Assert.assertEquals("a.b.c.d.com", u.getDomain());
-    Assert.assertEquals("com.d.c.b.a", u.getReverseDomain());
+    Assert.assertEquals("d.com", u.getDomain());
+    Assert.assertEquals("com.d", u.getReverseDomain());
   }
 
   @Test
@@ -110,8 +107,6 @@
     Assert.assertEquals(urlOpen("example.com", "#a&b", 83), from("http://example.com:83#a&b"));
     Assert.assertEquals(url80("a.b.example.com", "/page?1&2"),
         from("http://a.b.example.com/page?1&2"));
-    Assert.assertEquals(url443("1.2.3.4", "/page?1&2"), from("https://1.2.3.4/page?1&2"));
-    Assert.assertEquals(url80("1.2.3.4", "/page?1&2"), from("http://1.2.3.4/page?1&2"));
     Assert.assertEquals(url80("example.com", "/1/2/3?c&d&e"),
         from("http://example.com/1/2/3?c&d&e"));
     Assert.assertEquals(url80("a.b.example.com", "/"), from("http://a.b.example.com"));
@@ -123,7 +118,6 @@
     Assert.assertEquals(url443("example.com", "/"), from("https://example.com/"));
     Assert.assertEquals(url80("example.com", "/b?1#2&3#4"), from("http://example.com/b?1#2&3#4"));
     Assert.assertEquals(urlOpen("example.com", "/b", 8080), from("http://example.com:8080/b"));
-    Assert.assertEquals(url80("1.2.3.4", "////c"), from("http://1.2.3.4////c"));
   }
 
   @Test
@@ -131,7 +125,7 @@
     URL u1 = urlSecure("a.b.c.com", "/", 8329);
     URL u2 = from("https://a.b.C.com:8329");
     String r1 = u2.toPageID();
-    Assert.assertEquals("com.c.b.a>>s8329>/", r1);
+    Assert.assertEquals("com.c>.b.a>s8329>/", r1);
     URL u3 = URL.fromPageID(r1);
     Assert.assertEquals(u1, u2);
     Assert.assertEquals(u1, u3);
@@ -147,6 +141,75 @@
     Assert.assertEquals("1.2.3.4>>o>/a/b/c", id5);
     Assert.assertEquals(u5, URL.fromPageID(id5));
 
-    Assert.assertEquals("com.b.a>>s80>/", from("https://a.b.com:80").toPageID());
+    Assert.assertEquals("com.b>.a>s80>/", from("https://a.b.com:80").toPageID());
+  }
+
+  @Test
+  public void testMore() throws Exception {
+
+    // valid urls
+    Assert.assertTrue(URL.isValid(" \thttp://example.com/ \t\n\r\n"));
+    Assert.assertTrue(URL.isValid("http://1.2.3.4:80/test?a=b&c=d"));
+    Assert.assertTrue(URL.isValid("http://1.2.3.4/"));
+    Assert.assertTrue(URL.isValid("http://a.b.c.d.com/1/2/3/4/5"));
+    Assert.assertTrue(URL.isValid("http://a.b.com:281/1/2"));
+    Assert.assertTrue(URL.isValid("http://A.B.Com:281/a/b"));
+    Assert.assertTrue(URL.isValid("http://A.b.Com:281/A/b"));
+    Assert.assertTrue(URL.isValid("http://a.B.Com?A/b/C"));
+    Assert.assertTrue(URL.isValid("http://A.Be.COM"));
+    Assert.assertTrue(URL.isValid("http://1.2.3.4:281/1/2"));
+
+    // invalid urls
+    Assert.assertFalse(URL.isValid("http://a.com:/test"));
+    Assert.assertFalse(URL.isValid("http://z.com:"));
+    Assert.assertFalse(URL.isValid("http://1.2.3:80/test?a=b&c=d"));
+    Assert.assertFalse(URL.isValid("http://1.2.3/"));
+    Assert.assertFalse(URL.isValid("http://com/"));
+    Assert.assertFalse(URL.isValid("http://a.b.c.com/bad>et"));
+    Assert.assertFalse(URL.isValid("http://test"));
+    Assert.assertFalse(URL.isValid("http://co.uk"));
+    Assert.assertFalse(URL.isValid("http:///example.com/"));
+    Assert.assertFalse(URL.isValid("http:://example.com/"));
+    Assert.assertFalse(URL.isValid("example.com"));
+    Assert.assertFalse(URL.isValid("127.0.0.1"));
+    Assert.assertFalse(URL.isValid("http://ab@example.com"));
+    Assert.assertFalse(URL.isValid("ftp://example.com"));
+
+    Assert.assertEquals("example.com", from("http://example.com:281/1/2").getHost());
+    Assert.assertEquals("a.b.example.com", from("http://a.b.example.com/1/2").getHost());
+    Assert.assertEquals("a.b.example.com", from("http://A.B.Example.Com/1/2").getHost());
+    Assert.assertEquals("1.2.3.4", from("http://1.2.3.4:89/1/2").getHost());
+
+    Assert.assertEquals("/A/b/C", from("http://A.B.Example.Com/A/b/C").getPath());
+    Assert.assertEquals("?D/E/f", from("http://A.B.Example.Com?D/E/f").getPath());
+
+    URL u = from("http://a.b.c.d.com/1/2/3");
+    Assert.assertEquals("a.b.c.d.com", u.getHost());
+    Assert.assertEquals("com.d.c.b.a", u.getReverseHost());
+    Assert.assertEquals("d.com", u.getDomain());
+    Assert.assertEquals("com.d", u.getReverseDomain());
+
+    Assert.assertEquals("com.example", from("http://example.com:281/1").getReverseHost());
+    Assert.assertEquals("com.example.b.a", from("http://a.b.example.com/1/2").getReverseHost());
+    Assert.assertEquals("1.2.3.4", from("http://1.2.3.4:89/1/2").getReverseHost());
+
+    Assert.assertTrue(from("http://a.com/a.jpg").isImage());
+    Assert.assertTrue(from("http://a.com/a.JPEG").isImage());
+    Assert.assertTrue(from("http://a.com/c/b/a.png").isImage());
+
+    Assert.assertEquals("c.com", from("http://a.b.c.com").getDomain());
+    Assert.assertEquals("com.c", from("http://a.b.c.com").getReverseDomain());
+    Assert.assertEquals("c.co.uk", from("http://a.b.c.co.uk").getDomain());
+    Assert.assertEquals("uk.co.c", from("http://a.b.c.co.uk").getReverseDomain());
+    Assert.assertEquals("d.com.au", from("http://www.d.com.au").getDomain());
+    Assert.assertEquals("au.com.d", from("http://www.d.com.au").getReverseDomain());
+
+    u = from("https://www.d.com.au:9443/a/bc");
+    Assert.assertEquals("au.com.d>.www>s9443>/a/bc", u.toPageID());
+    Assert.assertEquals("https://www.d.com.au:9443/a/bc", u.toString());
+    URL u2 = URL.fromPageID(u.toPageID());
+    Assert.assertEquals("https://www.d.com.au:9443/a/bc", u2.toString());
+    Assert.assertEquals("d.com.au", u2.getDomain());
+    Assert.assertEquals("www.d.com.au", u2.getHost());
   }
 }
diff --git a/modules/core/src/test/resources/log4j.properties b/modules/core/src/test/resources/log4j.properties
index ee43eea..7add931 100644
--- a/modules/core/src/test/resources/log4j.properties
+++ b/modules/core/src/test/resources/log4j.properties
@@ -18,4 +18,3 @@
 log4j.appender.CA.layout.ConversionPattern=%d{ISO8601} [%c] %-5p: %m%n
 
 log4j.logger.webindex=WARN
-log4j.logger.Remoting=WARN
diff --git a/modules/data/pom.xml b/modules/data/pom.xml
index 7dd70d2..df01114 100644
--- a/modules/data/pom.xml
+++ b/modules/data/pom.xml
@@ -32,7 +32,6 @@
     <dependency>
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
-      <version>14.0.1</version>
     </dependency>
     <dependency>
       <groupId>commons-io</groupId>
@@ -67,10 +66,6 @@
     </dependency>
     <dependency>
       <groupId>org.apache.fluo</groupId>
-      <artifactId>fluo-mapreduce</artifactId>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.fluo</groupId>
       <artifactId>fluo-recipes-accumulo</artifactId>
     </dependency>
     <dependency>
@@ -148,41 +143,47 @@
       <scope>test</scope>
     </dependency>
   </dependencies>
-  <build>
-    <plugins>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-shade-plugin</artifactId>
-        <executions>
-          <execution>
-            <goals>
-              <goal>shade</goal>
-            </goals>
-            <phase>package</phase>
-            <configuration>
-              <shadedArtifactAttached>true</shadedArtifactAttached>
-              <shadedClassifierName>shaded</shadedClassifierName>
-              <!-- Relocate Thrift because Accumulo 1.8 uses Thrift 0.9.3 and Spark uses 0.9.1.  -->
-              <relocations>
-                <relocation>
-                  <pattern>org.apache.thrift</pattern>
-                  <shadedPattern>webindex.org.apache.thrift</shadedPattern>
-                </relocation>
-              </relocations>
-              <filters>
-                <filter>
-                  <artifact>*:*</artifact>
-                  <excludes>
-                    <exclude>META-INF/*.SF</exclude>
-                    <exclude>META-INF/*.DSA</exclude>
-                    <exclude>META-INF/*.RSA</exclude>
-                  </excludes>
-                </filter>
-              </filters>
-            </configuration>
-          </execution>
-        </executions>
-      </plugin>
-    </plugins>
-  </build>
+  <profiles>
+    <profile>
+      <id>create-shade-jar</id>
+      <build>
+        <plugins>
+          <plugin>
+            <groupId>org.apache.maven.plugins</groupId>
+            <artifactId>maven-shade-plugin</artifactId>
+            <executions>
+              <execution>
+                <id>spark-shade-jar</id>
+                <goals>
+                  <goal>shade</goal>
+                </goals>
+                <phase>package</phase>
+                <configuration>
+                  <shadedArtifactAttached>true</shadedArtifactAttached>
+                  <shadedClassifierName>shaded</shadedClassifierName>
+                  <!-- Relocate Thrift because Accumulo 1.8 uses Thrift 0.9.3 and Spark uses 0.9.1.  -->
+                  <relocations>
+                    <relocation>
+                      <pattern>org.apache.thrift</pattern>
+                      <shadedPattern>webindex.org.apache.thrift</shadedPattern>
+                    </relocation>
+                  </relocations>
+                  <filters>
+                    <filter>
+                      <artifact>*:*</artifact>
+                      <excludes>
+                        <exclude>META-INF/*.SF</exclude>
+                        <exclude>META-INF/*.DSA</exclude>
+                        <exclude>META-INF/*.RSA</exclude>
+                      </excludes>
+                    </filter>
+                  </filters>
+                </configuration>
+              </execution>
+            </executions>
+          </plugin>
+        </plugins>
+      </build>
+    </profile>
+  </profiles>
 </project>
diff --git a/modules/data/src/main/java/webindex/data/Configure.java b/modules/data/src/main/java/webindex/data/Configure.java
index 0dd0789..11b0f5b 100644
--- a/modules/data/src/main/java/webindex/data/Configure.java
+++ b/modules/data/src/main/java/webindex/data/Configure.java
@@ -22,7 +22,7 @@
 import org.apache.fluo.api.config.FluoConfiguration;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import webindex.core.DataConfig;
+import webindex.core.WebIndexConfig;
 import webindex.data.spark.IndexEnv;
 
 public class Configure {
@@ -35,9 +35,9 @@
       log.error("Usage: Configure");
       System.exit(1);
     }
-    DataConfig dataConfig = DataConfig.load();
+    WebIndexConfig webIndexConfig = WebIndexConfig.load();
 
-    IndexEnv env = new IndexEnv(dataConfig);
+    IndexEnv env = new IndexEnv(webIndexConfig);
     env.initAccumuloIndexTable();
 
     FluoConfiguration appConfig = new FluoConfiguration();
@@ -45,7 +45,7 @@
 
     Iterator<String> iter = appConfig.getKeys();
     try (PrintWriter out =
-        new PrintWriter(new BufferedWriter(new FileWriter(dataConfig.getFluoPropsPath(), true)))) {
+        new PrintWriter(new BufferedWriter(new FileWriter(webIndexConfig.getFluoPropsPath(), true)))) {
       while (iter.hasNext()) {
         String key = iter.next();
         out.println(key + " = " + appConfig.getRawString(key));
diff --git a/modules/data/src/main/java/webindex/data/Copy.java b/modules/data/src/main/java/webindex/data/Copy.java
index 7976feb..4e71908 100644
--- a/modules/data/src/main/java/webindex/data/Copy.java
+++ b/modules/data/src/main/java/webindex/data/Copy.java
@@ -28,7 +28,7 @@
 import org.apache.spark.api.java.JavaSparkContext;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import webindex.core.DataConfig;
+import webindex.core.WebIndexConfig;
 import webindex.data.spark.IndexEnv;
 
 public class Copy {
@@ -56,7 +56,7 @@
       System.exit(1);
     }
 
-    DataConfig dataConfig = DataConfig.load();
+    WebIndexConfig webIndexConfig = WebIndexConfig.load();
 
     SparkConf sparkConf = new SparkConf().setAppName("webindex-copy");
     try (JavaSparkContext ctx = new JavaSparkContext(sparkConf)) {
@@ -70,9 +70,9 @@
       log.info("Copying {} files (Range {} of paths file {}) from AWS to HDFS {}", copyList.size(),
           args[1], args[0], destPath.toString());
 
-      JavaRDD<String> copyRDD = ctx.parallelize(copyList, dataConfig.getNumExecutorInstances());
+      JavaRDD<String> copyRDD = ctx.parallelize(copyList, webIndexConfig.getNumExecutorInstances());
 
-      final String prefix = DataConfig.CC_URL_PREFIX;
+      final String prefix = WebIndexConfig.CC_URL_PREFIX;
       final String destDir = destPath.toString();
 
       copyRDD
diff --git a/modules/data/src/main/java/webindex/data/Init.java b/modules/data/src/main/java/webindex/data/Init.java
index 1caa164..a3c7f4c 100644
--- a/modules/data/src/main/java/webindex/data/Init.java
+++ b/modules/data/src/main/java/webindex/data/Init.java
@@ -23,7 +23,7 @@
 import org.archive.io.ArchiveReader;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import webindex.core.DataConfig;
+import webindex.core.WebIndexConfig;
 import webindex.core.models.Page;
 import webindex.data.spark.IndexEnv;
 import webindex.data.spark.IndexStats;
@@ -40,9 +40,9 @@
       log.error("Usage: Init [<dataDir>]");
       System.exit(1);
     }
-    DataConfig dataConfig = DataConfig.load();
+    WebIndexConfig webIndexConfig = WebIndexConfig.load();
 
-    IndexEnv env = new IndexEnv(dataConfig);
+    IndexEnv env = new IndexEnv(webIndexConfig);
     env.setFluoTableSplits();
     log.info("Initialized Fluo table splits");
 
diff --git a/modules/data/src/main/java/webindex/data/LoadHdfs.java b/modules/data/src/main/java/webindex/data/LoadHdfs.java
index 87d590b..91eb17f 100644
--- a/modules/data/src/main/java/webindex/data/LoadHdfs.java
+++ b/modules/data/src/main/java/webindex/data/LoadHdfs.java
@@ -37,7 +37,7 @@
 import org.archive.io.warc.WARCReaderFactory;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import webindex.core.DataConfig;
+import webindex.core.WebIndexConfig;
 import webindex.core.models.Page;
 import webindex.data.fluo.PageLoader;
 import webindex.data.spark.IndexEnv;
@@ -57,7 +57,7 @@
     IndexEnv.validateDataDir(dataDir);
 
     final String hadoopConfDir = IndexEnv.getHadoopConfDir();
-    final int rateLimit = DataConfig.load().getLoadRateLimit();
+    final int rateLimit = WebIndexConfig.load().getLoadRateLimit();
 
     List<String> loadPaths = new ArrayList<>();
     FileSystem hdfs = IndexEnv.getHDFS();
diff --git a/modules/data/src/main/java/webindex/data/LoadS3.java b/modules/data/src/main/java/webindex/data/LoadS3.java
index e16c69e..267e160 100644
--- a/modules/data/src/main/java/webindex/data/LoadS3.java
+++ b/modules/data/src/main/java/webindex/data/LoadS3.java
@@ -31,7 +31,7 @@
 import org.archive.io.warc.WARCReaderFactory;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import webindex.core.DataConfig;
+import webindex.core.WebIndexConfig;
 import webindex.core.models.Page;
 import webindex.data.fluo.PageLoader;
 import webindex.data.spark.IndexEnv;
@@ -53,7 +53,7 @@
       System.exit(1);
     }
 
-    final int rateLimit = DataConfig.load().getLoadRateLimit();
+    final int rateLimit = WebIndexConfig.load().getLoadRateLimit();
 
     SparkConf sparkConf = new SparkConf().setAppName("webindex-load-s3");
     try (JavaSparkContext ctx = new JavaSparkContext(sparkConf)) {
@@ -63,7 +63,7 @@
 
       JavaRDD<String> loadRDD = ctx.parallelize(loadList, loadList.size());
 
-      final String prefix = DataConfig.CC_URL_PREFIX;
+      final String prefix = WebIndexConfig.CC_URL_PREFIX;
 
       loadRDD.foreachPartition(iter -> {
         final FluoConfiguration fluoConfig = new FluoConfiguration(new File("fluo.properties"));
diff --git a/modules/data/src/main/java/webindex/data/TestParser.java b/modules/data/src/main/java/webindex/data/TestParser.java
index 318db31..30fb024 100644
--- a/modules/data/src/main/java/webindex/data/TestParser.java
+++ b/modules/data/src/main/java/webindex/data/TestParser.java
@@ -25,7 +25,7 @@
 import org.archive.io.warc.WARCReaderFactory;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import webindex.core.DataConfig;
+import webindex.core.WebIndexConfig;
 import webindex.data.spark.IndexEnv;
 import webindex.data.util.ArchiveUtil;
 
@@ -45,7 +45,7 @@
       System.exit(1);
     }
 
-    DataConfig.load();
+    WebIndexConfig.load();
 
     SparkConf sparkConf = new SparkConf().setAppName("webindex-test-parser");
     try (JavaSparkContext ctx = new JavaSparkContext(sparkConf)) {
@@ -55,7 +55,7 @@
 
       JavaRDD<String> loadRDD = ctx.parallelize(loadList, loadList.size());
 
-      final String prefix = DataConfig.CC_URL_PREFIX;
+      final String prefix = WebIndexConfig.CC_URL_PREFIX;
 
       loadRDD.foreachPartition(iter -> iter.forEachRemaining(path -> {
         String urlToCopy = prefix + path;
diff --git a/modules/data/src/main/java/webindex/data/spark/IndexEnv.java b/modules/data/src/main/java/webindex/data/spark/IndexEnv.java
index 4ee9277..ec1dda8 100644
--- a/modules/data/src/main/java/webindex/data/spark/IndexEnv.java
+++ b/modules/data/src/main/java/webindex/data/spark/IndexEnv.java
@@ -54,7 +54,7 @@
 import org.apache.spark.api.java.JavaSparkContext;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import webindex.core.DataConfig;
+import webindex.core.WebIndexConfig;
 import webindex.core.models.Page;
 import webindex.data.FluoApp;
 import webindex.data.fluo.PageObserver;
@@ -72,9 +72,9 @@
   private int numTablets;
   private int numBuckets;
 
-  public IndexEnv(DataConfig dataConfig) {
-    this(getFluoConfig(dataConfig), dataConfig.accumuloIndexTable, dataConfig.hdfsTempDir,
-        dataConfig.numBuckets, dataConfig.numTablets);
+  public IndexEnv(WebIndexConfig webIndexConfig) {
+    this(getFluoConfig(webIndexConfig), webIndexConfig.accumuloIndexTable,
+        webIndexConfig.hdfsTempDir, webIndexConfig.numBuckets, webIndexConfig.numTablets);
   }
 
   public IndexEnv(FluoConfiguration fluoConfig, String accumuloTable, String hdfsTempDir,
@@ -101,10 +101,10 @@
     return hadoopConfDir;
   }
 
-  private static FluoConfiguration getFluoConfig(DataConfig dataConfig) {
-    Preconditions.checkArgument(new File(dataConfig.getFluoPropsPath()).exists(),
-        "fluoPropsPath must be set in data.yml and exist");
-    return new FluoConfiguration(new File(dataConfig.getFluoPropsPath()));
+  private static FluoConfiguration getFluoConfig(WebIndexConfig webIndexConfig) {
+    Preconditions.checkArgument(new File(webIndexConfig.getFluoPropsPath()).exists(),
+        "fluoPropsPath must be set in webindex.yml and exist");
+    return new FluoConfiguration(new File(webIndexConfig.getFluoPropsPath()));
   }
 
   public FluoConfiguration getFluoConfig() {
diff --git a/modules/data/src/main/java/webindex/data/util/ArchiveUtil.java b/modules/data/src/main/java/webindex/data/util/ArchiveUtil.java
index 212103a..6ce5118 100644
--- a/modules/data/src/main/java/webindex/data/util/ArchiveUtil.java
+++ b/modules/data/src/main/java/webindex/data/util/ArchiveUtil.java
@@ -51,7 +51,7 @@
       String rawPageUrl = archiveRecord.getHeader().getUrl();
       URL pageUrl;
       try {
-        pageUrl = DataUrl.from(rawPageUrl);
+        pageUrl = URL.from(rawPageUrl);
       } catch (IllegalArgumentException e) {
         return Page.EMPTY;
       } catch (Exception e) {
@@ -80,7 +80,7 @@
                 String rawLinkUrl = link.getString("url");
                 URL linkUrl;
                 try {
-                  linkUrl = DataUrl.from(rawLinkUrl);
+                  linkUrl = URL.from(rawLinkUrl);
                   if (!page.getDomain().equals(linkUrl.getDomain())) {
                     page.addOutbound(Link.of(linkUrl, anchorText));
                   }
diff --git a/modules/data/src/main/java/webindex/data/util/DataUrl.java b/modules/data/src/main/java/webindex/data/util/DataUrl.java
deleted file mode 100644
index 695b16f..0000000
--- a/modules/data/src/main/java/webindex/data/util/DataUrl.java
+++ /dev/null
@@ -1,39 +0,0 @@
-/*
- * Copyright 2016 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.data.util;
-
-import com.google.common.net.HostSpecifier;
-import com.google.common.net.InternetDomainName;
-import webindex.core.models.URL;
-
-public class DataUrl {
-
-  public static String domainFromHost(String host) {
-    return InternetDomainName.from(host).topPrivateDomain().name();
-  }
-
-  public static boolean isValidHost(String host) {
-    return HostSpecifier.isValid(host) && InternetDomainName.isValid(host)
-        && InternetDomainName.from(host).isUnderPublicSuffix();
-  }
-
-  public static URL from(String rawUrl) {
-    return URL.from(rawUrl, DataUrl::domainFromHost, DataUrl::isValidHost);
-  }
-
-  public static boolean isValid(String rawUrl) {
-    return URL.isValid(rawUrl, DataUrl::domainFromHost, DataUrl::isValidHost);
-  }
-}
diff --git a/modules/data/src/test/java/webindex/data/fluo/it/IndexIT.java b/modules/data/src/test/java/webindex/data/fluo/it/IndexIT.java
index 7c64004..6b29fec 100644
--- a/modules/data/src/test/java/webindex/data/fluo/it/IndexIT.java
+++ b/modules/data/src/test/java/webindex/data/fluo/it/IndexIT.java
@@ -54,7 +54,6 @@
 import webindex.data.spark.IndexStats;
 import webindex.data.spark.IndexUtil;
 import webindex.data.util.ArchiveUtil;
-import webindex.data.util.DataUrl;
 
 public class IndexIT extends AccumuloExportITBase {
 
@@ -136,11 +135,11 @@
   }
 
   public static Link newLink(String url) {
-    return Link.of(DataUrl.from(url));
+    return Link.of(URL.from(url));
   }
 
   public static Link newLink(String url, String anchorText) {
-    return Link.of(DataUrl.from(url), anchorText);
+    return Link.of(URL.from(url), anchorText);
   }
 
   @Test
@@ -160,8 +159,8 @@
       getMiniFluo().waitForObservers();
       assertOutput(pages.values());
 
-      URL deleteUrl = DataUrl.from("http://1000games.me/games/gametion/");
-      log.info("Deleting page {}", deleteUrl);
+      URL deleteUrl = URL.from("http://1000games.me/games/gametion/");
+      log.debug("Deleting page {}", deleteUrl);
       try (LoaderExecutor le = client.newLoaderExecutor()) {
         le.execute(PageLoader.deletePage(deleteUrl));
       }
@@ -172,7 +171,7 @@
       Assert.assertEquals(numPages - 1, pages.size());
       assertOutput(pages.values());
 
-      URL updateUrl = DataUrl.from("http://100zone.blogspot.com/2013/03/please-memp3-4shared.html");
+      URL updateUrl = URL.from("http://100zone.blogspot.com/2013/03/please-memp3-4shared.html");
       Page updatePage = pages.get(updateUrl);
       long numLinks = updatePage.getNumOutbound();
       Assert.assertTrue(updatePage.addOutbound(newLink("http://example.com", "Example")));
@@ -186,7 +185,7 @@
       getMiniFluo().waitForObservers();
 
       // create a URL that has an inlink count of 2
-      URL updateUrl2 = DataUrl.from("http://00assclown.newgrounds.com/");
+      URL updateUrl2 = URL.from("http://00assclown.newgrounds.com/");
       Page updatePage2 = pages.get(updateUrl2);
       long numLinks2 = updatePage2.getNumOutbound();
       Assert.assertTrue(updatePage2.addOutbound(newLink("http://example.com", "Example")));
@@ -237,8 +236,7 @@
     try (FluoClient client = FluoFactory.newClient(getMiniFluo().getClientConfiguration());
         LoaderExecutor le = client.newLoaderExecutor()) {
       for (Page page : pages.subList(2, pages.size())) {
-        log.info("Loading page {} with {} links {}", page.getUrl(), page.getOutboundLinks().size(),
-            page.getOutboundLinks());
+        log.debug("Loading page {} with {} links", page.getUrl(), page.getOutboundLinks().size());
         le.execute(PageLoader.updatePage(page));
       }
     }
diff --git a/modules/data/src/test/java/webindex/data/spark/IndexUtilTest.java b/modules/data/src/test/java/webindex/data/spark/IndexUtilTest.java
index fc72bb6..75ea240 100644
--- a/modules/data/src/test/java/webindex/data/spark/IndexUtilTest.java
+++ b/modules/data/src/test/java/webindex/data/spark/IndexUtilTest.java
@@ -33,9 +33,9 @@
 import scala.Tuple2;
 import webindex.core.models.Link;
 import webindex.core.models.Page;
+import webindex.core.models.URL;
 import webindex.data.SparkTestUtil;
 import webindex.data.fluo.UriMap.UriInfo;
-import webindex.data.util.DataUrl;
 
 public class IndexUtilTest {
 
@@ -106,14 +106,14 @@
 
   private List<Page> getPagesSet1() {
     List<Page> pages = new ArrayList<>();
-    Page pageA = new Page(DataUrl.from("http://a.com/1").toPageID());
-    pageA.addOutbound(Link.of(DataUrl.from("http://b.com/1"), "b1"));
-    pageA.addOutbound(Link.of(DataUrl.from("http://b.com/3"), "b3"));
-    pageA.addOutbound(Link.of(DataUrl.from("http://c.com/1"), "c1"));
-    Page pageB = new Page(DataUrl.from("http://b.com").toPageID());
-    pageB.addOutbound(Link.of(DataUrl.from("http://c.com/1"), "c1"));
-    pageB.addOutbound(Link.of(DataUrl.from("http://b.com/2"), "b2"));
-    pageB.addOutbound(Link.of(DataUrl.from("http://b.com/3"), "b3"));
+    Page pageA = new Page(URL.from("http://a.com/1").toPageID());
+    pageA.addOutbound(Link.of(URL.from("http://b.com/1"), "b1"));
+    pageA.addOutbound(Link.of(URL.from("http://b.com/3"), "b3"));
+    pageA.addOutbound(Link.of(URL.from("http://c.com/1"), "c1"));
+    Page pageB = new Page(URL.from("http://b.com").toPageID());
+    pageB.addOutbound(Link.of(URL.from("http://c.com/1"), "c1"));
+    pageB.addOutbound(Link.of(URL.from("http://b.com/2"), "b2"));
+    pageB.addOutbound(Link.of(URL.from("http://b.com/3"), "b3"));
     pages.add(pageA);
     pages.add(pageB);
     return pages;
diff --git a/modules/data/src/test/java/webindex/data/util/DataUrlTest.java b/modules/data/src/test/java/webindex/data/util/DataUrlTest.java
deleted file mode 100644
index d312b42..0000000
--- a/modules/data/src/test/java/webindex/data/util/DataUrlTest.java
+++ /dev/null
@@ -1,99 +0,0 @@
-/*
- * Copyright 2016 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.data.util;
-
-import org.junit.Assert;
-import org.junit.Test;
-import webindex.core.models.URL;
-
-public class DataUrlTest {
-
-  public static URL build(String rawUrl) {
-    return DataUrl.from(rawUrl);
-  }
-
-  public static boolean isValid(String rawUrl) {
-    return DataUrl.isValid(rawUrl);
-  }
-
-  @Test
-  public void testBasic() throws Exception {
-
-    // valid urls
-    Assert.assertTrue(isValid(" \thttp://example.com/ \t\n\r\n"));
-    Assert.assertTrue(isValid("http://1.2.3.4:80/test?a=b&c=d"));
-    Assert.assertTrue(isValid("http://1.2.3.4/"));
-    Assert.assertTrue(isValid("http://a.b.c.d.com/1/2/3/4/5"));
-    Assert.assertTrue(isValid("http://a.b.com:281/1/2"));
-    Assert.assertTrue(isValid("http://A.B.Com:281/a/b"));
-    Assert.assertTrue(isValid("http://A.b.Com:281/A/b"));
-    Assert.assertTrue(isValid("http://a.B.Com?A/b/C"));
-    Assert.assertTrue(isValid("http://A.Be.COM"));
-    Assert.assertTrue(isValid("http://1.2.3.4:281/1/2"));
-
-    // invalid urls
-    Assert.assertFalse(isValid("http://a.com:/test"));
-    Assert.assertFalse(isValid("http://z.com:"));
-    Assert.assertFalse(isValid("http://1.2.3:80/test?a=b&c=d"));
-    Assert.assertFalse(isValid("http://1.2.3/"));
-    Assert.assertFalse(isValid("http://com/"));
-    Assert.assertFalse(isValid("http://a.b.c.com/bad>et"));
-    Assert.assertFalse(isValid("http://test"));
-    Assert.assertFalse(isValid("http://co.uk"));
-    Assert.assertFalse(isValid("http:///example.com/"));
-    Assert.assertFalse(isValid("http:://example.com/"));
-    Assert.assertFalse(isValid("example.com"));
-    Assert.assertFalse(isValid("127.0.0.1"));
-    Assert.assertFalse(isValid("http://ab@example.com"));
-    Assert.assertFalse(isValid("ftp://example.com"));
-
-    Assert.assertEquals("example.com", build("http://example.com:281/1/2").getHost());
-    Assert.assertEquals("a.b.example.com", build("http://a.b.example.com/1/2").getHost());
-    Assert.assertEquals("a.b.example.com", build("http://A.B.Example.Com/1/2").getHost());
-    Assert.assertEquals("1.2.3.4", build("http://1.2.3.4:89/1/2").getHost());
-
-    Assert.assertEquals("/A/b/C", build("http://A.B.Example.Com/A/b/C").getPath());
-    Assert.assertEquals("?D/E/f", build("http://A.B.Example.Com?D/E/f").getPath());
-
-    URL u = build("http://a.b.c.d.com/1/2/3");
-    Assert.assertEquals("a.b.c.d.com", u.getHost());
-    Assert.assertEquals("com.d.c.b.a", u.getReverseHost());
-    Assert.assertEquals("d.com", u.getDomain());
-    Assert.assertEquals("com.d", u.getReverseDomain());
-
-    Assert.assertEquals("com.example", build("http://example.com:281/1").getReverseHost());
-    Assert.assertEquals("com.example.b.a", build("http://a.b.example.com/1/2").getReverseHost());
-    Assert.assertEquals("1.2.3.4", build("http://1.2.3.4:89/1/2").getReverseHost());
-
-    Assert.assertTrue(build("http://a.com/a.jpg").isImage());
-    Assert.assertTrue(build("http://a.com/a.JPEG").isImage());
-    Assert.assertTrue(build("http://a.com/c/b/a.png").isImage());
-
-    Assert.assertEquals("c.com", build("http://a.b.c.com").getDomain());
-    Assert.assertEquals("com.c", build("http://a.b.c.com").getReverseDomain());
-    Assert.assertEquals("c.co.uk", build("http://a.b.c.co.uk").getDomain());
-    Assert.assertEquals("uk.co.c", build("http://a.b.c.co.uk").getReverseDomain());
-    Assert.assertEquals("d.com.au", build("http://www.d.com.au").getDomain());
-    Assert.assertEquals("au.com.d", build("http://www.d.com.au").getReverseDomain());
-
-    u = build("https://www.d.com.au:9443/a/bc");
-    Assert.assertEquals("au.com.d>.www>s9443>/a/bc", u.toPageID());
-    Assert.assertEquals("https://www.d.com.au:9443/a/bc", u.toString());
-    URL u2 = URL.fromPageID(u.toPageID());
-    Assert.assertEquals("https://www.d.com.au:9443/a/bc", u2.toString());
-    Assert.assertEquals("d.com.au", u2.getDomain());
-    Assert.assertEquals("www.d.com.au", u2.getHost());
-  }
-}
diff --git a/modules/data/src/test/resources/log4j.properties b/modules/data/src/test/resources/log4j.properties
index 9a981d1..c80a759 100644
--- a/modules/data/src/test/resources/log4j.properties
+++ b/modules/data/src/test/resources/log4j.properties
@@ -26,6 +26,7 @@
 log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
 log4j.logger.org.apache.spark=WARN
 log4j.logger.org.apache.zookeeper=WARN
+log4j.logger.org.apache.zookeeper.ClientCnxn=ERROR
 log4j.logger.org.spark-project=WARN
 log4j.logger.webindex=WARN
 log4j.logger.Remoting=WARN
diff --git a/modules/integration/pom.xml b/modules/integration/pom.xml
new file mode 100644
index 0000000..890489f
--- /dev/null
+++ b/modules/integration/pom.xml
@@ -0,0 +1,138 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright 2015 Webindex authors (see AUTHORS)
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>io.github.astralway</groupId>
+    <artifactId>webindex-parent</artifactId>
+    <version>0.0.1-SNAPSHOT</version>
+    <relativePath>../../pom.xml</relativePath>
+  </parent>
+  <artifactId>webindex-integration</artifactId>
+  <name>WebIndex Integration</name>
+  <dependencies>
+    <dependency>
+      <groupId>com.google.code.gson</groupId>
+      <artifactId>gson</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>com.sparkjava</groupId>
+      <artifactId>spark-core</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>io.github.astralway</groupId>
+      <artifactId>webindex-core</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>io.github.astralway</groupId>
+      <artifactId>webindex-data</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>asm</groupId>
+          <artifactId>asm</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>io.github.astralway</groupId>
+      <artifactId>webindex-ui</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.accumulo</groupId>
+      <artifactId>accumulo-minicluster</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.eclipse.jetty</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.fluo</groupId>
+      <artifactId>fluo-api</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.fluo</groupId>
+      <artifactId>fluo-core</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.fluo</groupId>
+      <artifactId>fluo-mini</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.fluo</groupId>
+      <artifactId>fluo-recipes-test</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.netpreserve.commons</groupId>
+      <artifactId>webarchive-commons</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>ch.qos.logback</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-api</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-log4j12</artifactId>
+    </dependency>
+    <!-- Test dependencies -->
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.jsoup</groupId>
+      <artifactId>jsoup</artifactId>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+  <profiles>
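+    <!-- Runs the development server (webindex.integration.DevServer) via the exec plugin, e.g. mvn compile -P webindex-dev-server -->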
+    <profile>
+      <id>webindex-dev-server</id>
+      <build>
+        <plugins>
+          <plugin>
+            <groupId>org.codehaus.mojo</groupId>
+            <artifactId>exec-maven-plugin</artifactId>
+            <executions>
+              <execution>
+                <goals>
+                  <goal>java</goal>
+                </goals>
+                <phase>compile</phase>
+                <configuration>
+                  <mainClass>webindex.integration.DevServer</mainClass>
+                </configuration>
+              </execution>
+            </executions>
+          </plugin>
+        </plugins>
+      </build>
+    </profile>
+  </profiles>
+</project>
diff --git a/modules/integration/src/main/java/webindex/integration/DevServer.java b/modules/integration/src/main/java/webindex/integration/DevServer.java
new file mode 100644
index 0000000..4599e6f
--- /dev/null
+++ b/modules/integration/src/main/java/webindex/integration/DevServer.java
@@ -0,0 +1,159 @@
+/*
+ * Copyright 2015 Webindex authors (see AUTHORS)
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+ * in compliance with the License. You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package webindex.integration;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import com.google.gson.Gson;
+import org.apache.accumulo.minicluster.MiniAccumuloCluster;
+import org.apache.accumulo.minicluster.MiniAccumuloConfig;
+import org.apache.fluo.api.client.FluoAdmin;
+import org.apache.fluo.api.client.FluoClient;
+import org.apache.fluo.api.client.FluoFactory;
+import org.apache.fluo.api.client.LoaderExecutor;
+import org.apache.fluo.api.config.FluoConfiguration;
+import org.apache.fluo.api.mini.MiniFluo;
+import org.apache.fluo.recipes.test.AccumuloExportITBase;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import webindex.core.IndexClient;
+import webindex.core.models.Page;
+import webindex.data.fluo.PageLoader;
+import webindex.data.spark.IndexEnv;
+import webindex.ui.WebServer;
+
+public class DevServer {
+
+  private static final Logger log = LoggerFactory.getLogger(DevServer.class);
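+  // Number of index buckets and Fluo table tablets used by the dev instance (passed twice to IndexEnv below)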
+  private static final int TEST_SPLITS = 119;
+
+  private Path dataPath;
+  private int webPort;
+  private Path templatePath;
+  private MiniAccumuloCluster cluster;
+  private WebServer webServer;
+  private IndexClient client;
+  private AtomicBoolean started = new AtomicBoolean(false);
+  private Path baseDir;
+
+  public DevServer(Path dataPath, int webPort, Path templatePath, Path baseDir) {
+    this.dataPath = dataPath;
+    this.webPort = webPort;
+    this.templatePath = templatePath;
+    this.baseDir = baseDir;
+    this.webServer = new WebServer();
+  }
+
+  public IndexClient getIndexClient() {
+    if (!started.get()) {
+      throw new IllegalStateException("DevServer must be started before retrieving index client");
+    }
+    return client;
+  }
+
+  public void start() throws Exception {
+    log.info("Starting WebIndex development server...");
+
+    log.info("Starting MiniAccumuloCluster at {}", baseDir);
+
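+    // "secret" is the root password for the throwaway MiniAccumuloCluster; getConnector("root", "secret") below depends on it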
+    MiniAccumuloConfig cfg = new MiniAccumuloConfig(baseDir.toFile(), "secret");
+    cluster = new MiniAccumuloCluster(cfg);
+    cluster.start();
+
+    FluoConfiguration config = new FluoConfiguration();
+    AccumuloExportITBase.configureFromMAC(config, cluster);
+    config.setApplicationName("webindex-dev");
+    config.setAccumuloTable("webindex");
+
+    String exportTable = "webindex_search";
+
+    log.info("Initializing Accumulo & Fluo");
+    IndexEnv env = new IndexEnv(config, exportTable, "/tmp", TEST_SPLITS, TEST_SPLITS);
+    env.initAccumuloIndexTable();
+    env.configureApplication(config);
+
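+    // Initialize the Fluo application, clearing any table and ZooKeeper state left over from a previous run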
+    FluoFactory.newAdmin(config).initialize(
+        new FluoAdmin.InitializationOptions().setClearTable(true).setClearZookeeper(true));
+
+    env.setFluoTableSplits();
+
+    log.info("Starting web server");
+    client = new IndexClient(exportTable, cluster.getConnector("root", "secret"));
+    webServer.start(client, webPort, templatePath);
+
+    log.info("Loading data from {}", dataPath);
+    Gson gson = new Gson();
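+    // Pages are loaded into MiniFluo, whose observers build the indexes and export them to
+    // Accumulo; the web server keeps serving those Accumulo indexes after MiniFluo closes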
+    try (MiniFluo miniFluo = FluoFactory.newMiniFluo(config);
+        FluoClient fluoClient = FluoFactory.newClient(miniFluo.getClientConfiguration())) {
+
+      try (LoaderExecutor le = fluoClient.newLoaderExecutor()) {
+
+        Files
+            .lines(dataPath)
+            .map(json -> Page.fromJson(gson, json))
+            .forEach(
+                page -> {
+                  log.debug("Loading page {} with {} links", page.getUrl(), page.getOutboundLinks()
+                      .size());
+                  le.execute(PageLoader.updatePage(page));
+                });
+      }
+
+      log.info("Finished loading data. Waiting for observers to finish...");
+      miniFluo.waitForObservers();
+      log.info("Observers finished");
+    }
+
+    started.set(true);
+  }
+
+  public void stop() {
+    webServer.stop();
+    try {
+      cluster.stop();
+    } catch (Exception e) {
+      throw new IllegalStateException(e);
+    }
+  }
+
+  public static void main(String[] args) throws Exception {
+    String dataLocation = "data/1K-pages.txt";
+    String templateLocation = "modules/ui/src/main/resources/spark/template/freemarker";
+    if (args.length == 2) {
+      dataLocation = args[0];
+      templateLocation = args[1];
+    }
+    log.info("Looking for data at {}", dataLocation);
+
+    Path dataPath = Paths.get(dataLocation);
+    if (Files.notExists(dataPath)) {
+      log.info("Generating sample data at {} for dev server", dataPath);
+      SampleData.generate(dataPath, 1000);
+    }
+
+    Path templatePath = Paths.get(templateLocation);
+    if (Files.notExists(templatePath)) {
+      log.info("Template location {} does not exits", templateLocation);
+      throw new IllegalArgumentException("Template location does not exist");
+    }
+
+    DevServer devServer =
+        new DevServer(dataPath, 4567, templatePath, Files.createTempDirectory("webindex-dev-"));
+    devServer.start();
+  }
+}
diff --git a/modules/integration/src/main/java/webindex/integration/SampleData.java b/modules/integration/src/main/java/webindex/integration/SampleData.java
new file mode 100644
index 0000000..03972ff
--- /dev/null
+++ b/modules/integration/src/main/java/webindex/integration/SampleData.java
@@ -0,0 +1,63 @@
+/*
+ * Copyright 2015 Webindex authors (see AUTHORS)
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+ * in compliance with the License. You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package webindex.integration;
+
+import java.net.URL;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.List;
+
+import com.google.gson.Gson;
+import org.archive.io.ArchiveReader;
+import org.archive.io.ArchiveRecord;
+import org.archive.io.warc.WARCReaderFactory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import webindex.core.models.Page;
+import webindex.data.util.ArchiveUtil;
+
+public class SampleData {
+
+  private static final Logger log = LoggerFactory.getLogger(SampleData.class);
+
+  private static final String sourceURL = "https://commoncrawl.s3.amazonaws.com/crawl-data/"
+      + "CC-MAIN-2015-32/segments/1438042981460.12/wat/"
+      + "CC-MAIN-20150728002301-00043-ip-10-236-191-2.ec2.internal.warc.wat.gz";
+
+  public static void generate(Path path, int numPages) throws Exception {
+
+    Gson gson = new Gson();
+    long count = 0;
+    List<String> pages = new ArrayList<>();
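+    // Stream the WAT archive directly from S3; the second argument is the byte offset to start reading from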
+    ArchiveReader ar = WARCReaderFactory.get(new URL(sourceURL), 0);
+    for (ArchiveRecord r : ar) {
+      Page p = ArchiveUtil.buildPage(r);
+      if (p.isEmpty() || p.getOutboundLinks().isEmpty()) {
+        log.debug("Skipping {}", p.getUrl());
+        continue;
+      }
+      log.debug("Found {} {}", p.getUrl(), p.getNumOutbound());
+      String json = gson.toJson(p);
+      pages.add(json);
+      count++;
+      if (count == numPages) {
+        break;
+      }
+    }
+    Files.write(path, pages);
+    log.info("Wrote {} pages to {}", numPages, path);
+  }
+}
diff --git a/modules/integration/src/test/java/webindex/integration/DevServerIT.java b/modules/integration/src/test/java/webindex/integration/DevServerIT.java
new file mode 100644
index 0000000..81fc767
--- /dev/null
+++ b/modules/integration/src/test/java/webindex/integration/DevServerIT.java
@@ -0,0 +1,68 @@
+/*
+ * Copyright 2015 Webindex authors (see AUTHORS)
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+ * in compliance with the License. You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package webindex.integration;
+
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+
+import org.apache.commons.io.FileUtils;
+import org.jsoup.Jsoup;
+import org.jsoup.nodes.Document;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import webindex.core.IndexClient;
+import webindex.core.models.Pages;
+
+public class DevServerIT {
+
+  private static final Logger log = LoggerFactory.getLogger(DevServerIT.class);
+
+  static DevServer devServer;
+  static Path tempPath;
+
+  @BeforeClass
+  public static void init() throws Exception {
+    tempPath = Files.createTempDirectory(Paths.get("target/"), "webindex-dev-");
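+    // Port 24567 (rather than the dev server default of 4567) avoids colliding with a locally running instance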
+    devServer = new DevServer(Paths.get("src/test/resources/5-pages.txt"), 24567, null, tempPath);
+    devServer.start();
+  }
+
+  @Test
+  public void basic() throws Exception {
+    Document doc = Jsoup.connect("http://localhost:24567/").get();
+    Assert.assertTrue(doc.text().contains("Enter a domain to view known webpages in that domain"));
+
+    IndexClient client = devServer.getIndexClient();
+    Pages pages = client.getPages("stackoverflow.com", "", 0);
+    Assert.assertEquals(4, pages.getTotal().intValue());
+
+    Pages.PageScore pageScore = pages.getPages().get(0);
+    Assert.assertEquals("http://blog.stackoverflow.com/2009/06/attribution-required/",
+        pageScore.getUrl());
+    Assert.assertEquals(4, pageScore.getScore().intValue());
+  }
+
+  @AfterClass
+  public static void destroy() throws IOException {
+    devServer.stop();
+    FileUtils.deleteDirectory(tempPath.toFile());
+  }
+}
diff --git a/modules/integration/src/test/resources/5-pages.txt b/modules/integration/src/test/resources/5-pages.txt
new file mode 100644
index 0000000..da94d8a
--- /dev/null
+++ b/modules/integration/src/test/resources/5-pages.txt
@@ -0,0 +1,5 @@
+{"url":"http://app.cheezburger.com/Rokas08/TrophyDetails/13f82307-8f12-402e-a544-76db8a2dc19c","pageID":"com.cheezburger\u003e.app\u003eo\u003e/Rokas08/TrophyDetails/13f82307-8f12-402e-a544-76db8a2dc19c","numOutbound":19,"crawlDate":"2015-07-28T03:06:17Z","title":"Rokas08\u0026#39;s Profile - Trophy Details - Cheezburger.com","outboundLinks":[{"url":"https://www.facebook.com/Cheezburger","pageID":"com.facebook\u003e.www\u003es\u003e/Cheezburger","anchorText":"Facebook"},{"url":"https://plus.google.com/105247221600709734681","pageID":"com.google\u003e.plus\u003es\u003e/105247221600709734681","anchorText":"Google+"},{"url":"http://knowyourmeme.com/forums","pageID":"com.knowyourmeme\u003e\u003eo\u003e/forums","anchorText":"Forums"},{"url":"http://knowyourmeme.com/memes/popular","pageID":"com.knowyourmeme\u003e\u003eo\u003e/memes/popular","anchorText":"Popular Memes"},{"url":"http://knowyourmeme.com/photos/most-viewed","pageID":"com.knowyourmeme\u003e\u003eo\u003e/photos/most-viewed","anchorText":"All Images"},{"url":"http://knowyourmeme.com/search?q\u003dcategory%3Aevent\u0026amp;sort\u003dnewest","pageID":"com.knowyourmeme\u003e\u003eo\u003e/search?q\u003dcategory%3Aevent\u0026amp;sort\u003dnewest","anchorText":"New Events"},{"url":"http://knowyourmeme.com/search?q\u003dcategory%3Aperson\u0026amp;sort\u003dnewest","pageID":"com.knowyourmeme\u003e\u003eo\u003e/search?q\u003dcategory%3Aperson\u0026amp;sort\u003dnewest","anchorText":"New People"},{"url":"http://knowyourmeme.com/search?q\u003dcategory%3Asite\u0026amp;sort\u003dnewest","pageID":"com.knowyourmeme\u003e\u003eo\u003e/search?q\u003dcategory%3Asite\u0026amp;sort\u003dnewest","anchorText":"New Sites"},{"url":"http://knowyourmeme.com/search?q\u003dcategory%3Asubculture\u0026amp;sort\u003dnewest","pageID":"com.knowyourmeme\u003e\u003eo\u003e/search?q\u003dcategory%3Asubculture\u0026amp;sort\u003dnewest","anchorText":"New Subcultures"},{"url":"http://knowyourmeme.com/search?utf8\u003d%E2%9C%93\u0026amp;context\u003dentries\u0026amp;q\u003dstatus%3Aconfirmed+category%3Ameme","pageID":"com.knowyourmeme\u003e\u003eo\u003e/search?utf8\u003d%E2%9C%93\u0026amp;context\u003dentries\u0026amp;q\u003dstatus%3Aconfirmed+category%3Ameme","anchorText":"All Memes"},{"url":"http://knowyourmeme.com/videos/most-viewed","pageID":"com.knowyourmeme\u003e\u003eo\u003e/videos/most-viewed","anchorText":"All Videos"},{"url":"http://knowyourmeme.com?ref\u003dnavbar","pageID":"com.knowyourmeme\u003e\u003eo\u003e?ref\u003dnavbar","anchorText":"KYM Wiki"},{"url":"https://twitter.com/Cheezburger","pageID":"com.twitter\u003e\u003es\u003e/Cheezburger","anchorText":"Follow"},{"url":"http://chzb.gr/1riG0EZ?ref\u003dfooternav","pageID":"gr.chzb\u003e\u003eo\u003e/1riG0EZ?ref\u003dfooternav","anchorText":"Videos"},{"url":"http://chzb.gr/1riG0EZ?ref\u003dnavbar","pageID":"gr.chzb\u003e\u003eo\u003e/1riG0EZ?ref\u003dnavbar","anchorText":"Videos Find all our FAIL videos here!"},{"url":"http://chzb.gr/1riGhru?ref\u003dfooternav","pageID":"gr.chzb\u003e\u003eo\u003e/1riGhru?ref\u003dfooternav","anchorText":"Videos"},{"url":"http://chzb.gr/1riGhru?ref\u003dnavbar","pageID":"gr.chzb\u003e\u003eo\u003e/1riGhru?ref\u003dnavbar","anchorText":"Videos See all our Geek videos here!"},{"url":"http://chzb.gr/1riGzi6?ref\u003dfooternav","pageID":"gr.chzb\u003e\u003eo\u003e/1riGzi6?ref\u003dfooternav","anchorText":"Videos"},{"url":"http://chzb.gr/1riGzi6?ref\u003dnavbar","pageID":"gr.chzb\u003e\u003eo\u003e/1riGzi6?ref\u003dnavbar","anchorText":"Videos Watch and learn from all of our 
trolling videos here!"}]}
+{"url":"http://apple.stackexchange.com/help/badges/9/autobiographer?userid\u003d796","pageID":"com.stackexchange\u003e.apple\u003eo\u003e/help/badges/9/autobiographer?userid\u003d796","numOutbound":4,"crawlDate":"2015-07-28T01:32:26Z","server":"cloudflare-nginx","title":"Autobiographer - Badge - Ask Different","outboundLinks":[{"url":"http://apple.blogoverflow.com/","pageID":"com.blogoverflow\u003e.apple\u003eo\u003e/","anchorText":"blog"},{"url":"http://apple.blogoverflow.com?blb\u003d1","pageID":"com.blogoverflow\u003e.apple\u003eo\u003e?blb\u003d1","anchorText":"blog"},{"url":"http://blog.stackoverflow.com/2009/06/attribution-required/","pageID":"com.stackoverflow\u003e.blog\u003eo\u003e/2009/06/attribution-required/","anchorText":"attribution required"},{"url":"http://creativecommons.org/licenses/by-sa/3.0/","pageID":"org.creativecommons\u003e\u003eo\u003e/licenses/by-sa/3.0/","anchorText":"cc by-sa 3.0"}]}
+{"url":"http://apple.stackexchange.com/questions/15006/spotlight-sometimes-cant-find-a-file-that-actually-exists","pageID":"com.stackexchange\u003e.apple\u003eo\u003e/questions/15006/spotlight-sometimes-cant-find-a-file-that-actually-exists","numOutbound":6,"crawlDate":"2015-07-28T01:58:50Z","server":"cloudflare-nginx","title":"Spotlight sometimes can\u0026#39;t find a file. (that actually exists) - Ask Different","outboundLinks":[{"url":"http://askubuntu.com/questions/653335/using-sed-how-could-we-cut-a-specific-string-from-a-line-of-text","pageID":"com.askubuntu\u003e\u003eo\u003e/questions/653335/using-sed-how-could-we-cut-a-specific-string-from-a-line-of-text","anchorText":"Using sed, how could we cut a specific string from a line of text?"},{"url":"http://apple.blogoverflow.com/","pageID":"com.blogoverflow\u003e.apple\u003eo\u003e/","anchorText":"blog"},{"url":"http://apple.blogoverflow.com?blb\u003d1","pageID":"com.blogoverflow\u003e.apple\u003eo\u003e?blb\u003d1","anchorText":"blog"},{"url":"http://blog.stackoverflow.com/2009/06/attribution-required/","pageID":"com.stackoverflow\u003e.blog\u003eo\u003e/2009/06/attribution-required/","anchorText":"attribution required"},{"url":"http://stackoverflow.com/questions/31654274/is-it-ever-justified-to-have-an-object-which-has-itself-as-a-field","pageID":"com.stackoverflow\u003e\u003eo\u003e/questions/31654274/is-it-ever-justified-to-have-an-object-which-has-itself-as-a-field","anchorText":"Is it ever justified to have an object which has itself as a field?"},{"url":"http://creativecommons.org/licenses/by-sa/3.0/","pageID":"org.creativecommons\u003e\u003eo\u003e/licenses/by-sa/3.0/","anchorText":"cc by-sa 3.0"}]}
+{"url":"http://apple.stackexchange.com/users/208/john-allers","pageID":"com.stackexchange\u003e.apple\u003eo\u003e/users/208/john-allers","numOutbound":8,"crawlDate":"2015-07-28T01:40:51Z","server":"cloudflare-nginx","title":"User John Allers - Ask Different","outboundLinks":[{"url":"http://apple.blogoverflow.com/","pageID":"com.blogoverflow\u003e.apple\u003eo\u003e/","anchorText":"blog"},{"url":"http://apple.blogoverflow.com?blb\u003d1","pageID":"com.blogoverflow\u003e.apple\u003eo\u003e?blb\u003d1","anchorText":"blog"},{"url":"http://serverfault.com/users/2870/","pageID":"com.serverfault\u003e\u003eo\u003e/users/2870/","anchorText":"Server Fault 111 111 3"},{"url":"http://blog.stackoverflow.com/2009/06/attribution-required/","pageID":"com.stackoverflow\u003e.blog\u003eo\u003e/2009/06/attribution-required/","anchorText":"attribution required"},{"url":"http://stackoverflow.com/users/73986/","pageID":"com.stackoverflow\u003e\u003eo\u003e/users/73986/","anchorText":"Stack Overflow 2.2k 2.2k 11828"},{"url":"http://superuser.com/users/3552/","pageID":"com.superuser\u003e\u003eo\u003e/users/3552/","anchorText":"Super User 231 231 26"},{"url":"http://www.zooplet.com/","pageID":"com.zooplet\u003e.www\u003eo\u003e/","anchorText":"zooplet.com"},{"url":"http://creativecommons.org/licenses/by-sa/3.0/","pageID":"org.creativecommons\u003e\u003eo\u003e/licenses/by-sa/3.0/","anchorText":"cc by-sa 3.0"}]}
+{"url":"http://apple.stackexchange.com/users/3126/mjb?tab\u003dsummary","pageID":"com.stackexchange\u003e.apple\u003eo\u003e/users/3126/mjb?tab\u003dsummary","numOutbound":7,"crawlDate":"2015-07-28T01:53:49Z","server":"cloudflare-nginx","title":"User mjb - Ask Different","outboundLinks":[{"url":"http://apple.blogoverflow.com/","pageID":"com.blogoverflow\u003e.apple\u003eo\u003e/","anchorText":"blog"},{"url":"http://apple.blogoverflow.com?blb\u003d1","pageID":"com.blogoverflow\u003e.apple\u003eo\u003e?blb\u003d1","anchorText":"blog"},{"url":"http://serverfault.com/users/117061/","pageID":"com.serverfault\u003e\u003eo\u003e/users/117061/","anchorText":"Server Fault"},{"url":"http://blog.stackoverflow.com/2009/06/attribution-required/","pageID":"com.stackoverflow\u003e.blog\u003eo\u003e/2009/06/attribution-required/","anchorText":"attribution required"},{"url":"http://stackoverflow.com/users/581665/","pageID":"com.stackoverflow\u003e\u003eo\u003e/users/581665/","anchorText":"Stack Overflow"},{"url":"http://superuser.com/users/63808/","pageID":"com.superuser\u003e\u003eo\u003e/users/63808/","anchorText":"Super User"},{"url":"http://creativecommons.org/licenses/by-sa/3.0/","pageID":"org.creativecommons\u003e\u003eo\u003e/licenses/by-sa/3.0/","anchorText":"cc by-sa 3.0"}]}
diff --git a/modules/integration/src/test/resources/log4j.properties b/modules/integration/src/test/resources/log4j.properties
new file mode 100644
index 0000000..c18c21a
--- /dev/null
+++ b/modules/integration/src/test/resources/log4j.properties
@@ -0,0 +1,31 @@
+# Copyright 2014 Webindex authors (see AUTHORS)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+log4j.rootLogger=INFO, CA
+log4j.appender.CA=org.apache.log4j.ConsoleAppender
+log4j.appender.CA.layout=org.apache.log4j.PatternLayout
+log4j.appender.CA.layout.ConversionPattern=%d{ISO8601} [%c] %-5p: %m%n
+
+log4j.logger.org.apache.accumulo=WARN
+log4j.logger.org.apache.curator=ERROR
+log4j.logger.org.apache.fluo=WARN
+log4j.logger.org.apache.hadoop=WARN
+log4j.logger.org.apache.hadoop.mapreduce=ERROR
+log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
+log4j.logger.org.apache.spark=WARN
+log4j.logger.org.apache.zookeeper=ERROR
+log4j.logger.org.eclipse.jetty=WARN
+log4j.logger.org.spark-project=WARN
+log4j.logger.webindex=WARN
+log4j.logger.spark=WARN
diff --git a/modules/ui/pom.xml b/modules/ui/pom.xml
index 167871c..bf9e1e3 100644
--- a/modules/ui/pom.xml
+++ b/modules/ui/pom.xml
@@ -26,126 +26,60 @@
   <name>WebIndex UI</name>
   <dependencies>
     <dependency>
-      <groupId>io.dropwizard</groupId>
-      <artifactId>dropwizard-assets</artifactId>
+      <groupId>com.sparkjava</groupId>
+      <artifactId>spark-core</artifactId>
     </dependency>
     <dependency>
-      <groupId>io.dropwizard</groupId>
-      <artifactId>dropwizard-core</artifactId>
-    </dependency>
-    <dependency>
-      <groupId>io.dropwizard</groupId>
-      <artifactId>dropwizard-views-freemarker</artifactId>
+      <groupId>com.sparkjava</groupId>
+      <artifactId>spark-template-freemarker</artifactId>
     </dependency>
     <dependency>
       <groupId>io.github.astralway</groupId>
       <artifactId>webindex-core</artifactId>
-      <exclusions>
-        <exclusion>
-          <groupId>com.google.guava</groupId>
-          <artifactId>guava</artifactId>
-        </exclusion>
-      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.accumulo</groupId>
       <artifactId>accumulo-core</artifactId>
-      <exclusions>
-        <exclusion>
-          <groupId>log4j</groupId>
-          <artifactId>log4j</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>com.google.guava</groupId>
-          <artifactId>guava</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.slf4j</groupId>
-          <artifactId>slf4j-log4j12</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>com.sun.jersey</groupId>
-          <artifactId>*</artifactId>
-        </exclusion>
-      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.fluo</groupId>
       <artifactId>fluo-api</artifactId>
-      <exclusions>
-        <exclusion>
-          <groupId>com.google.guava</groupId>
-          <artifactId>guava</artifactId>
-        </exclusion>
-      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.fluo</groupId>
       <artifactId>fluo-core</artifactId>
-      <exclusions>
-        <exclusion>
-          <groupId>log4j</groupId>
-          <artifactId>log4j</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.slf4j</groupId>
-          <artifactId>slf4j-log4j12</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>javax.xml.bind</groupId>
-          <artifactId>jaxb-api</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>com.google.guava</groupId>
-          <artifactId>guava</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>com.sun.jersey</groupId>
-          <artifactId>*</artifactId>
-        </exclusion>
-      </exclusions>
     </dependency>
     <dependency>
-      <groupId>junit</groupId>
-      <artifactId>junit</artifactId>
-      <scope>test</scope>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-api</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-log4j12</artifactId>
     </dependency>
   </dependencies>
-  <build>
-    <plugins>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-shade-plugin</artifactId>
-        <configuration>
-          <createDependencyReducedPom>true</createDependencyReducedPom>
-          <filters>
-            <filter>
-              <artifact>*:*</artifact>
-              <excludes>
-                <exclude>META-INF/*.SF</exclude>
-                <exclude>META-INF/*.DSA</exclude>
-                <exclude>META-INF/*.RSA</exclude>
-              </excludes>
-            </filter>
-          </filters>
-        </configuration>
-        <executions>
-          <execution>
-            <goals>
-              <goal>shade</goal>
-            </goals>
-            <phase>package</phase>
-            <configuration>
-              <transformers>
-                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
-                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
-                  <mainClass>webindex.ui.WebIndexApp</mainClass>
-                </transformer>
-              </transformers>
-            </configuration>
-          </execution>
-        </executions>
-      </plugin>
-    </plugins>
-  </build>
+  <profiles>
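+    <!-- Runs the web server (webindex.ui.WebServer) via the exec plugin, e.g. mvn compile -P webindex-web-server -->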
+    <profile>
+      <id>webindex-web-server</id>
+      <build>
+        <plugins>
+          <plugin>
+            <groupId>org.codehaus.mojo</groupId>
+            <artifactId>exec-maven-plugin</artifactId>
+            <executions>
+              <execution>
+                <goals>
+                  <goal>java</goal>
+                </goals>
+                <phase>compile</phase>
+                <configuration>
+                  <mainClass>webindex.ui.WebServer</mainClass>
+                </configuration>
+              </execution>
+            </executions>
+          </plugin>
+        </plugins>
+      </build>
+    </profile>
+  </profiles>
 </project>
diff --git a/modules/ui/src/main/java/webindex/ui/FluoHealthCheck.java b/modules/ui/src/main/java/webindex/ui/FluoHealthCheck.java
deleted file mode 100644
index 2b63a44..0000000
--- a/modules/ui/src/main/java/webindex/ui/FluoHealthCheck.java
+++ /dev/null
@@ -1,25 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui;
-
-import com.codahale.metrics.health.HealthCheck;
-
-public class FluoHealthCheck extends HealthCheck {
-
-  @Override
-  protected Result check() throws Exception {
-    return Result.healthy();
-  }
-}
diff --git a/modules/ui/src/main/java/webindex/ui/WebIndexApp.java b/modules/ui/src/main/java/webindex/ui/WebIndexApp.java
deleted file mode 100644
index 85d32e4..0000000
--- a/modules/ui/src/main/java/webindex/ui/WebIndexApp.java
+++ /dev/null
@@ -1,58 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui;
-
-import java.io.File;
-
-import io.dropwizard.Application;
-import io.dropwizard.assets.AssetsBundle;
-import io.dropwizard.setup.Bootstrap;
-import io.dropwizard.setup.Environment;
-import io.dropwizard.views.ViewBundle;
-import org.apache.accumulo.core.client.Connector;
-import org.apache.fluo.api.config.FluoConfiguration;
-import org.apache.fluo.core.util.AccumuloUtil;
-import webindex.core.DataConfig;
-
-public class WebIndexApp extends Application<WebIndexConfig> {
-
-  public static void main(String[] args) throws Exception {
-    new WebIndexApp().run(args);
-  }
-
-  @Override
-  public String getName() {
-    return "webindex-app";
-  }
-
-  @Override
-  public void initialize(Bootstrap<WebIndexConfig> bootstrap) {
-    bootstrap.addBundle(new ViewBundle<>());
-    bootstrap.addBundle(new AssetsBundle());
-  }
-
-  @Override
-  public void run(WebIndexConfig config, Environment environment) {
-
-    DataConfig dataConfig = WebIndexConfig.getDataConfig();
-    File fluoConfigFile = new File(dataConfig.getFluoPropsPath());
-    FluoConfiguration fluoConfig = new FluoConfiguration(fluoConfigFile);
-
-    Connector conn = AccumuloUtil.getConnector(fluoConfig);
-    final WebIndexResources resource = new WebIndexResources(conn, dataConfig);
-    environment.healthChecks().register("fluo", new FluoHealthCheck());
-    environment.jersey().register(resource);
-  }
-}
diff --git a/modules/ui/src/main/java/webindex/ui/WebIndexConfig.java b/modules/ui/src/main/java/webindex/ui/WebIndexConfig.java
deleted file mode 100644
index 1328530..0000000
--- a/modules/ui/src/main/java/webindex/ui/WebIndexConfig.java
+++ /dev/null
@@ -1,25 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui;
-
-import io.dropwizard.Configuration;
-import webindex.core.DataConfig;
-
-public class WebIndexConfig extends Configuration {
-
-  public static DataConfig getDataConfig() {
-    return DataConfig.load();
-  }
-}
diff --git a/modules/ui/src/main/java/webindex/ui/WebIndexResources.java b/modules/ui/src/main/java/webindex/ui/WebIndexResources.java
deleted file mode 100644
index 879d883..0000000
--- a/modules/ui/src/main/java/webindex/ui/WebIndexResources.java
+++ /dev/null
@@ -1,297 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui;
-
-import java.util.Iterator;
-import java.util.Map;
-
-import javax.validation.constraints.NotNull;
-import javax.ws.rs.DefaultValue;
-import javax.ws.rs.GET;
-import javax.ws.rs.Path;
-import javax.ws.rs.Produces;
-import javax.ws.rs.QueryParam;
-import javax.ws.rs.core.MediaType;
-
-import com.google.gson.Gson;
-import org.apache.accumulo.core.client.Connector;
-import org.apache.accumulo.core.client.Scanner;
-import org.apache.accumulo.core.client.TableNotFoundException;
-import org.apache.accumulo.core.data.Key;
-import org.apache.accumulo.core.data.Range;
-import org.apache.accumulo.core.data.Value;
-import org.apache.accumulo.core.security.Authorizations;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-import webindex.core.Constants;
-import webindex.core.DataConfig;
-import webindex.core.models.DomainStats;
-import webindex.core.models.Link;
-import webindex.core.models.Links;
-import webindex.core.models.Page;
-import webindex.core.models.Pages;
-import webindex.core.models.TopResults;
-import webindex.core.models.URL;
-import webindex.ui.util.Pager;
-import webindex.ui.util.WebUrl;
-import webindex.ui.views.HomeView;
-import webindex.ui.views.LinksView;
-import webindex.ui.views.PageView;
-import webindex.ui.views.PagesView;
-import webindex.ui.views.TopView;
-
-@Path("/")
-public class WebIndexResources {
-
-  private static final Logger log = LoggerFactory.getLogger(WebIndexResources.class);
-  private static final int PAGE_SIZE = 25;
-
-  private DataConfig dataConfig;
-  private Connector conn;
-  private Gson gson = new Gson();
-
-  public WebIndexResources(Connector conn, DataConfig dataConfig) {
-    this.conn = conn;
-    this.dataConfig = dataConfig;
-  }
-
-  private static Long getLongValue(Map.Entry<Key, Value> entry) {
-    return Long.parseLong(entry.getValue().toString());
-  }
-
-  @GET
-  @Produces(MediaType.TEXT_HTML)
-  public HomeView getHome() {
-    return new HomeView();
-  }
-
-  @GET
-  @Path("pages")
-  @Produces({MediaType.TEXT_HTML, MediaType.APPLICATION_JSON})
-  public PagesView getPages(@NotNull @QueryParam("domain") String domain,
-      @DefaultValue("") @QueryParam("next") String next,
-      @DefaultValue("0") @QueryParam("pageNum") Integer pageNum) {
-    DomainStats stats = getDomainStats(domain);
-    Pages pages = new Pages(domain, pageNum);
-    log.info("Setting total to {}", stats.getTotal());
-    pages.setTotal(stats.getTotal());
-    String row = "d:" + URL.reverseHost(domain);
-    String cf = Constants.RANK;
-    try {
-      Scanner scanner = conn.createScanner(dataConfig.accumuloIndexTable, Authorizations.EMPTY);
-      Pager pager = new Pager(scanner, Range.prefix(row + ":"), PAGE_SIZE) {
-        @Override
-        public void foundPageEntry(Map.Entry<Key, Value> entry) {
-
-          String url =
-              URL.fromPageID(entry.getKey().getRowData().toString().split(":", 4)[3]).toString();
-          Long count = Long.parseLong(entry.getValue().toString());
-          pages.addPage(url, count);
-        }
-
-        @Override
-        public void foundNextEntry(Map.Entry<Key, Value> entry) {
-          pages.setNext(entry.getKey().getRowData().toString().split(":", 3)[2]);
-        }
-      };
-      if (next.isEmpty()) {
-        pager.getPage(pageNum);
-      } else {
-        pager.getPage(new Key(row + ":" + next, cf, ""));
-      }
-    } catch (TableNotFoundException e) {
-      log.error("Table {} not found", dataConfig.accumuloIndexTable);
-    }
-    return new PagesView(pages);
-  }
-
-  @GET
-  @Path("page")
-  @Produces({MediaType.TEXT_HTML, MediaType.APPLICATION_JSON})
-  public PageView getPageView(@NotNull @QueryParam("url") String url) {
-    return new PageView(getPage(url));
-  }
-
-  private Page getPage(String rawUrl) {
-    Page page = null;
-    Long incount = (long) 0;
-    URL url;
-    try {
-      url = WebUrl.from(rawUrl);
-    } catch (Exception e) {
-      log.error("Failed to parse URL {}", rawUrl);
-      return null;
-    }
-
-    try {
-      Scanner scanner = conn.createScanner(dataConfig.accumuloIndexTable, Authorizations.EMPTY);
-      scanner.setRange(Range.exact("p:" + url.toPageID(), Constants.PAGE));
-      Iterator<Map.Entry<Key, Value>> iterator = scanner.iterator();
-      while (iterator.hasNext()) {
-        Map.Entry<Key, Value> entry = iterator.next();
-        switch (entry.getKey().getColumnQualifier().toString()) {
-          case Constants.INCOUNT:
-            incount = getLongValue(entry);
-            break;
-          case Constants.CUR:
-            page = gson.fromJson(entry.getValue().toString(), Page.class);
-            break;
-          default:
-            log.error("Unknown page stat {}", entry.getKey().getColumnQualifier());
-        }
-      }
-    } catch (TableNotFoundException e) {
-      e.printStackTrace();
-    }
-
-    if (page == null) {
-      page = new Page(url.toPageID());
-    }
-    page.setNumInbound(incount);
-    return page;
-  }
-
-  private DomainStats getDomainStats(String domain) {
-    DomainStats stats = new DomainStats(domain);
-    Scanner scanner;
-    try {
-      scanner = conn.createScanner(dataConfig.accumuloIndexTable, Authorizations.EMPTY);
-      scanner.setRange(Range.exact("d:" + URL.reverseHost(domain), Constants.DOMAIN));
-      Iterator<Map.Entry<Key, Value>> iterator = scanner.iterator();
-      while (iterator.hasNext()) {
-        Map.Entry<Key, Value> entry = iterator.next();
-        switch (entry.getKey().getColumnQualifier().toString()) {
-          case Constants.PAGECOUNT:
-            stats.setTotal(getLongValue(entry));
-            break;
-          default:
-            log.error("Unknown page domain {}", entry.getKey().getColumnQualifier());
-        }
-      }
-    } catch (TableNotFoundException e) {
-      e.printStackTrace();
-    }
-    return stats;
-  }
-
-  @GET
-  @Path("links")
-  @Produces({MediaType.TEXT_HTML, MediaType.APPLICATION_JSON})
-  public LinksView getLinks(@NotNull @QueryParam("url") String rawUrl,
-      @NotNull @QueryParam("linkType") String linkType,
-      @DefaultValue("") @QueryParam("next") String next,
-      @DefaultValue("0") @QueryParam("pageNum") Integer pageNum) {
-
-    Links links = new Links(rawUrl, linkType, pageNum);
-
-    URL url;
-    try {
-      url = WebUrl.from(rawUrl);
-    } catch (Exception e) {
-      log.error("Failed to parse URL: " + rawUrl);
-      return new LinksView(links);
-    }
-
-    try {
-      Scanner scanner = conn.createScanner(dataConfig.accumuloIndexTable, Authorizations.EMPTY);
-      String row = "p:" + url.toPageID();
-      if (linkType.equals("in")) {
-        Page page = getPage(rawUrl);
-        String cf = Constants.INLINKS;
-        links.setTotal(page.getNumInbound());
-        Pager pager = new Pager(scanner, Range.exact(row, cf), PAGE_SIZE) {
-
-          @Override
-          public void foundPageEntry(Map.Entry<Key, Value> entry) {
-            String pageID = entry.getKey().getColumnQualifier().toString();
-            String anchorText = entry.getValue().toString();
-            links.addLink(Link.of(pageID, anchorText));
-          }
-
-          @Override
-          public void foundNextEntry(Map.Entry<Key, Value> entry) {
-            links.setNext(entry.getKey().getColumnQualifier().toString());
-          }
-        };
-        if (next.isEmpty()) {
-          pager.getPage(pageNum);
-        } else {
-          pager.getPage(new Key(row, cf, next));
-        }
-      } else {
-        scanner.setRange(Range.exact(row, Constants.PAGE, Constants.CUR));
-        Iterator<Map.Entry<Key, Value>> iter = scanner.iterator();
-        if (iter.hasNext()) {
-          Page curPage = gson.fromJson(iter.next().getValue().toString(), Page.class);
-          links.setTotal(curPage.getNumOutbound());
-          int skip = 0;
-          int add = 0;
-          for (Link l : curPage.getOutboundLinks()) {
-            if (skip < (pageNum * PAGE_SIZE)) {
-              skip++;
-            } else if (add < PAGE_SIZE) {
-              links.addLink(l);
-              add++;
-            } else {
-              links.setNext(l.getPageID());
-              break;
-            }
-          }
-        }
-      }
-    } catch (TableNotFoundException e) {
-      log.error("Table {} not found", dataConfig.accumuloIndexTable);
-    }
-    return new LinksView(links);
-  }
-
-  @GET
-  @Path("top")
-  @Produces({MediaType.TEXT_HTML, MediaType.APPLICATION_JSON})
-  public TopView getTop(@DefaultValue("") @QueryParam("next") String next,
-      @DefaultValue("0") @QueryParam("pageNum") Integer pageNum) {
-
-    TopResults results = new TopResults();
-
-    results.setPageNum(pageNum);
-    try {
-      Scanner scanner = conn.createScanner(dataConfig.accumuloIndexTable, Authorizations.EMPTY);
-      Pager pager = new Pager(scanner, Range.prefix("t:"), PAGE_SIZE) {
-
-        @Override
-        public void foundPageEntry(Map.Entry<Key, Value> entry) {
-          String row = entry.getKey().getRow().toString();
-          String url = URL.fromPageID(row.split(":", 3)[2]).toString();
-          Long num = Long.parseLong(entry.getValue().toString());
-          results.addResult(url, num);
-        }
-
-        @Override
-        public void foundNextEntry(Map.Entry<Key, Value> entry) {
-          results.setNext(entry.getKey().getRow().toString());
-        }
-      };
-      if (next.isEmpty()) {
-        pager.getPage(pageNum);
-      } else {
-        pager.getPage(new Key(next));
-      }
-    } catch (TableNotFoundException e) {
-      log.error("Table {} not found", dataConfig.accumuloIndexTable);
-    }
-    return new TopView(results);
-  }
-}
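
The query logic deleted above now lives in `webindex.core.IndexClient` (used by the new `WebServer` below), but the access pattern is unchanged: every query is an exact-range or prefix scan over the Accumulo index table. As a sketch only, here is the domain-pages scan in isolation, assuming the same `d:<reversed-host>:` row prefix and count values used in `getPages()` above:

    import java.util.Map;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import webindex.core.models.URL;

    public class DomainScanSketch {
      // Prints each indexed page in a domain with its count, mirroring the
      // prefix scan that getPages() performed above.
      public static void printPages(Connector conn, String table, String domain)
          throws Exception {
        Scanner scanner = conn.createScanner(table, Authorizations.EMPTY);
        scanner.setRange(Range.prefix("d:" + URL.reverseHost(domain) + ":"));
        for (Map.Entry<Key, Value> entry : scanner) {
          // Row format (per the split above): d:<reversed-host>:<rank>:<pageID>
          String pageID = entry.getKey().getRowData().toString().split(":", 4)[3];
          long count = Long.parseLong(entry.getValue().toString());
          System.out.println(URL.fromPageID(pageID) + " -> " + count);
        }
      }
    }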
diff --git a/modules/ui/src/main/java/webindex/ui/WebServer.java b/modules/ui/src/main/java/webindex/ui/WebServer.java
new file mode 100644
index 0000000..3737514
--- /dev/null
+++ b/modules/ui/src/main/java/webindex/ui/WebServer.java
@@ -0,0 +1,131 @@
+/*
+ * Copyright 2015 Webindex authors (see AUTHORS)
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
+ * in compliance with the License. You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package webindex.ui;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Collections;
+import java.util.Optional;
+
+import freemarker.template.Configuration;
+import org.apache.accumulo.core.client.Connector;
+import org.apache.fluo.api.config.FluoConfiguration;
+import org.apache.fluo.core.util.AccumuloUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import spark.ModelAndView;
+import spark.Spark;
+import spark.template.freemarker.FreeMarkerEngine;
+import webindex.core.IndexClient;
+import webindex.core.WebIndexConfig;
+import webindex.core.models.Links;
+import webindex.core.models.Page;
+import webindex.core.models.Pages;
+import webindex.core.models.TopResults;
+
+import static spark.Spark.get;
+import static spark.Spark.staticFiles;
+
+public class WebServer {
+
+  private static final Logger log = LoggerFactory.getLogger(WebServer.class);
+
+  private static final ModelAndView VIEW_404 = new ModelAndView(null, "404.ftl");
+
+  public WebServer() {}
+
+  public void start(IndexClient client, int port, Path templatePath) {
+
+    Spark.port(port);
+
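+    // Classpath resources under /assets are served from the web root,
+    // which is why the templates below now reference /img/... directly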
+    staticFiles.location("/assets");
+
+    FreeMarkerEngine freeMarkerEngine = new FreeMarkerEngine();
+
+    if (templatePath != null && Files.exists(templatePath)) {
+      log.info("Serving freemarker templates from {}", templatePath.toAbsolutePath());
+      Configuration freeMarkerConfig = new Configuration();
+      try {
+        freeMarkerConfig.setDirectoryForTemplateLoading(templatePath.toFile());
+      } catch (IOException e) {
+        throw new IllegalStateException(e);
+      }
+      freeMarkerEngine.setConfiguration(freeMarkerConfig);
+    }
+
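+    // Each route renders a FreeMarker view; routes with required query
+    // params fall back to the 404 view when a param is missing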
+    get("/", (req, res) -> new ModelAndView(null, "home.ftl"), freeMarkerEngine);
+
+    get("/top",
+        (req, res) -> {
+          String next = Optional.ofNullable(req.queryParams("next")).orElse("");
+          Integer pageNum =
+              Integer.parseInt(Optional.ofNullable(req.queryParams("pageNum")).orElse("0"));
+          TopResults results = client.getTopResults(next, pageNum);
+          return new ModelAndView(Collections.singletonMap("top", results), "top.ftl");
+        }, freeMarkerEngine);
+
+    get("/page", (req, res) -> {
+      if (req.queryParams("url") == null) {
+        return VIEW_404;
+      }
+      String url = req.queryParams("url");
+      Page page = client.getPage(url);
+      return new ModelAndView(Collections.singletonMap("page", page), "page.ftl");
+    }, freeMarkerEngine);
+
+    get("/pages",
+        (req, res) -> {
+          if (req.queryParams("domain") == null) {
+            return VIEW_404;
+          }
+          String domain = req.queryParams("domain");
+          String next = Optional.ofNullable(req.queryParams("next")).orElse("");
+          Integer pageNum =
+              Integer.parseInt(Optional.ofNullable(req.queryParams("pageNum")).orElse("0"));
+          Pages pages = client.getPages(domain, next, pageNum);
+          return new ModelAndView(Collections.singletonMap("pages", pages), "pages.ftl");
+        }, freeMarkerEngine);
+
+    get("/links",
+        (req, res) -> {
+          if (req.queryParams("url") == null || req.queryParams("linkType") == null) {
+            return VIEW_404;
+          }
+          String rawUrl = req.queryParams("url");
+          String linkType = req.queryParams("linkType");
+          String next = Optional.ofNullable(req.queryParams("next")).orElse("");
+          Integer pageNum =
+              Integer.parseInt(Optional.ofNullable(req.queryParams("pageNum")).orElse("0"));
+          Links links = client.getLinks(rawUrl, linkType, next, pageNum);
+          return new ModelAndView(Collections.singletonMap("links", links), "links.ftl");
+        }, freeMarkerEngine);
+  }
+
+  public void stop() {
+    Spark.stop();
+  }
+
+  public static void main(String[] args) throws Exception {
+    WebIndexConfig webIndexConfig = WebIndexConfig.load();
+    File fluoConfigFile = new File(webIndexConfig.getFluoPropsPath());
+    FluoConfiguration fluoConfig = new FluoConfiguration(fluoConfigFile);
+    Connector conn = AccumuloUtil.getConnector(fluoConfig);
+    IndexClient client = new IndexClient(webIndexConfig.accumuloIndexTable, conn);
+    WebServer webServer = new WebServer();
+    webServer.start(client, 4567, null);
+  }
+}
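
Because `start()` takes an `IndexClient`, a port, and an optional template path directly, the server can also be embedded, which is presumably how the dev server wires it to a MiniAccumuloCluster. A rough sketch under those assumptions (the table name here is hypothetical, and the real dev wiring may differ):

    import java.nio.file.Files;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.minicluster.MiniAccumuloCluster;
    import webindex.core.IndexClient;
    import webindex.ui.WebServer;

    public class EmbeddedUiSketch {
      public static void main(String[] args) throws Exception {
        // Stand up a throwaway Accumulo instance (sketch only)
        MiniAccumuloCluster cluster =
            new MiniAccumuloCluster(Files.createTempDirectory("mac").toFile(), "secret");
        cluster.start();
        Connector conn = cluster.getConnector("root", "secret");
        conn.tableOperations().create("webindex_search"); // hypothetical table name

        // Serve the UI on the same port main() uses; a null template path
        // means templates are loaded from the classpath
        new WebServer().start(new IndexClient("webindex_search", conn), 4567, null);
      }
    }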
diff --git a/modules/ui/src/main/java/webindex/ui/util/Pager.java b/modules/ui/src/main/java/webindex/ui/util/Pager.java
deleted file mode 100644
index d69d95e..0000000
--- a/modules/ui/src/main/java/webindex/ui/util/Pager.java
+++ /dev/null
@@ -1,72 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui.util;
-
-import java.util.Iterator;
-import java.util.Map;
-
-import org.apache.accumulo.core.client.Scanner;
-import org.apache.accumulo.core.data.Key;
-import org.apache.accumulo.core.data.Range;
-import org.apache.accumulo.core.data.Value;
-
-public abstract class Pager {
-
-  private Scanner scanner;
-  private int pageSize;
-  private Range pageRange;
-
-  public Pager(Scanner scanner, Range pageRange, int pageSize) {
-    this.scanner = scanner;
-    this.pageRange = pageRange;
-    this.pageSize = pageSize;
-  }
-
-  public void getPage(Key nextKey) {
-    scanner.setRange(new Range(nextKey, pageRange.getEndKey()));
-    foundStart(scanner.iterator());
-  }
-
-  public void getPage(int pageNum) {
-    scanner.setRange(pageRange);
-    Iterator<Map.Entry<Key, Value>> iterator = scanner.iterator();
-    if (pageNum > 0) {
-      long skip = 0;
-      while (skip < (pageNum * pageSize)) {
-        iterator.next();
-        skip++;
-      }
-    }
-    foundStart(iterator);
-  }
-
-  private void foundStart(Iterator<Map.Entry<Key, Value>> iterator) {
-    long num = 0;
-    while (iterator.hasNext() && (num < (pageSize + 1))) {
-      Map.Entry<Key, Value> entry = iterator.next();
-      if (num == pageSize) {
-        foundNextEntry(entry);
-      } else {
-        foundPageEntry(entry);
-      }
-      num++;
-    }
-  }
-
-  public abstract void foundPageEntry(Map.Entry<Key, Value> entry);
-
-  public abstract void foundNextEntry(Map.Entry<Key, Value> entry);
-
-}
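
`Pager` implemented keyset pagination: each call scanned `pageSize + 1` entries and treated the extra entry as the continuation key for the next page, so a client following "next" links never re-skips from the start of the range. For illustration only, a minimal subclass of the class deleted above that collects one page of values:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import webindex.ui.util.Pager;

    public class ListPager extends Pager {

      final List<String> page = new ArrayList<>();
      Key next; // pass to getPage(Key) to resume at the following page

      public ListPager(Scanner scanner, Range range, int pageSize) {
        super(scanner, range, pageSize);
      }

      @Override
      public void foundPageEntry(Map.Entry<Key, Value> entry) {
        page.add(entry.getValue().toString());
      }

      @Override
      public void foundNextEntry(Map.Entry<Key, Value> entry) {
        // The (pageSize + 1)th entry is not shown; it marks where to resume
        next = entry.getKey();
      }
    }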
diff --git a/modules/ui/src/main/java/webindex/ui/util/WebUrl.java b/modules/ui/src/main/java/webindex/ui/util/WebUrl.java
deleted file mode 100644
index 5f2e863..0000000
--- a/modules/ui/src/main/java/webindex/ui/util/WebUrl.java
+++ /dev/null
@@ -1,35 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui.util;
-
-import com.google.common.net.HostSpecifier;
-import com.google.common.net.InternetDomainName;
-import webindex.core.models.URL;
-
-public class WebUrl {
-
-  public static String domainFromHost(String host) {
-    return InternetDomainName.from(host).topPrivateDomain().toString();
-  }
-
-  public static boolean isValidHost(String host) {
-    return HostSpecifier.isValid(host) && InternetDomainName.isValid(host)
-        && InternetDomainName.from(host).isUnderPublicSuffix();
-  }
-
-  public static URL from(String rawUrl) {
-    return URL.from(rawUrl, WebUrl::domainFromHost, WebUrl::isValidHost);
-  }
-}
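
`WebUrl` leaned on Guava here: `topPrivateDomain()` consults the public-suffix list, so hosts under shared suffixes still resolve to their registrable domain. A standalone example of that Guava behavior (not WebIndex code):

    import com.google.common.net.InternetDomainName;

    public class DomainSketch {
      public static void main(String[] args) {
        // topPrivateDomain() returns the registrable domain under the public suffix:
        // "co.uk" is a public suffix, so this prints "bbc.co.uk"
        System.out.println(InternetDomainName.from("news.bbc.co.uk").topPrivateDomain());
      }
    }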
diff --git a/modules/ui/src/main/java/webindex/ui/views/HomeView.java b/modules/ui/src/main/java/webindex/ui/views/HomeView.java
deleted file mode 100644
index dc2f1d4..0000000
--- a/modules/ui/src/main/java/webindex/ui/views/HomeView.java
+++ /dev/null
@@ -1,24 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui.views;
-
-import io.dropwizard.views.View;
-
-public class HomeView extends View {
-
-  public HomeView() {
-    super("home.ftl");
-  }
-}
diff --git a/modules/ui/src/main/java/webindex/ui/views/LinksView.java b/modules/ui/src/main/java/webindex/ui/views/LinksView.java
deleted file mode 100644
index b168a32..0000000
--- a/modules/ui/src/main/java/webindex/ui/views/LinksView.java
+++ /dev/null
@@ -1,32 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui.views;
-
-import io.dropwizard.views.View;
-import webindex.core.models.Links;
-
-public class LinksView extends View {
-
-  private final Links links;
-
-  public LinksView(Links links) {
-    super("links.ftl");
-    this.links = links;
-  }
-
-  public Links getLinks() {
-    return links;
-  }
-}
diff --git a/modules/ui/src/main/java/webindex/ui/views/PageView.java b/modules/ui/src/main/java/webindex/ui/views/PageView.java
deleted file mode 100644
index 7b1a9fe..0000000
--- a/modules/ui/src/main/java/webindex/ui/views/PageView.java
+++ /dev/null
@@ -1,32 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui.views;
-
-import io.dropwizard.views.View;
-import webindex.core.models.Page;
-
-public class PageView extends View {
-
-  private final Page page;
-
-  public PageView(Page page) {
-    super("page.ftl");
-    this.page = page;
-  }
-
-  public Page getPage() {
-    return page;
-  }
-}
diff --git a/modules/ui/src/main/java/webindex/ui/views/PagesView.java b/modules/ui/src/main/java/webindex/ui/views/PagesView.java
deleted file mode 100644
index a856c63..0000000
--- a/modules/ui/src/main/java/webindex/ui/views/PagesView.java
+++ /dev/null
@@ -1,32 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui.views;
-
-import io.dropwizard.views.View;
-import webindex.core.models.Pages;
-
-public class PagesView extends View {
-
-  private final Pages pages;
-
-  public PagesView(Pages pages) {
-    super("pages.ftl");
-    this.pages = pages;
-  }
-
-  public Pages getPages() {
-    return pages;
-  }
-}
diff --git a/modules/ui/src/main/java/webindex/ui/views/TopView.java b/modules/ui/src/main/java/webindex/ui/views/TopView.java
deleted file mode 100644
index b1927bf..0000000
--- a/modules/ui/src/main/java/webindex/ui/views/TopView.java
+++ /dev/null
@@ -1,32 +0,0 @@
-/*
- * Copyright 2015 Webindex authors (see AUTHORS)
- * 
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
- * in compliance with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software distributed under the License
- * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
- * or implied. See the License for the specific language governing permissions and limitations under
- * the License.
- */
-
-package webindex.ui.views;
-
-import io.dropwizard.views.View;
-import webindex.core.models.TopResults;
-
-public class TopView extends View {
-
-  private TopResults top;
-
-  public TopView(TopResults results) {
-    super("top.ftl");
-    this.top = results;
-  }
-
-  public TopResults getTop() {
-    return top;
-  }
-}
diff --git a/modules/ui/src/main/resources/spark/template/freemarker/404.ftl b/modules/ui/src/main/resources/spark/template/freemarker/404.ftl
new file mode 100644
index 0000000..24753a1
--- /dev/null
+++ b/modules/ui/src/main/resources/spark/template/freemarker/404.ftl
@@ -0,0 +1,10 @@
+<html>
+<#include "common/head.ftl">
+<body>
+<div class="container" style="margin-top: 20px">
+<div class="row">
+  <div class="col-md-6 col-md-offset-3" style="margin-top: 200px">
+    <h2>404: Page not found</h2>
+  </div>
+</div>
+<#include "common/footer.ftl">
diff --git a/modules/ui/src/main/resources/webindex/ui/views/common/footer.ftl b/modules/ui/src/main/resources/spark/template/freemarker/common/footer.ftl
similarity index 100%
rename from modules/ui/src/main/resources/webindex/ui/views/common/footer.ftl
rename to modules/ui/src/main/resources/spark/template/freemarker/common/footer.ftl
diff --git a/modules/ui/src/main/resources/webindex/ui/views/common/head.ftl b/modules/ui/src/main/resources/spark/template/freemarker/common/head.ftl
similarity index 100%
rename from modules/ui/src/main/resources/webindex/ui/views/common/head.ftl
rename to modules/ui/src/main/resources/spark/template/freemarker/common/head.ftl
diff --git a/modules/ui/src/main/resources/webindex/ui/views/common/header.ftl b/modules/ui/src/main/resources/spark/template/freemarker/common/header.ftl
similarity index 68%
rename from modules/ui/src/main/resources/webindex/ui/views/common/header.ftl
rename to modules/ui/src/main/resources/spark/template/freemarker/common/header.ftl
index 0a9ac86..0163231 100644
--- a/modules/ui/src/main/resources/webindex/ui/views/common/header.ftl
+++ b/modules/ui/src/main/resources/spark/template/freemarker/common/header.ftl
@@ -5,6 +5,6 @@
 <div class="container" style="margin-top: 20px">
 <div class="row" style="margin-bottom: 10px">
   <div class="col-md-6">
-    <a href="/"><img src="/assets/img/webindex.png" alt="WebIndex Home" style="height:30px;"></a>
+    <a href="/"><img src="/img/webindex.png" alt="WebIndex Home" style="height:30px;"></a>
   </div>
 </div>
diff --git a/modules/ui/src/main/resources/webindex/ui/views/home.ftl b/modules/ui/src/main/resources/spark/template/freemarker/home.ftl
similarity index 93%
rename from modules/ui/src/main/resources/webindex/ui/views/home.ftl
rename to modules/ui/src/main/resources/spark/template/freemarker/home.ftl
index 76c7111..c3d9f20 100644
--- a/modules/ui/src/main/resources/webindex/ui/views/home.ftl
+++ b/modules/ui/src/main/resources/spark/template/freemarker/home.ftl
@@ -4,7 +4,7 @@
 <div class="container" style="margin-top: 20px">
 <div class="row">
   <div class="col-md-6 col-md-offset-3" style="margin-top: 200px">
-  <img src="/assets/img/webindex.png" alt="WebIndex">
+  <img src="/img/webindex.png" alt="WebIndex">
   <div style="margin-top: 25px;">
     <h4>Enter a domain to view known webpages in that domain:</h4>
   </div>
diff --git a/modules/ui/src/main/resources/webindex/ui/views/links.ftl b/modules/ui/src/main/resources/spark/template/freemarker/links.ftl
similarity index 100%
rename from modules/ui/src/main/resources/webindex/ui/views/links.ftl
rename to modules/ui/src/main/resources/spark/template/freemarker/links.ftl
diff --git a/modules/ui/src/main/resources/webindex/ui/views/page.ftl b/modules/ui/src/main/resources/spark/template/freemarker/page.ftl
similarity index 100%
rename from modules/ui/src/main/resources/webindex/ui/views/page.ftl
rename to modules/ui/src/main/resources/spark/template/freemarker/page.ftl
diff --git a/modules/ui/src/main/resources/webindex/ui/views/pages.ftl b/modules/ui/src/main/resources/spark/template/freemarker/pages.ftl
similarity index 100%
rename from modules/ui/src/main/resources/webindex/ui/views/pages.ftl
rename to modules/ui/src/main/resources/spark/template/freemarker/pages.ftl
diff --git a/modules/ui/src/main/resources/webindex/ui/views/top.ftl b/modules/ui/src/main/resources/spark/template/freemarker/top.ftl
similarity index 100%
rename from modules/ui/src/main/resources/webindex/ui/views/top.ftl
rename to modules/ui/src/main/resources/spark/template/freemarker/top.ftl
diff --git a/pom.xml b/pom.xml
index 79ee4e1..e07f153 100644
--- a/pom.xml
+++ b/pom.xml
@@ -32,10 +32,10 @@
     <module>modules/core</module>
     <module>modules/data</module>
     <module>modules/ui</module>
+    <module>modules/integration</module>
   </modules>
   <properties>
     <accumulo.version>1.7.1</accumulo.version>
-    <dropwizard.version>0.8.2</dropwizard.version>
     <fluo-recipes.version>1.0.0-incubating-SNAPSHOT</fluo-recipes.version>
     <fluo.version>1.0.0-incubating-SNAPSHOT</fluo.version>
     <hadoop.version>2.6.3</hadoop.version>
@@ -57,26 +57,31 @@
         <version>2.3.1</version>
       </dependency>
       <dependency>
+        <groupId>com.google.guava</groupId>
+        <artifactId>guava</artifactId>
+        <version>14.0.1</version>
+      </dependency>
+      <dependency>
+        <groupId>com.sparkjava</groupId>
+        <artifactId>spark-core</artifactId>
+        <version>2.5</version>
+      </dependency>
+      <dependency>
+        <groupId>com.sparkjava</groupId>
+        <artifactId>spark-template-freemarker</artifactId>
+        <version>2.3</version>
+      </dependency>
+      <dependency>
+        <groupId>commons-io</groupId>
+        <artifactId>commons-io</artifactId>
+        <version>2.4</version>
+      </dependency>
+      <dependency>
         <groupId>commons-lang</groupId>
         <artifactId>commons-lang</artifactId>
         <version>2.6</version>
       </dependency>
       <dependency>
-        <groupId>io.dropwizard</groupId>
-        <artifactId>dropwizard-assets</artifactId>
-        <version>${dropwizard.version}</version>
-      </dependency>
-      <dependency>
-        <groupId>io.dropwizard</groupId>
-        <artifactId>dropwizard-core</artifactId>
-        <version>${dropwizard.version}</version>
-      </dependency>
-      <dependency>
-        <groupId>io.dropwizard</groupId>
-        <artifactId>dropwizard-views-freemarker</artifactId>
-        <version>${dropwizard.version}</version>
-      </dependency>
-      <dependency>
         <groupId>io.github.astralway</groupId>
         <artifactId>webindex-core</artifactId>
         <version>${project.version}</version>
@@ -183,6 +188,11 @@
         <version>3.4.6</version>
       </dependency>
       <dependency>
+        <groupId>org.jsoup</groupId>
+        <artifactId>jsoup</artifactId>
+        <version>1.9.2</version>
+      </dependency>
+      <dependency>
         <groupId>org.netpreserve.commons</groupId>
         <artifactId>webarchive-commons</artifactId>
         <version>1.1.2</version>
@@ -226,13 +236,14 @@
               <exclude>README.md</exclude>
               <exclude>docs/**.md</exclude>
               <exclude>conf/webindex-tests.txt</exclude>
+              <exclude>src/test/resources/5-pages.txt</exclude>
               <exclude>src/test/resources/*.warc</exclude>
               <exclude>src/test/resources/data/set1/*.txt</exclude>
               <exclude>src/main/resources/splits/*.txt</exclude>
-              <exclude>src/main/resources/webindex/ui/views/*.ftl</exclude>
-              <exclude>src/main/resources/webindex/ui/views/common/*.ftl</exclude>
+              <exclude>src/main/resources/spark/template/freemarker/*.ftl</exclude>
+              <exclude>src/main/resources/spark/template/freemarker/common/*.ftl</exclude>
               <exclude>logs/*</exclude>
-              <exclude>paths/*</exclude>
+              <exclude>data/*</exclude>
               <exclude>dependency-reduced-pom.xml</exclude>
             </excludes>
           </configuration>
@@ -247,6 +258,11 @@
             </systemPropertyVariables>
           </configuration>
         </plugin>
+        <plugin>
+          <groupId>org.codehaus.mojo</groupId>
+          <artifactId>exec-maven-plugin</artifactId>
+          <version>1.5.0</version>
+        </plugin>
       </plugins>
     </pluginManagement>
   </build>