Instructions for installing and running the Accumulo Wikisearch example.
Accumulo, Hadoop, and ZooKeeper must be installed and running
Download one or more wikipedia dump files and put them in an HDFS directory. You will want the files whose link names end in pages-articles.xml.bz2. Though not strictly required, the ingest will go more quickly if the files are decompressed first:
$ bunzip2 enwiki-*-pages-articles.xml.bz2
$ hadoop fs -put enwiki-*-pages-articles.xml /wikipedia/enwiki-pages-articles.xml
Create a wikipedia.xml file (or wikipedia_parallel.xml if running the parallel version) from wikipedia.xml.example or wikipedia_parallel.xml.example and modify it for your Accumulo installation.
$ cd ingest/conf
$ cp wikipedia.xml.example wikipedia.xml
$ vim wikipedia.xml
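The file uses Hadoop's XML configuration format. A minimal sketch of the kind of settings it carries follows; the property names and values here are illustrative, not authoritative — copy the real names from wikipedia.xml.example in your checkout:

```xml
<configuration>
  <!-- Illustrative Accumulo connection settings; take the actual
       property names from wikipedia.xml.example. -->
  <property>
    <name>wikipedia.accumulo.zookeepers</name>
    <value>localhost:2181</value>
  </property>
  <property>
    <name>wikipedia.accumulo.instance_name</name>
    <value>accumulo</value>
  </property>
  <property>
    <name>wikipedia.accumulo.user</name>
    <value>root</value>
  </property>
  <property>
    <name>wikipedia.accumulo.password</name>
    <value>secret</value>
  </property>
  <property>
    <name>wikipedia.accumulo.table</name>
    <value>wikipedia</value>
  </property>
</configuration>
```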
Copy ingest/lib/wikisearch-*.jar to $ACCUMULO_HOME/lib/ext.
Run ingest/bin/ingest.sh (or ingest_parallel.sh if running the parallel version) with one argument: the name of the HDFS directory where the wikipedia XML files reside. This kicks off a MapReduce job that ingests the data into Accumulo.
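As a concrete sketch, assuming the dump was uploaded to /wikipedia as in the step above (substitute your own HDFS directory), the kickoff is a single command. The echo keeps the sketch side-effect free; drop it for a real run:

```shell
# DUMP_DIR is the HDFS directory holding the wikipedia XML files;
# /wikipedia matches the hadoop fs -put step above. 'echo' makes this
# sketch side-effect free -- remove it to actually launch the job.
DUMP_DIR=/wikipedia
echo ./ingest/bin/ingest.sh "$DUMP_DIR"
```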
$JBOSS_HOME/server/default/deployers/jbossws.deployer/META-INF/stack-agnostic-jboss-beans.xml
Run mvn clean install at the Wikisearch top level to install the jars into your local Maven repo before building the query package. Create an ejb-jar.xml from ejb-jar.xml.example and modify it to contain the same information that you put into wikipedia.xml in the ingest steps above:
$ cd query/src/main/resources/META-INF
$ cp ejb-jar.xml.example ejb-jar.xml
$ vim ejb-jar.xml
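The deployment descriptor hands these settings to the query EJB as standard env-entry elements. A sketch of the shape — the entry names below are hypothetical, so mirror the ones actually present in ejb-jar.xml.example, with values matching your wikipedia.xml:

```xml
<!-- Hypothetical entry names; use the ones from ejb-jar.xml.example. -->
<env-entry>
  <env-entry-name>instanceName</env-entry-name>
  <env-entry-type>java.lang.String</env-entry-type>
  <env-entry-value>accumulo</env-entry-value>
</env-entry>
<env-entry>
  <env-entry-name>zookeepers</env-entry-name>
  <env-entry-type>java.lang.String</env-entry-type>
  <env-entry-value>localhost:2181</env-entry-value>
</env-entry>
```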
Re-build the query distribution by running mvn package assembly:single in the query module's directory. Untar the resulting file in the $JBOSS_HOME/server/default directory.
$ cd $JBOSS_HOME/server/default
$ tar -xzf /some/path/to/wikisearch/query/target/wikisearch-query*.tar.gz
This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
Next, copy the wikisearch*.war file from the query-war/target directory to $JBOSS_HOME/server/default/deploy.
Start JBoss ($JBOSS_HOME/bin/run.sh)
Use the Accumulo shell and give the user permissions for the wikis that you loaded:
> setauths -u <user> -s all,enwiki,eswiki,frwiki,fawiki
Copy the following jars from the $JBOSS_HOME/server/default/lib directory to the $ACCUMULO_HOME/lib/ext directory:
kryo*.jar
minlog*.jar
commons-jexl*.jar
Copy $JBOSS_HOME/server/default/deploy/wikisearch-query*.jar to $ACCUMULO_HOME/lib/ext.
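The two jar-copy steps above can be scripted in one pass. The sketch below stands up scratch directories with stand-in jars so it is runnable anywhere; for a real deployment, point JBOSS_HOME and ACCUMULO_HOME at your actual installations and delete the setup lines:

```shell
# Scratch stand-ins so this sketch runs anywhere; for a real deployment,
# set JBOSS_HOME/ACCUMULO_HOME to your installations and remove the
# mkdir/touch setup below (the jar versions here are placeholders).
JBOSS_HOME=$(mktemp -d)
ACCUMULO_HOME=$(mktemp -d)
mkdir -p "$JBOSS_HOME/server/default/lib" \
         "$JBOSS_HOME/server/default/deploy" \
         "$ACCUMULO_HOME/lib/ext"
touch "$JBOSS_HOME/server/default/lib/kryo-1.04.jar" \
      "$JBOSS_HOME/server/default/lib/minlog-1.2.jar" \
      "$JBOSS_HOME/server/default/lib/commons-jexl-2.0.1.jar" \
      "$JBOSS_HOME/server/default/deploy/wikisearch-query-example.jar"

# The actual copies: query-time jars from JBoss into Accumulo's lib/ext
# so the tablet servers can load them.
cp "$JBOSS_HOME"/server/default/lib/kryo*.jar \
   "$JBOSS_HOME"/server/default/lib/minlog*.jar \
   "$JBOSS_HOME"/server/default/lib/commons-jexl*.jar \
   "$JBOSS_HOME"/server/default/deploy/wikisearch-query*.jar \
   "$ACCUMULO_HOME/lib/ext/"
ls "$ACCUMULO_HOME/lib/ext"
```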
At this point you should be able to open a browser and view the page:
http://localhost:8080/accumulo-wikisearch/ui.html
You can issue queries using this user interface or via the following REST URLs:
<host>/accumulo-wikisearch/rest/Query/xml
<host>/accumulo-wikisearch/rest/Query/html
<host>/accumulo-wikisearch/rest/Query/yaml
<host>/accumulo-wikisearch/rest/Query/json
There are two parameters to the REST service, query and auths. The query parameter is the same string that you would type into the search box on the UI page, and the auths parameter is a comma-separated list of wikis that you want to search (e.g. enwiki,frwiki,dewiki, or all to search every wiki you loaded).
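For scripted access, a request can be composed with curl, assuming the endpoint accepts query and auths as URL query parameters. The sketch only prints the command rather than running it, since it needs a live JBoss instance; the host and the query string (a JEXL-style expression here, which is an assumption — use whatever you would type into the search box) are placeholders:

```shell
# Sketch only: prints the curl command it would run. HOST is a
# placeholder; AUTHS must name only wikis your user is authorized for
# (see the setauths step above).
HOST=http://localhost:8080
QUERY="TEXT == 'apache'"   # hypothetical query syntax
AUTHS=enwiki,frwiki
# URL-encode the query (python3 assumed available for the encoding step)
ENC=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$QUERY")
echo curl "$HOST/accumulo-wikisearch/rest/Query/json?query=$ENC&auths=$AUTHS"
```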