commit | af2b477f0c437ab25f6039ae63a3be43aa00a515 | |
---|---|---|
author | Tyler Palsulich <tpalsulich@gmail.com> | Thu Jun 26 11:19:57 2014 -0700 |
committer | Tyler Palsulich <tpalsulich@gmail.com> | Thu Jun 26 11:19:57 2014 -0700 |
tree | 2cd3a4a06236941bfb07d9a952dc0a7832f4e5b6 | |
parent | a48f056f9e48efbbc276e146cee372c4909c4dff | |
Update readme with oodt start/stop directions.
DRAT is a distributed, parallelized (MapReduce) wrapper around Apache™ RAT that allows it to complete on large code repositories of multiple file types, where Apache™ RAT on its own hangs forever.
The tool leverages Apache™ OODT to parallelize and workflow together the following components:
You can build DRAT in a few steps:
mkdir -p /usr/local/drat/deploy
mkdir -p /usr/local/drat/src
cd /usr/local/drat/src
git clone https://github.com/chrismattmann/drat.git .
mvn install
cp -R distribution/target/dms-distribution-0.1-bin.tar.gz ../deploy/
cd ../deploy/
tar xvzf dms-distribution-0.1-bin.tar.gz
rm *.tar.gz
Here are the basic commands to run DRAT. Imagine you have a code repo, your-repo, that lives in $HOME/your-repo.
Set your $DRAT_HOME environment variable, e.g., to /usr/local/drat/deploy
Start Apache™ OODT:
$DRAT_HOME/bin/oodt start
cd $DRAT_HOME/bin
./drat go $HOME/your-repo
If you would rather run the individual commands yourself, use the manual method:
Crawl the repository of interest, e.g., $HOME/your-repo:
$DRAT_HOME/bin/drat crawl $HOME/your-repo
Index the crawled repo in Apache™ SOLR:
$DRAT_HOME/bin/drat index $HOME/your-repo
Fire off the partitioner and mappers:
$DRAT_HOME/bin/drat map
Fire off the reducer:
$DRAT_HOME/bin/drat reduce
Please see $DRAT_HOME/bin/drat for the specifics of each command.
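The manual steps above can be chained in a small wrapper so that a failure in one stage stops the run. This is only a minimal sketch: it assumes nothing beyond the four subcommands listed above (crawl, index, map, reduce). The function name `drat_manual` and the `DRAT_BIN` override are hypothetical and not part of DRAT.

```shell
# Sketch: run the four manual DRAT stages in order, stopping at the first failure.
# DRAT_BIN is a hypothetical override (not part of DRAT) for the path to the drat script.
drat_manual() {
    repo="$1"
    drat_bin="${DRAT_BIN:-$DRAT_HOME/bin/drat}"

    if [ -z "$repo" ]; then
        echo "usage: drat_manual /path/to/repo" >&2
        return 2
    fi

    "$drat_bin" crawl "$repo" || return 1   # crawl the repository
    "$drat_bin" index "$repo" || return 1   # index the crawled files
    "$drat_bin" map           || return 1   # partitioner and mappers
    "$drat_bin" reduce        || return 1   # reducer
}
```

After defining the function in your shell, call it as `drat_manual "$HOME/your-repo"`.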
DRAT UIs are accessible at:
http://localhost:8080/opsui/ - main cockpit, Apache OODT OPSUI
http://localhost:8080/solr/ - Solr4 catalog
DRAT publishes its analyzed aggregated RAT logs to:
$DRAT_HOME/data/archive/rataggregate/*.csv
These look like, e.g.:
cat *.csv
Notes,Binaries,Archives,Standards,Apache,Generated,Unknown
0,2,0,530,497,0,33
Each column is the count of files in the analyzed repository that fall into that license category:
Binaries - it's a binary file, no license
Notes - it's a notes file
Archives - it's a tar/zip/etc archive, no license
Standards - it's one of the OSI approved licenses that isn't ALv2, so e.g., BSD, MIT, LGPL, etc.
Generated - these are generated files (either source or binary)
Apache - apache licensed files
Unknown - non discernible license
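To read an aggregate file without counting columns by hand, the header and the counts can be paired up line by line. This is a minimal sketch, assuming the two-line layout shown above (one header row, then one row of counts). The function name `drat_summary` is hypothetical.

```shell
# Sketch: print "Category: count" for each column of a DRAT aggregate CSV.
# Assumes line 1 is the header and line 2 holds the counts, as in the sample above.
drat_summary() {
    awk -F',' '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; n = NF }
        NR == 2 { for (i = 1; i <= n; i++) printf "%s: %s\n", name[i], $i }
    ' "$1"
}
```

Pass it the path of one of the CSV files under $DRAT_HOME/data/archive/rataggregate/. For the sample above it prints one line per category, from `Notes: 0` through `Unknown: 33`.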
If you have run DRAT on your source code and want to run it again, the easiest way to do so is to:
Grab the aliases for fmquery and fmdel from https://issues.apache.org/jira/browse/OODT-306 and add them to your bash or tcsh profile.
Run fmquery "ProductType:RatLog" | fmdel
Run fmquery "ProductType:RatAggregateLog" | fmdel
You should be good to go to re-run the analysis at that point.
## If you want to analyze an entirely new code base
$DRAT_HOME/bin/oodt stop
$DRAT_HOME/bin/drat reset
$DRAT_HOME/bin/oodt start
You shouldn't need to run these, but the manual version of reset is:
Blow away the following dirs:
rm -rf $DRAT_HOME/data/workflow
rm -rf $DRAT_HOME/filemgr/catalog
rm -rf $DRAT_HOME/solr/drat/data
Blow away files in the following dirs:
rm -rf $DRAT_HOME/data/archive/*
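Because these commands run `rm -rf` on paths built from $DRAT_HOME, an unset or empty variable would make them resolve under `/`. The following is a minimal sketch of the same manual reset with a guard in front of it. The function name `drat_manual_reset` is hypothetical, and the paths are exactly the ones listed above.

```shell
# Sketch: manual reset that refuses to run when DRAT_HOME is unset, empty, or missing.
drat_manual_reset() {
    if [ -z "${DRAT_HOME:-}" ] || [ ! -d "$DRAT_HOME" ]; then
        echo "DRAT_HOME is not set to an existing directory" >&2
        return 1
    fi

    # Remove these directories entirely.
    rm -rf "$DRAT_HOME/data/workflow" \
           "$DRAT_HOME/filemgr/catalog" \
           "$DRAT_HOME/solr/drat/data"

    # Empty the archive but keep the directory itself.
    rm -rf "$DRAT_HOME"/data/archive/*
}
```

As in the steps above, stop OODT before running the reset and start it again afterwards.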
The following useful environment variables are set by RADIX but can be overridden on a per-install basis. Here's the default config; feel free to change or override it in your own environment.
setenv DRAT_HOME /usr/local/drat/deploy
setenv FILEMGR_URL http://localhost:9000
setenv WORKFLOW_URL http://localhost:9001
setenv RESMGR_URL http://localhost:9002
setenv WORKFLOW_HOME $DRAT_HOME/workflow
setenv FILEMGR_HOME $DRAT_HOME/filemgr
setenv PGE_ROOT $DRAT_HOME/pge
setenv PCS_HOME $DRAT_HOME/pcs
setenv GANGLIA_URL http://zipper.jpl.nasa.gov/ganglia/
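The defaults above use csh/tcsh `setenv` syntax. For Bourne-style shells (sh, bash, zsh), an equivalent sketch is shown below. It assumes the same default values and keeps any value already present in the environment, which is one possible way to override a setting per install.

```shell
# Sketch: Bourne-shell equivalent of the setenv defaults above.
# ${VAR:-default} keeps a value that is already set in the environment.
export DRAT_HOME="${DRAT_HOME:-/usr/local/drat/deploy}"
export FILEMGR_URL="${FILEMGR_URL:-http://localhost:9000}"
export WORKFLOW_URL="${WORKFLOW_URL:-http://localhost:9001}"
export RESMGR_URL="${RESMGR_URL:-http://localhost:9002}"
export WORKFLOW_HOME="${WORKFLOW_HOME:-$DRAT_HOME/workflow}"
export FILEMGR_HOME="${FILEMGR_HOME:-$DRAT_HOME/filemgr}"
export PGE_ROOT="${PGE_ROOT:-$DRAT_HOME/pge}"
export PCS_HOME="${PCS_HOME:-$DRAT_HOME/pcs}"
export GANGLIA_URL="${GANGLIA_URL:-http://zipper.jpl.nasa.gov/ganglia/}"
```

For example, exporting a different FILEMGR_URL before sourcing these lines leaves that value in place while the rest fall back to the defaults.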
Note that the Tomcat we ship with DRAT won't start correctly unless you define the $JAVA_HOME environment variable, so make sure that's set too.
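A quick pre-flight check can catch a missing $JAVA_HOME before OODT is started. This is a minimal sketch: the function name `check_java_home` is hypothetical, and the test for `bin/java` assumes a conventional JDK/JRE directory layout.

```shell
# Sketch: verify JAVA_HOME is set and points at something with an executable bin/java.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java found under $JAVA_HOME/bin" >&2
        return 1
    fi
}
```

It can be combined with the start command, e.g. `check_java_home && $DRAT_HOME/bin/oodt start`.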