|tagger||Chris Mattmann <email@example.com>||Sun Jul 13 11:57:29 2014 -0700|
DRAT 0.5 release
|author||Chris Mattmann <firstname.lastname@example.org>||Wed Jul 09 13:56:33 2014 -0700|
|committer||Chris Mattmann <email@example.com>||Wed Jul 09 13:56:33 2014 -0700|
Merge pull request #16 from tpalsulich/master Add a table of contents to readme, generated by DocToc.
A distributed, parallelized (Map Reduce) wrapper around Apache™ RAT (Release Audit Tool). RAT is used to check for proper licensing in software projects. However, RAT takes a prohibitively long time to analyze large repositories of code, since it can only run on one JVM. Furthermore, RAT isn't customizable by file type or file size and provides no incremental output. This wrapper dramatically speeds up the process by leveraging Apache™ OODT to parallelize and workflow the following components:
You can build DRAT in a few steps:
mkdir -p /usr/local/drat/deploy
mkdir -p /usr/local/drat/src
git clone https://github.com/chrismattmann/drat.git .
cp -R distribution/target/dms-distribution-0.1-bin.tar.gz ../deploy/
tar xvzf dms-distribution-0.1-bin.tar.gz
Install Vagrant from here.
Install VirtualBox from here.
git clone https://github.com/chrismattmann/drat.git
cd drat
vagrant up
vagrant ssh
Skip to automated method or manual method. Note that the /vagrant directory is a shared folder to your host system and is a great way to interact with codebases you're looking to audit with drat.
Here are the basic commands to run DRAT. Imagine you have a code repo, your-repo, that lives in $HOME/your-repo. Set the $DRAT_HOME environment variable, e.g., to /usr/local/drat/deploy. Note that the Tomcat we ship with DRAT won't start correctly unless you define the $JAVA_HOME environment variable, so make sure that's set too.
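Because a missing $JAVA_HOME or $DRAT_HOME only shows up later as a startup failure, a quick pre-flight check can save some debugging. The following is a minimal sketch, not something DRAT ships: the function name check_drat_env is hypothetical, and it only tests that the two variables named above are non-empty.

```sh
# Hypothetical helper (not part of DRAT): confirm that the environment
# variables this README relies on are set before starting anything.
check_drat_env() {
    missing=0
    for name in DRAT_HOME JAVA_HOME; do
        # Indirect variable lookup that also works in plain POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: \$$name is not set" >&2
            missing=1
        fi
    done
    # Exit status 0 if both are set, 1 otherwise.
    return "$missing"
}
```

One way to use it is to run check_drat_env before any drat command and stop if it returns a non-zero status.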
Start Apache™ OODT:
$DRAT_HOME/bin/drat go $HOME/your-repo
If you would rather run the individual commands yourself, use the manual method:
Crawl the repository of interest, e.g.,
$DRAT_HOME/bin/drat crawl $HOME/your-repo
Index the crawled repo in Apache™ SOLR:
$DRAT_HOME/bin/drat index $HOME/your-repo
Fire off the partitioner and mappers
Fire off the reducer
$DRAT_HOME/bin/drat for the specifics of each command. To shut down OODT, run
DRAT UIs are accessible at:
DRAT publishes its analyzed aggregated RAT logs to:
These look like, e.g.:
cat *.csv
Notes,Binaries,Archives,Standards,Apache,Generated,Unknown
0,2,0,530,497,0,33
Each column is a count of the source code files that fall into that license category:
Binaries - it's a binary file, no license
Notes - it's a notes file
Archives - it's a tar/zip/etc archive, no license
Standards - it's one of the OSI approved licenses that isn't ALv2, so e.g., BSD, MIT, LGPL, etc.
Generated - these are generated files (either source or binary)
Apache - apache licensed files
Unknown - non discernible license
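To read one of these CSV files at a glance, it can help to print each category next to its count and its share of the total. The sketch below assumes the two-line layout shown in the example (one header row, one row of counts); the function name summarize_rat_csv is hypothetical and not part of DRAT, and the sample input is the example row from this README.

```sh
# Hypothetical helper (not part of DRAT): print each license category
# with its file count and percentage of the total.
# Reads the CSV from the files given as arguments, or from stdin.
summarize_rat_csv() {
    awk -F, '
        # First row: remember the column names.
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        # Each later row: total the counts, then print one line per column.
        {
            total = 0
            for (i = 1; i <= NF; i++) total += $i
            for (i = 1; i <= NF; i++)
                printf "%-10s %5d %5.1f%%\n", name[i], $i, (total ? 100 * $i / total : 0)
            printf "%-10s %5d\n", "Total", total
        }
    ' "$@"
}

# Sample input copied from the example output in this README.
summarize_rat_csv <<'EOF'
Notes,Binaries,Archives,Standards,Apache,Generated,Unknown
0,2,0,530,497,0,33
EOF
```

On a real run, pass the aggregated log file as an argument instead of the here-document.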
If you have already run DRAT on your source code and want to run it again, the easiest way to do so is to:
Grab the aliases for fmquery and fmdel from https://issues.apache.org/jira/browse/OODT-306 and add them to your bash or tcsh profile:
fmquery "ProductType:RatLog" | fmdel
fmquery "ProductType:RatAggregateLog" | fmdel
You should be good to go to re-run the analysis at that point.
## If you want to analyze an entirely new code base
You shouldn't need to run these, but the manual version of
Blow away the following dirs:
rm -rf $DRAT_HOME/data/workflow
rm -rf $DRAT_HOME/filemgr/catalog
rm -rf $DRAT_HOME/solr/drat/data
Blow away files in following dirs:
rm -rf $DRAT_HOME/data/archive/*
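One caveat with running these rm -rf commands by hand: if $DRAT_HOME happens to be unset, the paths expand to locations directly under /. A minimal sketch of a guarded wrapper follows. The function name reset_drat is hypothetical and is not a DRAT command; it runs only the four removals listed above.

```sh
# Hypothetical wrapper (not part of DRAT) around the removals listed above.
# It refuses to run when DRAT_HOME is unset or is not a directory, so the
# rm -rf paths can never expand to something directly under "/".
reset_drat() {
    if [ -z "${DRAT_HOME:-}" ] || [ ! -d "$DRAT_HOME" ]; then
        echo "error: DRAT_HOME is unset or not a directory" >&2
        return 1
    fi
    # Remove these directories entirely.
    rm -rf "$DRAT_HOME/data/workflow" \
           "$DRAT_HOME/filemgr/catalog" \
           "$DRAT_HOME/solr/drat/data"
    # Empty the archive but keep the directory itself.
    rm -rf "$DRAT_HOME"/data/archive/*
}
```

Quoting "$DRAT_HOME" also keeps the commands working if the install path contains spaces.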
The following useful environment variables are set by RADIX but can be overridden on a per-DRAT-install basis. Here's the default config; feel free to change or override it in your own environment.
setenv DRAT_HOME /usr/local/drat/deploy
setenv FILEMGR_URL http://localhost:9000
setenv WORKFLOW_URL http://localhost:9001
setenv RESMGR_URL http://localhost:9002
setenv WORKFLOW_HOME $DRAT_HOME/workflow
setenv FILEMGR_HOME $DRAT_HOME/filemgr
setenv PGE_ROOT $DRAT_HOME/pge
setenv PCS_HOME $DRAT_HOME/pcs
setenv GANGLIA_URL http://zipper.jpl.nasa.gov/ganglia/
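The defaults above use tcsh's setenv syntax. For bash (or any POSIX sh) profile, the same settings could be written as below. This is a direct translation with the values copied unchanged, not a separate configuration provided by DRAT.

```sh
# bash/sh equivalents of the tcsh defaults above; values copied unchanged.
export DRAT_HOME=/usr/local/drat/deploy
export FILEMGR_URL=http://localhost:9000
export WORKFLOW_URL=http://localhost:9001
export RESMGR_URL=http://localhost:9002
# The remaining paths are derived from DRAT_HOME, so set it first.
export WORKFLOW_HOME="$DRAT_HOME/workflow"
export FILEMGR_HOME="$DRAT_HOME/filemgr"
export PGE_ROOT="$DRAT_HOME/pge"
export PCS_HOME="$DRAT_HOME/pcs"
export GANGLIA_URL=http://zipper.jpl.nasa.gov/ganglia/
```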
There is now a YouTube video on DRAT explaining its motivation and the results of running it on DARPA XDATA and on the Computational Infrastructure for Geodynamics as part of my NSF project. The video was made for the 2014 Summer Earth Science Information Partners Federation Meeting.