A distributed, parallelized (MapReduce) wrapper around Apache™ RAT that allows it to complete on large code repositories of multiple file types where Apache™ RAT hangs forever.
The tool leverages Apache™ OODT to parallelize and workflow together the following components:
You can build DRAT in a few steps:
mkdir -p /usr/local/drat/deploy
mkdir -p /usr/local/drat/src
cd /usr/local/drat/
git clone https://github.com/chrismattmann/drat.git
mv drat src && cd src
mvn install
cp -R target/distribution/dms-distribution-0.1-bin.tar.gz deploy
cd deploy
tar xvzf dms-distribution-0.1-bin.tar.gz
rm -rf *.tar.gz
Here are the basic commands to run DRAT. Imagine you have a code repo, your-repo, that lives in $HOME/your-repo.

Set your $DRAT_HOME environment variable, e.g., to /usr/local/drat/deploy.
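For example, in bash (assuming you deployed to /usr/local/drat/deploy, the directory created in the build steps above):

```shell
# Point DRAT_HOME at the deploy directory created during the build
export DRAT_HOME=/usr/local/drat/deploy
echo "$DRAT_HOME"
```

Add the export line to your shell profile if you want it to persist across sessions.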
Start Apache™ OODT:
cd $DRAT_HOME
cd filemgr/bin && ./filemgr start
cd ../../workflow/bin && ./wmgr start
Crawl the repository of interest, e.g., $HOME/your-repo:
cd $DRAT_HOME/crawler/bin
./crawler_launcher --operation --metPC --productPath $HOME/your-repo --metExtractorConfig $DRAT_HOME/extractors/code/default.cpr.conf --metExtractor org.apache.oodt.cas.metadata.extractors.CopyAndRewriteExtractor --filemgrUrl http://localhost:9000 --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
Index the crawled repo in Apache™ Solr:
cd $DRAT_HOME/filemgr/bin
java -Djava.ext.dirs=../lib -DSOLR_INDEXER_CONFIG=../etc/indexer.properties org.apache.oodt.cas.filemgr.tools.SolrIndexer --all --fmUrl http://localhost:9000 --optimize --solrUrl http://localhost:8080/solr/drat
Fire off the partitioner and mappers:
cd $DRAT_HOME/workflow/bin
./wmgr-client --url http://localhost:9001 --operation --dynWorkflow --taskIds urn:drat:MimePartitioner
Fire off the reducer:
cd $DRAT_HOME/workflow/bin
./wmgr-client --url http://localhost:9001 --operation --dynWorkflow --taskIds urn:drat:RatAggregator