<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<!-- NewPage -->
<html lang="en">
<!-- Generated by javadoc (1.8.0_292) on Tue Jun 15 06:00:58 GMT 2021 -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>org.apache.hadoop.tools.rumen (Apache Hadoop Main 3.3.1 API)</title>
<meta name="date" content="2021-06-15">
<link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
<script type="text/javascript" src="../../../../../script.js"></script>
<!-- ========= START OF TOP NAVBAR ======= -->
<div class="topNav"><a name="">
<!-- -->
<div class="skipNav"><a href="" title="Skip navigation links">Skip navigation links</a></div>
<a name="">
<!-- -->
<ul class="navList" title="Navigation">
<li><a href="../../../../../overview-summary.html">Overview</a></li>
<li class="navBarCell1Rev">Package</li>
<li><a href="package-use.html">Use</a></li>
<li><a href="package-tree.html">Tree</a></li>
<li><a href="../../../../../deprecated-list.html">Deprecated</a></li>
<li><a href="../../../../../index-all.html">Index</a></li>
<li><a href="../../../../../help-doc.html">Help</a></li>
<div class="subNav">
<ul class="navList">
<li><a href="../../../../../org/apache/hadoop/tools/protocolPB/package-summary.html">Prev&nbsp;Package</a></li>
<li><a href="../../../../../org/apache/hadoop/tools/rumen/anonymization/package-summary.html">Next&nbsp;Package</a></li>
<ul class="navList">
<li><a href="../../../../../index.html?org/apache/hadoop/tools/rumen/package-summary.html" target="_top">Frames</a></li>
<li><a href="package-summary.html" target="_top">No&nbsp;Frames</a></li>
<ul class="navList" id="allclasses_navbar_top">
<li><a href="../../../../../allclasses-noframe.html">All&nbsp;Classes</a></li>
<a name="">
<!-- -->
<!-- ========= END OF TOP NAVBAR ========= -->
<div class="header">
<h1 title="Package" class="title">Package&nbsp;org.apache.hadoop.tools.rumen</h1>
<div class="docSummary">
<div class="block">Rumen is a data extraction and analysis tool built for
<a href="">Apache Hadoop</a>.</div>
<p>See:&nbsp;<a href="#package.description">Description</a></p>
<div class="contentContainer"><a name="package.description">
<!-- -->
<h2 title="Package Description">Package Description</h2>
<div class="block">Rumen is a data extraction and analysis tool built for
<a href="">Apache Hadoop</a>. Rumen mines job history
logs to extract meaningful data and stores it in an easily parsed format.
The default output format of Rumen is <a href="">JSON</a>.
Rumen uses the <a href="">Jackson</a> library to
create JSON objects.
<br><br>
The following classes can be used to invoke Rumen programmatically:
<br><br>
<code>JobConfigurationParser</code> <br>
A parser that extracts and filters interesting properties from a job
configuration.
<br><br>
<i>Sample code</i>:
<pre><code>
// An example to parse and filter out the job name
String conf_filename = .. // assume the job configuration filename here

// construct a list of interesting properties
List&lt;String&gt; interestedProperties = new ArrayList&lt;String&gt;();
interestedProperties.add("mapreduce.job.name");

JobConfigurationParser jcp =
    new JobConfigurationParser(interestedProperties);

InputStream in = new FileInputStream(conf_filename);
Properties parsedProperties = jcp.parse(in);
</code></pre>
Some of the commonly used interesting properties are enumerated in
<code>JobConfPropertyNames</code>. <br><br>
A single instance of <code>JobConfigurationParser</code>
can be used to parse multiple job configuration files.
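<p>The filtering idea can be sketched with the JDK alone. The class below is a hypothetical stand-in, not Rumen's <code>JobConfigurationParser</code>: it assumes the usual Hadoop configuration layout of <code>&lt;property&gt;</code> elements with <code>&lt;name&gt;</code> and <code>&lt;value&gt;</code> children, and the names <code>InterestingPropertyFilter</code> and <code>sample</code> are invented for illustration.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for illustration; not Rumen's JobConfigurationParser.
class InterestingPropertyFilter {
  private final Set<String> interesting;

  InterestingPropertyFilter(List<String> interestingProperties) {
    this.interesting = new HashSet<>(interestingProperties);
  }

  // Reads <property><name>..</name><value>..</value></property> entries and
  // keeps only those whose name is in the interesting set.
  Properties parse(InputStream in) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    } catch (Exception e) {
      throw new IllegalStateException("could not read configuration", e);
    }
    Properties result = new Properties();
    NodeList props = doc.getElementsByTagName("property");
    for (int i = 0; i < props.getLength(); i++) {
      Element prop = (Element) props.item(i);
      String name = text(prop, "name");
      String value = text(prop, "value");
      if (name != null && value != null && interesting.contains(name)) {
        result.setProperty(name, value);
      }
    }
    return result;
  }

  // Returns the text of the first child element with the given tag, or null.
  private static String text(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
  }

  // Builds a small in-memory configuration document (made-up sample data).
  static InputStream sample(String jobName) {
    String xml = "<configuration>"
        + "<property><name>mapreduce.job.name</name><value>" + jobName
        + "</value></property>"
        + "<property><name>mapreduce.job.reduces</name><value>4</value></property>"
        + "</configuration>";
    return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    InterestingPropertyFilter filter =
        new InterestingPropertyFilter(Arrays.asList("mapreduce.job.name"));
    // The same filter instance is reused for two configuration documents.
    System.out.println(filter.parse(sample("wordcount")));
    System.out.println(filter.parse(sample("sort")));
  }
}
```

<p>Because the filter holds only the set of interesting names and no per-file state, one instance can be reused across many inputs, which mirrors the note above about reusing a single parser instance.</p>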
<code>JobHistoryParser</code> <br>
A parser that parses job history files. It is an interface, and the actual
implementations are defined as an enum in
<code>JobHistoryParserFactory</code>. Note that
<code>RewindableInputStream</code>
is a wrapper class around <a href="" title="class or interface in"><code>InputStream</code></a> to make the input
stream rewindable.
<br>
<i>Sample code</i>:
<pre><code>
// An example to parse a current job history file, i.e. a job history
// file for which the version is known
String filename = .. // assume the job history filename here

InputStream in = new FileInputStream(filename);

HistoryEvent event = null;

JobHistoryParser parser = new CurrentJHParser(in);

event = parser.nextEvent();
// process all the events
while (event != null) {
  // ... process the event
  event = parser.nextEvent();
}

// close the parser and the underlying stream
parser.close();
</code></pre>
<code>JobHistoryParserFactory</code> provides the
<code>getParser(RewindableInputStream)</code>
API to get a parser for the job history file. Use this
API when the job history version is unknown.<br><br>
<i>Sample code</i>:
<pre><code>
// An example to parse a job history file for which the version is not
// known, i.e. using JobHistoryParserFactory.getParser()
String filename = .. // assume the job history filename here

InputStream in = new FileInputStream(filename);
RewindableInputStream ris = new RewindableInputStream(in);

// JobHistoryParserFactory will check and return a parser that can
// parse the file
JobHistoryParser parser = JobHistoryParserFactory.getParser(ris);

// now use the parser to parse the events
HistoryEvent event = parser.nextEvent();
while (event != null) {
  // ... process the event
  event = parser.nextEvent();
}

parser.close();
</code></pre>
Create one instance to parse a job history log and close it after use.
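<p>Why a rewindable stream helps when the version is unknown can be sketched with the JDK's <code>BufferedInputStream</code>, whose <code>mark</code> and <code>reset</code> methods let a caller peek at the start of a stream and then rewind it. Everything below is a hypothetical stand-in: the class name, the header markers and the parser names are invented for illustration and are not Rumen's implementation or real job history headers.</p>

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; none of these names are Rumen APIs.
class ParserSelectionSketch {
  // Hypothetical version markers; real job history headers differ.
  static final String V1_HEADER = "HISTORY-V1";
  static final String V2_HEADER = "HISTORY-V2";
  private static final int PROBE_LIMIT = 16;

  // Reads up to PROBE_LIMIT bytes, then rewinds the stream to where it was.
  static String peek(BufferedInputStream in) {
    try {
      in.mark(PROBE_LIMIT);
      byte[] head = new byte[PROBE_LIMIT];
      int n = 0;
      while (n < head.length) {
        int r = in.read(head, n, head.length - n);
        if (r < 0) {
          break; // end of stream before the probe limit was reached
        }
        n += r;
      }
      in.reset();
      return new String(head, 0, n, StandardCharsets.US_ASCII);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns the name of the first (hypothetical) parser whose header matches.
  static String chooseParser(BufferedInputStream in) {
    String head = peek(in);
    if (head.startsWith(V1_HEADER)) {
      return "v1-parser";
    }
    if (head.startsWith(V2_HEADER)) {
      return "v2-parser";
    }
    throw new IllegalArgumentException("no parser recognizes this stream");
  }

  // Wraps a string in a mark/reset-capable stream (made-up sample data).
  static BufferedInputStream streamOf(String content) {
    return new BufferedInputStream(
        new ByteArrayInputStream(content.getBytes(StandardCharsets.US_ASCII)));
  }

  public static void main(String[] args) throws IOException {
    BufferedInputStream in = streamOf(V2_HEADER + "\nevent-1\n");
    System.out.println(chooseParser(in));
    // The stream was rewound, so the next read returns the first byte again.
    System.out.println((char) in.read());
  }
}
```

<p>The point of the sketch is the rewind after probing: whichever parser is selected still sees the stream from its first byte, so probing does not consume any events.</p>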
<br><br>
<code>TopologyBuilder</code> <br>
Builds the cluster topology based on the job history events. Every
job history file consists of events. Each event can be represented using
<code>HistoryEvent</code>.
These events can be passed to <code>TopologyBuilder</code> using
<code>TopologyBuilder.process(HistoryEvent)</code>.
A cluster topology can be represented using <code>LoggedNetworkTopology</code>.
Once all the job history events are processed, the cluster
topology can be obtained using <code>TopologyBuilder.build()</code>.
<br><br>
<i>Sample code</i>:
<pre><code>
// Building the topology for a job history file represented using
// 'filename' and the corresponding configuration file represented
// using 'conf_filename'
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here

InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);

TopologyBuilder tb = new TopologyBuilder();

// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");

JobConfigurationParser jcp =
    new JobConfigurationParser(interestingProperties);

// parse the configuration file
tb.process(jcp.parse(jobConfInputStream));

// read the job history file and pass it to the
// TopologyBuilder.
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
HistoryEvent e;

// read and process all the job history events
while ((e = parser.nextEvent()) != null) {
  tb.process(e);
}

LoggedNetworkTopology topology = tb.build();
</code></pre>
<code>JobBuilder</code> <br>
Summarizes a job history file.
<code>JobHistoryUtils</code> provides the
<code>extractJobID(String)</code>
API for extracting the job id from job history or job configuration filenames,
which can be used for instantiating <code>JobBuilder</code>.
<code>JobBuilder</code> generates a
<code>LoggedJob</code> object via
<code>JobBuilder.build()</code>.
See <code>LoggedJob</code> for more details.
<br><br>
<i>Sample code</i>:
<pre><code>
// An example to summarize a current job history file 'filename'
// and the corresponding configuration file 'conf_filename'
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here

InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);

String jobID = JobHistoryUtils.extractJobID(filename);
JobBuilder jb = new JobBuilder(jobID);

// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");

JobConfigurationParser jcp =
    new JobConfigurationParser(interestingProperties);

// parse the configuration file
jb.process(jcp.parse(jobConfInputStream));

// parse the job history file
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
try {
  HistoryEvent e;
  // read and process all the job history events
  while ((e = parser.nextEvent()) != null) {
    jb.process(e);
  }
} finally {
  parser.close();
}

LoggedJob job = jb.build();
</code></pre>
<b>Note:</b>
The order in which the job configuration file and the job history file are
parsed is not important. Create one instance to parse the history file and job
configuration.
<br><br>
<code>DefaultOutputter</code> <br>
Implements <code>Outputter</code> and writes
JSON objects in text format to the output file.
<code>DefaultOutputter</code> can be
initialized with the output filename.
<br><br>
<i>Sample code</i>:
<pre><code>
// An example to summarize a current job history file represented by
// 'filename' and the configuration file represented by
// 'conf_filename'. The job summary is written to 'out.json' and
// the cluster topology to 'topology.json'.
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here

Configuration conf = new Configuration();

InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
// extract the job-id from the filename
String jobID = JobHistoryUtils.extractJobID(filename);
JobBuilder jb = new JobBuilder(jobID);
TopologyBuilder tb = new TopologyBuilder();

// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");

JobConfigurationParser jcp =
    new JobConfigurationParser(interestingProperties);

// parse the configuration file and pass it to both builders
Properties jobConf = jcp.parse(jobConfInputStream);
jb.process(jobConf);
tb.process(jobConf);

// read the job history file and pass each event to the
// JobBuilder and the TopologyBuilder.
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
HistoryEvent e;
while ((e = parser.nextEvent()) != null) {
  jb.process(e);
  tb.process(e);
}
parser.close();

LoggedJob j = jb.build();

// serialize the job summary in json (text) format, then close.
// 'do' is a reserved word in Java, so the outputters get other names.
DefaultOutputter&lt;LoggedJob&gt; jobOutputter =
    new DefaultOutputter&lt;LoggedJob&gt;();
jobOutputter.init(new Path("out.json"), conf);
jobOutputter.output(j);
jobOutputter.close();

// get the cluster topology using TopologyBuilder
LoggedNetworkTopology topology = tb.build();

// serialize the cluster topology in json (text) format, then close
DefaultOutputter&lt;LoggedNetworkTopology&gt; topologyOutputter =
    new DefaultOutputter&lt;LoggedNetworkTopology&gt;();
topologyOutputter.init(new Path("topology.json"), conf);
topologyOutputter.output(topology);
topologyOutputter.close();
</code></pre>
<code>JobTraceReader</code> <br>
A reader for reading <code>LoggedJob</code> objects serialized using
<code>DefaultOutputter</code>. <code>LoggedJob</code>
provides various APIs for extracting job details. The following are the most
commonly used:
<ul>
<li><code>LoggedJob.getMapTasks()</code> : Get the map tasks</li>
<li><code>LoggedJob.getReduceTasks()</code> : Get the reduce tasks</li>
<li><code>LoggedJob.getOtherTasks()</code> : Get the setup/cleanup tasks</li>
<li><code>LoggedJob.getOutcome()</code> : Get the job's outcome</li>
<li><code>LoggedJob.getSubmitTime()</code> : Get the job's submit time</li>
<li><code>LoggedJob.getFinishTime()</code> : Get the job's finish time</li>
</ul>
<i>Sample code</i>:
<pre><code>
// An example to read job summaries from a trace file 'out.json'.
JobTraceReader reader =
    new JobTraceReader(new Path("out.json"), new Configuration());
LoggedJob job = reader.getNext();

while (job != null) {
  // .... process job level information
  for (LoggedTask task : job.getMapTasks()) {
    // process all the map tasks in the job
    for (LoggedTaskAttempt attempt : task.getAttempts()) {
      // process all the map task attempts in the job
    }
  }

  // get the next job
  job = reader.getNext();
}
reader.close();
</code></pre>
<code>ClusterTopologyReader</code> <br>
A reader to read a <code>LoggedNetworkTopology</code> serialized using
<code>DefaultOutputter</code>. <code>ClusterTopologyReader</code> can be
initialized using the serialized topology filename.
<code>ClusterTopologyReader.get()</code> can
be used to get the
<code>LoggedNetworkTopology</code>.
<br><br>
<i>Sample code</i>:
<pre><code>
// An example to read the cluster topology from a topology output file
// 'topology.json'
ClusterTopologyReader reader =
    new ClusterTopologyReader(new Path("topology.json"), new Configuration());
LoggedNetworkTopology topology = reader.get();
for (LoggedNetworkTopology t : topology.getChildren()) {
  // process the cluster topology
}
</code></pre>
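<p>The loop in the sample visits only the direct children of the root. If the topology is treated as a tree whose leaves are hosts (an assumption for this sketch), a recursive walk reaches every node. The class below is a hypothetical stand-in and not Rumen's <code>LoggedNetworkTopology</code>; the names <code>TopologyNode</code>, <code>leafNames</code> and <code>sample</code>, and the rack and host names, are invented for illustration.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a topology node; not Rumen's LoggedNetworkTopology.
class TopologyNode {
  final String name;
  final List<TopologyNode> children;

  TopologyNode(String name, TopologyNode... children) {
    this.name = name;
    this.children = Arrays.asList(children);
  }

  // Collects the names of all leaf nodes below this node, depth first.
  List<String> leafNames() {
    List<String> leaves = new ArrayList<>();
    collectLeaves(this, leaves);
    return leaves;
  }

  private static void collectLeaves(TopologyNode node, List<String> out) {
    if (node.children.isEmpty()) {
      out.add(node.name); // a node without children is treated as a host
      return;
    }
    for (TopologyNode child : node.children) {
      collectLeaves(child, out);
    }
  }

  // Builds a small made-up topology: one root, two racks, three hosts.
  static TopologyNode sample() {
    return new TopologyNode("root",
        new TopologyNode("rack-1",
            new TopologyNode("host-a"), new TopologyNode("host-b")),
        new TopologyNode("rack-2", new TopologyNode("host-c")));
  }

  public static void main(String[] args) {
    System.out.println(sample().leafNames());
  }
}
```

<p>The same recursion could be applied to a real topology object by following its <code>getChildren()</code> list in place of the <code>children</code> field used here.</p>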
<p class="legalCopy"><small>Copyright &#169; 2021 <a href="">Apache Software Foundation</a>. All rights reserved.</small></p>