<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- NewPage -->
<html lang="en">
<head>
<!-- Generated by javadoc (1.8.0_292) on Tue Jun 15 06:00:58 GMT 2021 -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>org.apache.hadoop.tools.rumen (Apache Hadoop Main 3.3.1 API)</title>
<meta name="date" content="2021-06-15">
<link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
<script type="text/javascript" src="../../../../../script.js"></script>
</head>
<body>
<script type="text/javascript"><!--
try {
if (location.href.indexOf('is-external=true') == -1) {
parent.document.title="org.apache.hadoop.tools.rumen (Apache Hadoop Main 3.3.1 API)";
}
}
catch(err) {
}
//-->
</script>
<noscript>
<div>JavaScript is disabled on your browser.</div>
</noscript>
<!-- ========= START OF TOP NAVBAR ======= -->
<div class="topNav"><a name="navbar.top">
<!-- -->
</a>
<div class="skipNav"><a href="#skip.navbar.top" title="Skip navigation links">Skip navigation links</a></div>
<a name="navbar.top.firstrow">
<!-- -->
</a>
<ul class="navList" title="Navigation">
<li><a href="../../../../../overview-summary.html">Overview</a></li>
<li class="navBarCell1Rev">Package</li>
<li>Class</li>
<li><a href="package-use.html">Use</a></li>
<li><a href="package-tree.html">Tree</a></li>
<li><a href="../../../../../deprecated-list.html">Deprecated</a></li>
<li><a href="../../../../../index-all.html">Index</a></li>
<li><a href="../../../../../help-doc.html">Help</a></li>
</ul>
</div>
<div class="subNav">
<ul class="navList">
<li><a href="../../../../../org/apache/hadoop/tools/protocolPB/package-summary.html">Prev&nbsp;Package</a></li>
<li><a href="../../../../../org/apache/hadoop/tools/rumen/anonymization/package-summary.html">Next&nbsp;Package</a></li>
</ul>
<ul class="navList">
<li><a href="../../../../../index.html?org/apache/hadoop/tools/rumen/package-summary.html" target="_top">Frames</a></li>
<li><a href="package-summary.html" target="_top">No&nbsp;Frames</a></li>
</ul>
<ul class="navList" id="allclasses_navbar_top">
<li><a href="../../../../../allclasses-noframe.html">All&nbsp;Classes</a></li>
</ul>
<div>
<script type="text/javascript"><!--
allClassesLink = document.getElementById("allclasses_navbar_top");
if(window==top) {
allClassesLink.style.display = "block";
}
else {
allClassesLink.style.display = "none";
}
//-->
</script>
</div>
<a name="skip.navbar.top">
<!-- -->
</a></div>
<!-- ========= END OF TOP NAVBAR ========= -->
<div class="header">
<h1 title="Package" class="title">Package&nbsp;org.apache.hadoop.tools.rumen</h1>
<div class="docSummary">
<div class="block">Rumen is a data extraction and analysis tool built for
<a href="http://hadoop.apache.org/">Apache Hadoop</a>.</div>
</div>
<p>See:&nbsp;<a href="#package.description">Description</a></p>
</div>
<div class="contentContainer"><a name="package.description">
<!-- -->
</a>
<h2 title="Package org.apache.hadoop.tools.rumen Description">Package org.apache.hadoop.tools.rumen Description</h2>
<div class="block">Rumen is a data extraction and analysis tool built for
<a href="http://hadoop.apache.org/">Apache Hadoop</a>. Rumen mines job history
logs to extract meaningful data and stores it in an easily-parsed format.
The default output format of Rumen is <a href="http://www.json.org">JSON</a>.
Rumen uses the <a href="http://jackson.codehaus.org/">Jackson</a> library to
create JSON objects.
<br><br>
The following classes can be used to programmatically invoke Rumen:
<ol>
<li>
<code>JobConfigurationParser</code><br>
A parser that parses a job configuration and extracts the interesting
properties.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to parse and filter out job name
String conf_filename = .. // assume the job configuration filename here
// construct a list of interesting properties
List&lt;String&gt; interestedProperties = new ArrayList&lt;String&gt;();
interestedProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestedProperties);
InputStream in = new FileInputStream(conf_filename);
Properties parsedProperties = jcp.parse(in);
</code>
</pre>
Some of the commonly used interesting properties are enumerated in
<code>JobConfPropertyNames</code>. <br><br>
<b>Note:</b>
A single instance of <code>JobConfigurationParser</code>
can be used to parse multiple job configuration files.
</li>
<li>
<code>JobHistoryParser</code> <br>
A parser that parses job history files. It is an interface whose
concrete implementations are defined as an enum in
<code>JobHistoryParserFactory</code>. Note that
<code>RewindableInputStream</code>
is a wrapper class around <a href="https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html?is-external=true" title="class or interface in java.io"><code>InputStream</code></a> that makes the input
stream rewindable.
<br>
<i>Sample code</i>:
<pre>
<code>
// An example to parse a current job history file, i.e. a job history
// file whose version is known
String filename = .. // assume the job history filename here
InputStream in = new FileInputStream(filename);
HistoryEvent event = null;
JobHistoryParser parser = new CurrentJHParser(in);
event = parser.nextEvent();
// process all the events
while (event != null) {
// ... process all event
event = parser.nextEvent();
}
// close the parser and the underlying stream
parser.close();
</code>
</pre>
<code>JobHistoryParserFactory</code> provides a
<code>JobHistoryParserFactory.getParser(org.apache.hadoop.tools.rumen.RewindableInputStream)</code>
API to get a parser for parsing the job history file. This
API is useful when the job history file's version is unknown.<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to parse a job history file whose version is not
// known, i.e. using JobHistoryParserFactory.getParser()
String filename = .. // assume the job history filename here
InputStream in = new FileInputStream(filename);
RewindableInputStream ris = new RewindableInputStream(in);
// JobHistoryParserFactory will check and return a parser that can
// parse the file
JobHistoryParser parser = JobHistoryParserFactory.getParser(ris);
// now use the parser to parse the events
HistoryEvent event = parser.nextEvent();
while (event != null) {
// ... process the event
event = parser.nextEvent();
}
parser.close();
</code>
</pre>
<b>Note:</b>
Create one instance to parse a job history log and close it after use.
</li>
<li>
<code>TopologyBuilder</code><br>
Builds the cluster topology from job history events. Every
job history file consists of events, and each event is represented by a
<code>HistoryEvent</code>.
These events can be passed to <code>TopologyBuilder</code> using
<code>TopologyBuilder.process(org.apache.hadoop.mapreduce.jobhistory.HistoryEvent)</code>.
A cluster topology is represented by a <code>LoggedNetworkTopology</code>.
Once all the job history events are processed, the cluster
topology can be obtained using <code>TopologyBuilder.build()</code>.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// Building topology for a job history file represented using
// 'filename' and the corresponding configuration file represented
// using 'conf_filename'
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
TopologyBuilder tb = new TopologyBuilder();
// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestingProperties);
// parse the configuration file
tb.process(jcp.parse(jobConfInputStream));
// read the job history file and pass it to the
// TopologyBuilder.
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
HistoryEvent e;
// read and process all the job history events
while ((e = parser.nextEvent()) != null) {
tb.process(e);
}
LoggedNetworkTopology topology = tb.build();
</code>
</pre>
</li>
<li>
<code>JobBuilder</code><br>
Summarizes a job history file.
<code>JobHistoryUtils</code> provides the
<code>JobHistoryUtils.extractJobID(String)</code>
API for extracting the job id from a job history or job configuration
filename; the extracted id can be used to instantiate <code>JobBuilder</code>.
<code>JobBuilder</code> generates a
<code>LoggedJob</code> object via
<code>JobBuilder.build()</code>.
See <code>LoggedJob</code> for more details.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to summarize a current job history file 'filename'
// and the corresponding configuration file 'conf_filename'
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
String jobID = TraceBuilder.extractJobID(filename);
JobBuilder jb = new JobBuilder(jobID);
// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestingProperties);
// parse the configuration file
jb.process(jcp.parse(jobConfInputStream));
// parse the job history file
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
try {
HistoryEvent e;
// read and process all the job history events
while ((e = parser.nextEvent()) != null) {
jb.process(e);
}
} finally {
parser.close();
}
LoggedJob job = jb.build();
</code>
</pre>
<b>Note:</b>
The job configuration file and the job history file can be parsed in
either order. Use a single <code>JobBuilder</code> instance to process
both the history file and the job configuration.
</li>
<li>
<code>DefaultOutputter</code><br>
Implements <code>Outputter</code> and writes
JSON objects in text format to the output file.
<code>DefaultOutputter</code> can be
initialized with the output filename.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to summarize a current job history file represented by
// 'filename' and the configuration filename represented using
// 'conf_filename'. Also output the job summary to 'out.json' along
// with the cluster topology to 'topology.json'.
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here
Configuration conf = new Configuration();
DefaultOutputter outputter = new DefaultOutputter();
outputter.init("out.json", conf);
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
// extract the job-id from the filename
String jobID = TraceBuilder.extractJobID(filename);
JobBuilder jb = new JobBuilder(jobID);
TopologyBuilder tb = new TopologyBuilder();
// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestingProperties);
// parse the configuration file
tb.process(jcp.parse(jobConfInputStream));
// read the job history file and pass its events to the
// JobBuilder and the TopologyBuilder
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
HistoryEvent e;
while ((e = parser.nextEvent()) != null) {
jb.process(e);
tb.process(e);
}
LoggedJob j = jb.build();
// serialize the job summary in JSON (text) format
outputter.output(j);
// close
outputter.close();
outputter.init("topology.json", conf);
// get the cluster topology using TopologyBuilder
LoggedNetworkTopology topology = tb.build();
// serialize the cluster topology in JSON (text) format
outputter.output(topology);
// close
outputter.close();
</code>
</pre>
</li>
<li>
<code>JobTraceReader</code><br>
A reader for reading <code>LoggedJob</code> serialized using
<code>DefaultOutputter</code>. <code>LoggedJob</code>
provides various APIs for extracting job details. The following are the
most commonly used:
<ul>
<li><code>LoggedJob.getMapTasks()</code> : Get the map tasks</li>
<li><code>LoggedJob.getReduceTasks()</code> : Get the reduce tasks</li>
<li><code>LoggedJob.getOtherTasks()</code> : Get the setup/cleanup tasks</li>
<li><code>LoggedJob.getOutcome()</code> : Get the job's outcome</li>
<li><code>LoggedJob.getSubmitTime()</code> : Get the job's submit time</li>
<li><code>LoggedJob.getFinishTime()</code> : Get the job's finish time</li>
</ul>
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to read job summary from a trace file 'out.json'.
JobTraceReader reader = new JobTraceReader("out.json");
LoggedJob job = reader.getNext();
while (job != null) {
// .... process job level information
for (LoggedTask task : job.getMapTasks()) {
// process all the map tasks in the job
for (LoggedTaskAttempt attempt : task.getAttempts()) {
// process all the map task attempts in the job
}
}
// get the next job
job = reader.getNext();
}
reader.close();
</code>
</pre>
</li>
<li>
<code>ClusterTopologyReader</code><br>
A reader to read <code>LoggedNetworkTopology</code> serialized using
<code>DefaultOutputter</code>. <code>ClusterTopologyReader</code> can be
initialized using the serialized topology filename.
<code>ClusterTopologyReader.get()</code> can
be used to get the
<code>LoggedNetworkTopology</code>.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to read the cluster topology from a topology output file
// 'topology.json'
ClusterTopologyReader reader = new ClusterTopologyReader("topology.json");
LoggedNetworkTopology topology = reader.get();
for (LoggedNetworkTopology t : topology.getChildren()) {
// process the cluster topology
}
reader.close();
</code>
</pre>
</li>
</ol></div>
</div>
<!-- ======= START OF BOTTOM NAVBAR ====== -->
<div class="bottomNav"><a name="navbar.bottom">
<!-- -->
</a>
<div class="skipNav"><a href="#skip.navbar.bottom" title="Skip navigation links">Skip navigation links</a></div>
<a name="navbar.bottom.firstrow">
<!-- -->
</a>
<ul class="navList" title="Navigation">
<li><a href="../../../../../overview-summary.html">Overview</a></li>
<li class="navBarCell1Rev">Package</li>
<li>Class</li>
<li><a href="package-use.html">Use</a></li>
<li><a href="package-tree.html">Tree</a></li>
<li><a href="../../../../../deprecated-list.html">Deprecated</a></li>
<li><a href="../../../../../index-all.html">Index</a></li>
<li><a href="../../../../../help-doc.html">Help</a></li>
</ul>
</div>
<div class="subNav">
<ul class="navList">
<li><a href="../../../../../org/apache/hadoop/tools/protocolPB/package-summary.html">Prev&nbsp;Package</a></li>
<li><a href="../../../../../org/apache/hadoop/tools/rumen/anonymization/package-summary.html">Next&nbsp;Package</a></li>
</ul>
<ul class="navList">
<li><a href="../../../../../index.html?org/apache/hadoop/tools/rumen/package-summary.html" target="_top">Frames</a></li>
<li><a href="package-summary.html" target="_top">No&nbsp;Frames</a></li>
</ul>
<ul class="navList" id="allclasses_navbar_bottom">
<li><a href="../../../../../allclasses-noframe.html">All&nbsp;Classes</a></li>
</ul>
<div>
<script type="text/javascript"><!--
allClassesLink = document.getElementById("allclasses_navbar_bottom");
if(window==top) {
allClassesLink.style.display = "block";
}
else {
allClassesLink.style.display = "none";
}
//-->
</script>
</div>
<a name="skip.navbar.bottom">
<!-- -->
</a></div>
<!-- ======== END OF BOTTOM NAVBAR ======= -->
<p class="legalCopy"><small>Copyright &#169; 2021 <a href="https://www.apache.org">Apache Software Foundation</a>. All rights reserved.</small></p>
</body>
</html>