<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- NewPage -->
<html lang="en">
<head>
<!-- Generated by javadoc (1.8.0_292) on Tue Jun 15 06:00:58 GMT 2021 -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>org.apache.hadoop.tools.rumen (Apache Hadoop Main 3.3.1 API)</title>
<meta name="date" content="2021-06-15">
<link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
<script type="text/javascript" src="../../../../../script.js"></script>
</head>
<body>
<script type="text/javascript"><!--
try {
if (location.href.indexOf('is-external=true') == -1) {
parent.document.title="org.apache.hadoop.tools.rumen (Apache Hadoop Main 3.3.1 API)";
}
}
catch(err) {
}
//-->
</script>
<noscript>
<div>JavaScript is disabled on your browser.</div>
</noscript>
<!-- ========= START OF TOP NAVBAR ======= -->
<div class="topNav"><a name="navbar.top">
<!-- -->
</a>
<div class="skipNav"><a href="#skip.navbar.top" title="Skip navigation links">Skip navigation links</a></div>
<a name="navbar.top.firstrow">
<!-- -->
</a>
<ul class="navList" title="Navigation">
<li><a href="../../../../../overview-summary.html">Overview</a></li>
<li class="navBarCell1Rev">Package</li>
<li>Class</li>
<li><a href="package-use.html">Use</a></li>
<li><a href="package-tree.html">Tree</a></li>
<li><a href="../../../../../deprecated-list.html">Deprecated</a></li>
<li><a href="../../../../../index-all.html">Index</a></li>
<li><a href="../../../../../help-doc.html">Help</a></li>
</ul>
</div>
<div class="subNav">
<ul class="navList">
<li><a href="../../../../../org/apache/hadoop/tools/protocolPB/package-summary.html">Prev&nbsp;Package</a></li>
<li><a href="../../../../../org/apache/hadoop/tools/rumen/anonymization/package-summary.html">Next&nbsp;Package</a></li>
</ul>
<ul class="navList">
<li><a href="../../../../../index.html?org/apache/hadoop/tools/rumen/package-summary.html" target="_top">Frames</a></li>
<li><a href="package-summary.html" target="_top">No&nbsp;Frames</a></li>
</ul>
<ul class="navList" id="allclasses_navbar_top">
<li><a href="../../../../../allclasses-noframe.html">All&nbsp;Classes</a></li>
</ul>
<div>
<script type="text/javascript"><!--
allClassesLink = document.getElementById("allclasses_navbar_top");
if(window==top) {
allClassesLink.style.display = "block";
}
else {
allClassesLink.style.display = "none";
}
//-->
</script>
</div>
<a name="skip.navbar.top">
<!-- -->
</a></div>
<!-- ========= END OF TOP NAVBAR ========= -->
<div class="header">
<h1 title="Package" class="title">Package&nbsp;org.apache.hadoop.tools.rumen</h1>
<div class="docSummary">
<div class="block">Rumen is a data extraction and analysis tool built for
<a href="http://hadoop.apache.org/">Apache Hadoop</a>.</div>
</div>
<p>See:&nbsp;<a href="#package.description">Description</a></p>
</div>
<div class="contentContainer"><a name="package.description">
<!-- -->
</a>
<h2 title="Package org.apache.hadoop.tools.rumen Description">Package org.apache.hadoop.tools.rumen Description</h2>
<div class="block">Rumen is a data extraction and analysis tool built for
<a href="http://hadoop.apache.org/">Apache Hadoop</a>. Rumen mines job history
logs to extract meaningful data and stores it in an easily-parsed format.
The default output format of Rumen is <a href="http://www.json.org">JSON</a>.
Rumen uses the <a href="http://jackson.codehaus.org/">Jackson</a> library to
create JSON objects.
<br><br>
The following classes can be used to programmatically invoke Rumen:
<ol>
<li>
<code>JobConfigurationParser</code><br>
A parser that parses a job configuration and extracts the interesting
properties.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to parse and filter out job name
String conf_filename = .. // assume the job configuration filename here
// construct a list of interesting properties
List&lt;String&gt; interestedProperties = new ArrayList&lt;String&gt;();
interestedProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestedProperties);
InputStream in = new FileInputStream(conf_filename);
Properties parsedProperties = jcp.parse(in);
</code>
</pre>
Some of the commonly used interesting properties are enumerated in
<code>JobConfPropertyNames</code>. <br><br>
<b>Note:</b>
A single instance of <code>JobConfigurationParser</code>
can be used to parse multiple job configuration files.
</li>
<li>
<code>JobHistoryParser</code> <br>
A parser that parses job history files. It is an interface whose
concrete implementations are defined as an enum in
<code>JobHistoryParserFactory</code>. Note that
<code>RewindableInputStream</code>
is a wrapper class around <a href="https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html?is-external=true" title="class or interface in java.io"><code>InputStream</code></a> that makes the input
stream rewindable.
<br>
<i>Sample code</i>:
<pre>
<code>
// An example to parse a current job history file, i.e. a job history
// file whose version is known
String filename = .. // assume the job history filename here
InputStream in = new FileInputStream(filename);
HistoryEvent event = null;
JobHistoryParser parser = new CurrentJHParser(in);
event = parser.nextEvent();
// process all the events
while (event != null) {
// ... process all event
event = parser.nextEvent();
}
// close the parser and the underlying stream
parser.close();
</code>
</pre>
<code>JobHistoryParserFactory</code> provides a
<code>JobHistoryParserFactory.getParser(org.apache.hadoop.tools.rumen.RewindableInputStream)</code>
API to get a parser for parsing the job history file. This
API is useful when the job history file's version is unknown.<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to parse a job history file whose version is not
// known, i.e. using JobHistoryParserFactory.getParser()
String filename = .. // assume the job history filename here
InputStream in = new FileInputStream(filename);
RewindableInputStream ris = new RewindableInputStream(in);
// JobHistoryParserFactory will check and return a parser that can
// parse the file
JobHistoryParser parser = JobHistoryParserFactory.getParser(ris);
// now use the parser to parse the events
HistoryEvent event = parser.nextEvent();
while (event != null) {
// ... process the event
event = parser.nextEvent();
}
parser.close();
</code>
</pre>
<b>Note:</b>
Create one instance to parse a job history log and close it after use.
</li>
<li>
<code>TopologyBuilder</code><br>
Builds the cluster topology from job history events. Every
job history file consists of events, and each event is represented by a
<code>HistoryEvent</code>.
These events can be passed to <code>TopologyBuilder</code> using
<code>TopologyBuilder.process(org.apache.hadoop.mapreduce.jobhistory.HistoryEvent)</code>.
A cluster topology is represented by a <code>LoggedNetworkTopology</code>.
Once all the job history events are processed, the cluster
topology can be obtained using <code>TopologyBuilder.build()</code>.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// Building topology for a job history file represented using
// 'filename' and the corresponding configuration file represented
// using 'conf_filename'
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
TopologyBuilder tb = new TopologyBuilder();
// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestingProperties);
// parse the configuration file
tb.process(jcp.parse(jobConfInputStream));
// read the job history file and pass it to the
// TopologyBuilder.
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
HistoryEvent e;
// read and process all the job history events
while ((e = parser.nextEvent()) != null) {
tb.process(e);
}
LoggedNetworkTopology topology = tb.build();
</code>
</pre>
</li>
<li>
<code>JobBuilder</code><br>
Summarizes a job history file.
<code>JobHistoryUtils</code> provides the
<code>JobHistoryUtils.extractJobID(String)</code>
API for extracting the job id from a job history or job configuration
filename; the extracted id can be used to instantiate <code>JobBuilder</code>.
<code>JobBuilder</code> generates a
<code>LoggedJob</code> object via
<code>JobBuilder.build()</code>.
See <code>LoggedJob</code> for more details.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to summarize a current job history file 'filename'
// and the corresponding configuration file 'conf_filename'
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
String jobID = TraceBuilder.extractJobID(filename);
JobBuilder jb = new JobBuilder(jobID);
// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestingProperties);
// parse the configuration file
jb.process(jcp.parse(jobConfInputStream));
// parse the job history file
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
try {
HistoryEvent e;
// read and process all the job history events
while ((e = parser.nextEvent()) != null) {
jb.process(e);
}
} finally {
parser.close();
}
LoggedJob job = jb.build();
</code>
</pre>
<b>Note:</b>
The job configuration file and the job history file can be parsed in
either order. Use a single <code>JobBuilder</code> instance to process
both the history file and the job configuration.
</li>
<li>
<code>DefaultOutputter</code><br>
Implements <code>Outputter</code> and writes
JSON objects in text format to the output file.
<code>DefaultOutputter</code> can be
initialized with the output filename.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to summarize a current job history file represented by
// 'filename' and the configuration filename represented using
// 'conf_filename'. Also output the job summary to 'out.json' along
// with the cluster topology to 'topology.json'.
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here
Configuration conf = new Configuration();
DefaultOutputter outputter = new DefaultOutputter();
outputter.init("out.json", conf);
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
// extract the job-id from the filename
String jobID = TraceBuilder.extractJobID(filename);
JobBuilder jb = new JobBuilder(jobID);
TopologyBuilder tb = new TopologyBuilder();
// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestingProperties);
// parse the configuration file
tb.process(jcp.parse(jobConfInputStream));
// read the job history file and pass its events to the
// JobBuilder and the TopologyBuilder
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
HistoryEvent e;
while ((e = parser.nextEvent()) != null) {
jb.process(e);
tb.process(e);
}
LoggedJob j = jb.build();
// serialize the job summary in JSON (text) format
outputter.output(j);
// close
outputter.close();
outputter.init("topology.json", conf);
// get the cluster topology using TopologyBuilder
LoggedNetworkTopology topology = tb.build();
// serialize the cluster topology in JSON (text) format
outputter.output(topology);
// close
outputter.close();
</code>
</pre>
</li>
<li>
<code>JobTraceReader</code><br>
A reader for reading <code>LoggedJob</code> serialized using
<code>DefaultOutputter</code>. <code>LoggedJob</code>
provides various APIs for extracting job details. The following are the
most commonly used:
<ul>
<li><code>LoggedJob.getMapTasks()</code> : Get the map tasks</li>
<li><code>LoggedJob.getReduceTasks()</code> : Get the reduce tasks</li>
<li><code>LoggedJob.getOtherTasks()</code> : Get the setup/cleanup tasks</li>
<li><code>LoggedJob.getOutcome()</code> : Get the job's outcome</li>
<li><code>LoggedJob.getSubmitTime()</code> : Get the job's submit time</li>
<li><code>LoggedJob.getFinishTime()</code> : Get the job's finish time</li>
</ul>
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to read job summary from a trace file 'out.json'.
JobTraceReader reader = new JobTraceReader("out.json");
LoggedJob job = reader.getNext();
while (job != null) {
// .... process job level information
for (LoggedTask task : job.getMapTasks()) {
// process all the map tasks in the job
for (LoggedTaskAttempt attempt : task.getAttempts()) {
// process all the map task attempts in the job
}
}
// get the next job
job = reader.getNext();
}
reader.close();
</code>
</pre>
</li>
<li>
<code>ClusterTopologyReader</code><br>
A reader to read <code>LoggedNetworkTopology</code> serialized using
<code>DefaultOutputter</code>. <code>ClusterTopologyReader</code> can be
initialized using the serialized topology filename.
<code>ClusterTopologyReader.get()</code> can
be used to get the
<code>LoggedNetworkTopology</code>.
<br><br>
<i>Sample code</i>:
<pre>
<code>
// An example to read the cluster topology from a topology output file
// 'topology.json'
ClusterTopologyReader reader = new ClusterTopologyReader("topology.json");
LoggedNetworkTopology topology = reader.get();
for (LoggedNetworkTopology t : topology.getChildren()) {
// process the cluster topology
}
reader.close();
</code>
</pre>
</li>
</ol></div>
</div>
<!-- ======= START OF BOTTOM NAVBAR ====== -->
<div class="bottomNav"><a name="navbar.bottom">
<!-- -->
</a>
<div class="skipNav"><a href="#skip.navbar.bottom" title="Skip navigation links">Skip navigation links</a></div>
<a name="navbar.bottom.firstrow">
<!-- -->
</a>
<ul class="navList" title="Navigation">
<li><a href="../../../../../overview-summary.html">Overview</a></li>
<li class="navBarCell1Rev">Package</li>
<li>Class</li>
<li><a href="package-use.html">Use</a></li>
<li><a href="package-tree.html">Tree</a></li>
<li><a href="../../../../../deprecated-list.html">Deprecated</a></li>
<li><a href="../../../../../index-all.html">Index</a></li>
<li><a href="../../../../../help-doc.html">Help</a></li>
</ul>
</div>
<div class="subNav">
<ul class="navList">
<li><a href="../../../../../org/apache/hadoop/tools/protocolPB/package-summary.html">Prev&nbsp;Package</a></li>
<li><a href="../../../../../org/apache/hadoop/tools/rumen/anonymization/package-summary.html">Next&nbsp;Package</a></li>
</ul>
<ul class="navList">
<li><a href="../../../../../index.html?org/apache/hadoop/tools/rumen/package-summary.html" target="_top">Frames</a></li>
<li><a href="package-summary.html" target="_top">No&nbsp;Frames</a></li>
</ul>
<ul class="navList" id="allclasses_navbar_bottom">
<li><a href="../../../../../allclasses-noframe.html">All&nbsp;Classes</a></li>
</ul>
<div>
<script type="text/javascript"><!--
allClassesLink = document.getElementById("allclasses_navbar_bottom");
if(window==top) {
allClassesLink.style.display = "block";
}
else {
allClassesLink.style.display = "none";
}
//-->
</script>
</div>
<a name="skip.navbar.bottom">
<!-- -->
</a></div>
<!-- ======== END OF BOTTOM NAVBAR ======= -->
<p class="legalCopy"><small>Copyright &#169; 2021 <a href="https://www.apache.org">Apache Software Foundation</a>. All rights reserved.</small></p>
</body>
</html>