| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <!-- NewPage --> |
| <html lang="en"> |
| <head> |
| <!-- Generated by javadoc (1.8.0_292) on Tue Jun 15 06:00:58 GMT 2021 --> |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
| <title>org.apache.hadoop.tools.rumen (Apache Hadoop Main 3.3.1 API)</title> |
| <meta name="date" content="2021-06-15"> |
| <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style"> |
| <script type="text/javascript" src="../../../../../script.js"></script> |
| </head> |
| <body> |
| <script type="text/javascript"><!-- |
| try { |
| if (location.href.indexOf('is-external=true') == -1) { |
| parent.document.title="org.apache.hadoop.tools.rumen (Apache Hadoop Main 3.3.1 API)"; |
| } |
| } |
| catch(err) { |
| } |
| //--> |
| </script> |
| <noscript> |
| <div>JavaScript is disabled on your browser.</div> |
| </noscript> |
| <!-- ========= START OF TOP NAVBAR ======= --> |
| <div class="topNav"><a name="navbar.top"> |
| <!-- --> |
| </a> |
| <div class="skipNav"><a href="#skip.navbar.top" title="Skip navigation links">Skip navigation links</a></div> |
| <a name="navbar.top.firstrow"> |
| <!-- --> |
| </a> |
| <ul class="navList" title="Navigation"> |
| <li><a href="../../../../../overview-summary.html">Overview</a></li> |
| <li class="navBarCell1Rev">Package</li> |
| <li>Class</li> |
| <li><a href="package-use.html">Use</a></li> |
| <li><a href="package-tree.html">Tree</a></li> |
| <li><a href="../../../../../deprecated-list.html">Deprecated</a></li> |
| <li><a href="../../../../../index-all.html">Index</a></li> |
| <li><a href="../../../../../help-doc.html">Help</a></li> |
| </ul> |
| </div> |
| <div class="subNav"> |
| <ul class="navList"> |
| <li><a href="../../../../../org/apache/hadoop/tools/protocolPB/package-summary.html">Prev Package</a></li> |
| <li><a href="../../../../../org/apache/hadoop/tools/rumen/anonymization/package-summary.html">Next Package</a></li> |
| </ul> |
| <ul class="navList"> |
| <li><a href="../../../../../index.html?org/apache/hadoop/tools/rumen/package-summary.html" target="_top">Frames</a></li> |
| <li><a href="package-summary.html" target="_top">No Frames</a></li> |
| </ul> |
| <ul class="navList" id="allclasses_navbar_top"> |
| <li><a href="../../../../../allclasses-noframe.html">All Classes</a></li> |
| </ul> |
| <div> |
| <script type="text/javascript"><!-- |
| allClassesLink = document.getElementById("allclasses_navbar_top"); |
| if(window==top) { |
| allClassesLink.style.display = "block"; |
| } |
| else { |
| allClassesLink.style.display = "none"; |
| } |
| //--> |
| </script> |
| </div> |
| <a name="skip.navbar.top"> |
| <!-- --> |
| </a></div> |
| <!-- ========= END OF TOP NAVBAR ========= --> |
| <div class="header"> |
| <h1 title="Package" class="title">Package org.apache.hadoop.tools.rumen</h1> |
| <div class="docSummary"> |
| <div class="block">Rumen is a data extraction and analysis tool built for |
| <a href="http://hadoop.apache.org/">Apache Hadoop</a>.</div> |
| </div> |
| <p>See: <a href="#package.description">Description</a></p> |
| </div> |
| <div class="contentContainer"><a name="package.description"> |
| <!-- --> |
| </a> |
| <h2 title="Package org.apache.hadoop.tools.rumen Description">Package org.apache.hadoop.tools.rumen Description</h2> |
<div class="block">Rumen is a data extraction and analysis tool built for
<a href="http://hadoop.apache.org/">Apache Hadoop</a>. Rumen mines job history
logs, extracts meaningful data, and stores it in an easily parsed format.
| |
| The default output format of Rumen is <a href="http://www.json.org">JSON</a>. |
| Rumen uses the <a href="http://jackson.codehaus.org/">Jackson</a> library to |
| create JSON objects. |
| <br><br> |
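For illustration, a single job record in the generated JSON trace might
look like the following sketch. The field names mirror the
<code>LoggedJob</code> accessors described later on this page; the values
and the exact field set are illustrative assumptions, not a normative
schema.
<pre>
<code>
{
  "jobID" : "job_201006251201_0001",
  "user" : "hadoop",
  "submitTime" : 1277437761217,
  "finishTime" : 1277437828982,
  "outcome" : "SUCCESS",
  "mapTasks" : [ ... ],
  "reduceTasks" : [ ... ],
  "otherTasks" : [ ... ]
}
</code>
</pre>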
| |
| The following classes can be used to programmatically invoke Rumen: |
| <ol> |
| <li> |
<code>JobConfigurationParser</code><br>
A parser that extracts a configurable set of interesting properties
from a job configuration file.
| |
| <br><br> |
| <i>Sample code</i>: |
| <pre> |
| <code> |
| // An example to parse and filter out job name |
| |
| String conf_filename = .. // assume the job configuration filename here |
| |
| // construct a list of interesting properties |
List&lt;String&gt; interestedProperties = new ArrayList&lt;String&gt;();
| interestedProperties.add("mapreduce.job.name"); |
| |
| JobConfigurationParser jcp = |
| new JobConfigurationParser(interestedProperties); |
| |
| InputStream in = new FileInputStream(conf_filename); |
| Properties parsedProperties = jcp.parse(in); |
| </code> |
| </pre> |
| Some of the commonly used interesting properties are enumerated in |
| <code>JobConfPropertyNames</code>. <br><br> |
| |
| <b>Note:</b> |
| A single instance of <code>JobConfigurationParser</code> |
| can be used to parse multiple job configuration files. |
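<br><br>
As a sketch, the list of interesting properties could be seeded from this
enum instead of hard-coded strings; this assumes the enum constants
expose their candidate property names via a <code>getCandidates()</code>
accessor, so verify that against the enum's javadoc before relying on it.
<br><br>
<i>Sample code</i> (sketch):
<pre>
<code>
// Sketch: seed interesting properties from JobConfPropertyNames.
// getCandidates() is assumed here; verify against the enum's javadoc.
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
for (JobConfPropertyNames p : JobConfPropertyNames.values()) {
  for (String candidate : p.getCandidates()) {
    interestingProperties.add(candidate);
  }
}
</code>
</pre>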
| |
| </li> |
| <li> |
<code>JobHistoryParser</code> <br>
A parser for job history files. It is an interface; the available
implementations are enumerated in
<code>JobHistoryParserFactory</code>. Note that
<code>RewindableInputStream</code>
is a wrapper class around <a href="https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html?is-external=true" title="class or interface in java.io"><code>InputStream</code></a> that makes the input
stream rewindable.

<br><br>
| <i>Sample code</i>: |
| <pre> |
| <code> |
// An example to parse a current job history file, i.e. a job history
// file whose version is known
| |
| String filename = .. // assume the job history filename here |
| |
| InputStream in = new FileInputStream(filename); |
| |
| HistoryEvent event = null; |
| |
| JobHistoryParser parser = new CurrentJHParser(in); |
| |
| event = parser.nextEvent(); |
| // process all the events |
| while (event != null) { |
| // ... process all event |
| event = parser.nextEvent(); |
| } |
| |
| // close the parser and the underlying stream |
| parser.close(); |
| </code> |
| </pre> |
| |
<code>JobHistoryParserFactory</code> provides a
<code>JobHistoryParserFactory.getParser(org.apache.hadoop.tools.rumen.RewindableInputStream)</code>
API that returns a suitable parser for a given job history file. This
API is useful when the job history version is not known in advance.<br><br>
| <i>Sample code</i>: |
| <pre> |
| <code> |
// An example to parse a job history file whose version is not
// known, i.e. using JobHistoryParserFactory.getParser()
| |
| String filename = .. // assume the job history filename here |
| |
| InputStream in = new FileInputStream(filename); |
| RewindableInputStream ris = new RewindableInputStream(in); |
| |
| // JobHistoryParserFactory will check and return a parser that can |
| // parse the file |
| JobHistoryParser parser = JobHistoryParserFactory.getParser(ris); |
| |
| // now use the parser to parse the events |
| HistoryEvent event = parser.nextEvent(); |
| while (event != null) { |
| // ... process the event |
| event = parser.nextEvent(); |
| } |
| |
| parser.close(); |
| </code> |
| </pre> |
<b>Note:</b>
Use one parser instance per job history log, and close the parser after use.
| </li> |
| <li> |
| <code>TopologyBuilder</code><br> |
Builds the cluster topology from job history events. Every
job history file consists of events, and each event can be represented
using a <code>HistoryEvent</code>.
| These events can be passed to <code>TopologyBuilder</code> using |
| <code>TopologyBuilder.process(org.apache.hadoop.mapreduce.jobhistory.HistoryEvent)</code>. |
| A cluster topology can be represented using <code>LoggedNetworkTopology</code>. |
| Once all the job history events are processed, the cluster |
| topology can be obtained using <code>TopologyBuilder.build()</code>. |
| |
| <br><br> |
| <i>Sample code</i>: |
| <pre> |
| <code> |
| // Building topology for a job history file represented using |
| // 'filename' and the corresponding configuration file represented |
| // using 'conf_filename' |
| String filename = .. // assume the job history filename here |
| String conf_filename = .. // assume the job configuration filename here |
| |
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
| |
| TopologyBuilder tb = new TopologyBuilder(); |
| |
| // construct a list of interesting properties |
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
| // add the interesting properties here |
| interestingProperties.add("mapreduce.job.name"); |
| |
| JobConfigurationParser jcp = |
| new JobConfigurationParser(interestingProperties); |
| |
| // parse the configuration file |
| tb.process(jcp.parse(jobConfInputStream)); |
| |
| // read the job history file and pass it to the |
| // TopologyBuilder. |
| JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream); |
| HistoryEvent e; |
| |
| // read and process all the job history events |
| while ((e = parser.nextEvent()) != null) { |
| tb.process(e); |
| } |
| |
| LoggedNetworkTopology topology = tb.build(); |
| </code> |
| </pre> |
| </li> |
| <li> |
| <code>JobBuilder</code><br> |
Summarizes a job history file.
<code>JobHistoryUtils</code> provides the
<code>JobHistoryUtils.extractJobID(String)</code>
API for extracting the job ID from job history or job configuration
filenames; the extracted ID can be used to instantiate a
<code>JobBuilder</code>.
| <code>JobBuilder</code> generates a |
| <code>LoggedJob</code> object via |
| <code>JobBuilder.build()</code>. |
| See <code>LoggedJob</code> for more details. |
| |
| <br><br> |
| <i>Sample code</i>: |
| <pre> |
| <code> |
| // An example to summarize a current job history file 'filename' |
| // and the corresponding configuration file 'conf_filename' |
| |
| String filename = .. // assume the job history filename here |
| String conf_filename = .. // assume the job configuration filename here |
| |
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);

String jobID = TraceBuilder.extractJobID(filename);
| JobBuilder jb = new JobBuilder(jobID); |
| |
| // construct a list of interesting properties |
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
| // add the interesting properties here |
| interestingProperties.add("mapreduce.job.name"); |
| |
| JobConfigurationParser jcp = |
| new JobConfigurationParser(interestingProperties); |
| |
| // parse the configuration file |
| jb.process(jcp.parse(jobConfInputStream)); |
| |
| // parse the job history file |
| JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream); |
| try { |
| HistoryEvent e; |
| // read and process all the job history events |
| while ((e = parser.nextEvent()) != null) { |
jb.process(e);
| } |
| } finally { |
| parser.close(); |
| } |
| |
| LoggedJob job = jb.build(); |
| </code> |
| </pre> |
<b>Note:</b>
The job configuration file and the job history file can be parsed in
either order. Use a single <code>JobBuilder</code> instance to process
both the history file and the job configuration.
| </li> |
| <li> |
| <code>DefaultOutputter</code><br> |
Implements <code>Outputter</code> and writes
JSON objects in text format to the output file.
| <code>DefaultOutputter</code> can be |
| initialized with the output filename. |
| |
| <br><br> |
| <i>Sample code</i>: |
| <pre> |
| <code> |
| // An example to summarize a current job history file represented by |
| // 'filename' and the configuration filename represented using |
| // 'conf_filename'. Also output the job summary to 'out.json' along |
| // with the cluster topology to 'topology.json'. |
| |
| String filename = .. // assume the job history filename here |
| String conf_filename = .. // assume the job configuration filename here |
| |
| Configuration conf = new Configuration(); |
// note: 'do' is a reserved word in Java, so name the variable differently
DefaultOutputter outputter = new DefaultOutputter();
outputter.init("out.json", conf);

InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);

// extract the job-id from the filename
String jobID = TraceBuilder.extractJobID(filename);
JobBuilder jb = new JobBuilder(jobID);
TopologyBuilder tb = new TopologyBuilder();

// construct a list of interesting properties
List&lt;String&gt; interestingProperties = new ArrayList&lt;String&gt;();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");

JobConfigurationParser jcp =
  new JobConfigurationParser(interestingProperties);

// parse the configuration file
tb.process(jcp.parse(jobConfInputStream));

// read the job history file and feed the events to both the
// JobBuilder and the TopologyBuilder
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
HistoryEvent e;
while ((e = parser.nextEvent()) != null) {
  jb.process(e);
  tb.process(e);
}

// close the parser and the underlying stream
parser.close();

LoggedJob j = jb.build();

// serialize the job summary in JSON (text) format
outputter.output(j);

// close
outputter.close();

outputter.init("topology.json", conf);

// get the cluster topology from the TopologyBuilder
LoggedNetworkTopology topology = tb.build();

// serialize the cluster topology in JSON (text) format
outputter.output(topology);

// close
outputter.close();
| </code> |
| </pre> |
| </li> |
| <li> |
| <code>JobTraceReader</code><br> |
A reader for reading a <code>LoggedJob</code> serialized using
<code>DefaultOutputter</code>. <code>LoggedJob</code>
provides various APIs for extracting job details. The most
commonly used ones are:
| <ul> |
| <li><code>LoggedJob.getMapTasks()</code> : Get the map tasks</li> |
| <li><code>LoggedJob.getReduceTasks()</code> : Get the reduce tasks</li> |
| <li><code>LoggedJob.getOtherTasks()</code> : Get the setup/cleanup tasks</li> |
| <li><code>LoggedJob.getOutcome()</code> : Get the job's outcome</li> |
| <li><code>LoggedJob.getSubmitTime()</code> : Get the job's submit time</li> |
| <li><code>LoggedJob.getFinishTime()</code> : Get the job's finish time</li> |
| </ul> |
| |
| <br><br> |
| <i>Sample code</i>: |
| <pre> |
| <code> |
// An example to read job summaries from a trace file 'out.json'.
JobTraceReader reader = new JobTraceReader("out.json");
| LoggedJob job = reader.getNext(); |
| while (job != null) { |
| // .... process job level information |
| for (LoggedTask task : job.getMapTasks()) { |
| // process all the map tasks in the job |
| for (LoggedTaskAttempt attempt : task.getAttempts()) { |
| // process all the map task attempts in the job |
| } |
| } |
| |
| // get the next job |
| job = reader.getNext(); |
| } |
| reader.close(); |
| </code> |
| </pre> |
| </li> |
| <li> |
| <code>ClusterTopologyReader</code><br> |
| A reader to read <code>LoggedNetworkTopology</code> serialized using |
| <code>DefaultOutputter</code>. <code>ClusterTopologyReader</code> can be |
| initialized using the serialized topology filename. |
| <code>ClusterTopologyReader.get()</code> can |
| be used to get the |
| <code>LoggedNetworkTopology</code>. |
| |
| <br><br> |
| <i>Sample code</i>: |
| <pre> |
| <code> |
| // An example to read the cluster topology from a topology output file |
| // 'topology.json' |
| ClusterTopologyReader reader = new ClusterTopologyReader("topology.json"); |
| LoggedNetworkTopology topology = reader.get(); |
| for (LoggedNetworkTopology t : topology.getChildren()) { |
| // process the cluster topology |
| } |
| reader.close(); |
| </code> |
| </pre> |
| </li> |
| </ol></div> |
| </div> |
| <!-- ======= START OF BOTTOM NAVBAR ====== --> |
| <div class="bottomNav"><a name="navbar.bottom"> |
| <!-- --> |
| </a> |
| <div class="skipNav"><a href="#skip.navbar.bottom" title="Skip navigation links">Skip navigation links</a></div> |
| <a name="navbar.bottom.firstrow"> |
| <!-- --> |
| </a> |
| <ul class="navList" title="Navigation"> |
| <li><a href="../../../../../overview-summary.html">Overview</a></li> |
| <li class="navBarCell1Rev">Package</li> |
| <li>Class</li> |
| <li><a href="package-use.html">Use</a></li> |
| <li><a href="package-tree.html">Tree</a></li> |
| <li><a href="../../../../../deprecated-list.html">Deprecated</a></li> |
| <li><a href="../../../../../index-all.html">Index</a></li> |
| <li><a href="../../../../../help-doc.html">Help</a></li> |
| </ul> |
| </div> |
| <div class="subNav"> |
| <ul class="navList"> |
| <li><a href="../../../../../org/apache/hadoop/tools/protocolPB/package-summary.html">Prev Package</a></li> |
| <li><a href="../../../../../org/apache/hadoop/tools/rumen/anonymization/package-summary.html">Next Package</a></li> |
| </ul> |
| <ul class="navList"> |
| <li><a href="../../../../../index.html?org/apache/hadoop/tools/rumen/package-summary.html" target="_top">Frames</a></li> |
| <li><a href="package-summary.html" target="_top">No Frames</a></li> |
| </ul> |
| <ul class="navList" id="allclasses_navbar_bottom"> |
| <li><a href="../../../../../allclasses-noframe.html">All Classes</a></li> |
| </ul> |
| <div> |
| <script type="text/javascript"><!-- |
| allClassesLink = document.getElementById("allclasses_navbar_bottom"); |
| if(window==top) { |
| allClassesLink.style.display = "block"; |
| } |
| else { |
| allClassesLink.style.display = "none"; |
| } |
| //--> |
| </script> |
| </div> |
| <a name="skip.navbar.bottom"> |
| <!-- --> |
| </a></div> |
| <!-- ======== END OF BOTTOM NAVBAR ======= --> |
| <p class="legalCopy"><small>Copyright © 2021 <a href="https://www.apache.org">Apache Software Foundation</a>. All rights reserved.</small></p> |
| </body> |
| </html> |