| <?xml version="1.0"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd"> |
| |
| <document> |
| |
| <header> |
| <title>Offline Edits Viewer Guide</title> |
| <authors> |
| <person name="Erik Steffl" email="steffl@yahoo-inc.com"/> |
| </authors> |
| </header> |
| |
| <body> |
| |
| <section> |
| |
| <title>Overview</title> |
| |
| <p> |
| Offline Edits Viewer is a tool to parse the Edits log file. The |
| current processors are mostly useful for conversion between |
| different formats, including XML which is human readable and |
| easier to edit than native binary format. |
| </p> |
| |
| <p> |
| The tool can parse the edits formats -18 (roughly Hadoop 0.19) |
| and later. The tool operates on files only, it does not need |
| Hadoop cluster to be running. |
| </p> |
| |
| <p>Input formats supported:</p> |
| <ol> |
| <li><strong>binary</strong>: native binary format that Hadoop uses internally</li> |
| <li> |
| <strong>xml</strong>: XML format, as produced by |
| <strong>xml</strong> processor, used if filename has xml |
| (case insensitive) extension |
| </li> |
| </ol> |
| |
| <p> |
| The Offline Edits Viewer provides several output processors |
| (unless stated otherwise the output of the processor can be |
| converted back to original edits file): |
| </p> |
| <ol> |
| <li><strong>binary</strong>: native binary format that Hadoop uses internally</li> |
| <li><strong>xml</strong>: XML format</li> |
| <li><strong>stats</strong>: prints out statistics, this cannot be converted back to Edits file</li> |
| </ol> |
| |
| </section> <!-- Overview --> |
| |
| <section> |
| |
| <title>Usage</title> |
| |
| <p><code>bash$ bin/hdfs oev -i edits -o edits.xml</code></p> |
| |
| <table> |
| <tr><th>Flag</th><th>Description</th></tr> |
| <tr> |
| <td><code>[-i|--inputFile] <input file></code></td> |
| <td> |
| Specify the input edits log file to process. Xml (case |
| insensitive) extension means XML format otherwise binary |
| format is assumed. Required. |
| </td> |
| </tr> |
| <tr> |
| <td><code>[-o|--outputFile] <output file></code></td> |
| <td> |
| Specify the output filename, if the specified output processor |
| generates one. If the specified file already exists, it is |
| silently overwritten. Required. |
| </td> |
| </tr> |
| <tr> |
| <td><code>[-p|--processor] <processor></code></td> |
| <td> |
| Specify the image processor to apply against the image |
| file. Currently valid options are <strong>binary</strong>, |
| <strong>xml</strong> (default) and <strong>stats</strong>. |
| </td> |
| </tr> |
| <tr> |
| <td><code>[-v|--verbose]-</code></td> |
| <td> |
| Print the input and output filenames and pipe output of |
| processor to console as well as specified file. On extremely |
| large files, this may increase processing time by an order |
| of magnitude. |
| </td> |
| </tr> |
| <tr> |
| <td><code>[-h|--help]</code></td> |
| <td> |
| Display the tool usage and help information and exit. |
| </td> |
| </tr> |
| </table> |
| |
| </section> <!-- Usage --> |
| |
| <section> |
| |
| <title>Case study: Hadoop cluster recovery</title> |
| |
| <p> |
| In case there is some problem with hadoop cluster and the edits |
| file is corrupted it is possible to save at least part of the |
| edits file that is correct. This can be done by converting the |
| binary edits to XML, edit it manually and then convert it back |
| to binary. The most common problem is that the edits file is |
| missing the closing record (record that has opCode -1). This |
| should be recognized by the tool and the XML format should be |
| properly closed. |
| </p> |
| |
| <p> |
| If there is no closing record in the XML file you can add one |
| after last correct record. Anything after the record with opCode |
| -1 is ignored. |
| </p> |
| |
| <p>Example of a closing record (with opCode -1):</p> |
| <source> |
| <RECORD> |
| <OPCODE>-1</OPCODE> |
| <DATA> |
| </DATA> |
| </RECORD> |
| </source> |
| |
| </section> <!-- Case study: Hadoop cluster recovery --> |
| |
| </body> |
| |
| </document> |