| ~~ Licensed under the Apache License, Version 2.0 (the "License"); |
| ~~ you may not use this file except in compliance with the License. |
| ~~ You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. See accompanying LICENSE file. |
| |
| --- |
| Offline Image Viewer Guide |
| --- |
| --- |
| ${maven.build.timestamp} |
| |
| Offline Image Viewer Guide |
| |
| %{toc|section=1|fromDepth=0} |
| |
| * Overview |
| |
| The Offline Image Viewer is a tool to dump the contents of HDFS fsimage |
| files to a human-readable format and to provide a read-only WebHDFS API, |
| allowing offline analysis and examination of a Hadoop cluster's |
| namespace. The tool is able to process very large image files relatively |
| quickly. The tool handles the layout formats that were included with Hadoop |
| versions 2.4 and up. If you want to handle older layout formats, you can |
| use the Offline Image Viewer of Hadoop 2.3 or {{oiv_legacy Command}}. |
| If the tool is not able to process an image file, it will exit cleanly. |
| The Offline Image Viewer does not require a Hadoop cluster to be running; |
| it is entirely offline in its operation. |
| |
| The Offline Image Viewer provides several output processors: |
| |
| [[1]] Web is the default output processor. It launches an HTTP server |
| that exposes a read-only WebHDFS API. Users can investigate the namespace |
| interactively by using the HTTP REST API. |
| |
| [[2]] XML creates an XML document of the fsimage and includes all of the |
| information within the fsimage, similar to the lsr processor. The |
| output of this processor is amenable to automated processing and |
| analysis with XML tools. Due to the verbosity of the XML syntax, |
| this processor will also generate the largest amount of output. |
| |
| [[3]] FileDistribution is the tool for analyzing file sizes in the |
| namespace image. To run the tool, define a range |
| of integers [0, maxSize] by specifying maxSize and a step. The |
| range of integers is divided into segments of size step: [0, s[1], |
| ..., s[n-1], maxSize], and the processor calculates how many files |
| in the system fall into each segment [s[i-1], s[i]). Note that |
| files larger than maxSize always fall into the very last segment. |
| The output file is formatted as a tab-separated two-column table: |
| Size and NumFiles, where Size represents the start of the segment |
| and NumFiles is the number of files from the image whose size falls |
| in this segment. |
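
The segment rule described above can be sketched in Python. This is only an illustration of the description in this guide, not the tool's actual implementation: the function name <<<file_distribution>>> and its arguments are hypothetical, and the real processor may treat segment boundaries differently.

```python
from collections import Counter


def file_distribution(sizes, max_size, step):
    """Count files per segment, keyed by the start of each segment.

    Segments are [k*step, (k+1)*step); anything at or beyond the start of
    the final segment (including files larger than max_size) is counted
    in that final segment.
    """
    if max_size <= 0 or step <= 0:
        raise ValueError("max_size and step must be positive")
    # Start of the last segment inside [0, max_size].
    last_start = ((max_size - 1) // step) * step
    counts = Counter()
    for size in sizes:
        start = (size // step) * step
        counts[min(start, last_start)] += 1
    return sorted(counts.items())


# Hypothetical file sizes in bytes, with maxSize=8 and step=4.
for start, num_files in file_distribution([0, 1, 3, 4, 9, 100], 8, 4):
    print("{}\t{}".format(start, num_files))
```

The loop at the end prints a tab-separated two-column table in the same Size and NumFiles layout that the processor's output is described as using.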
| |
| * Usage |
| |
| ** Web Processor |
| |
| The Web processor launches an HTTP server that exposes a read-only WebHDFS API. |
| Users can specify the address to listen on with the -addr option (the default is |
| localhost:5978). |
| |
| ---- |
| bash$ bin/hdfs oiv -i fsimage |
| 14/04/07 13:25:14 INFO offlineImageViewer.WebImageViewer: WebImageViewer |
| started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer. |
| ---- |
| |
| Users can access the viewer and get the information of the fsimage by |
| the following shell command: |
| |
| ---- |
| bash$ bin/hdfs dfs -ls webhdfs://127.0.0.1:5978/ |
| Found 2 items |
| drwxrwx--- - root supergroup 0 2014-03-26 20:16 webhdfs://127.0.0.1:5978/tmp |
| drwxr-xr-x - root supergroup 0 2014-03-31 14:08 webhdfs://127.0.0.1:5978/user |
| ---- |
| |
| To get the information of all the files and directories, you can simply use |
| the following command: |
| |
| ---- |
| bash$ bin/hdfs dfs -ls -R webhdfs://127.0.0.1:5978/ |
| ---- |
| |
| Users can also get JSON-formatted FileStatuses via the HTTP REST API. |
| |
| ---- |
| bash$ curl -i http://127.0.0.1:5978/webhdfs/v1/?op=liststatus |
| HTTP/1.1 200 OK |
| Content-Type: application/json |
| Content-Length: 252 |
| |
| {"FileStatuses":{"FileStatus":[ |
| {"fileId":16386,"accessTime":0,"replication":0,"owner":"theuser","length":0,"permission":"755","blockSize":0,"modificationTime":1392772497282,"type":"DIRECTORY","group":"supergroup","childrenNum":1,"pathSuffix":"user"} |
| ]}} |
| ---- |
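
As a sketch of how a client might consume this response, the following Python snippet extracts the entries from a LISTSTATUS reply. The helper names <<<liststatus_url>>> and <<<parse_liststatus>>> are hypothetical; the URL layout and the JSON field names are taken from the curl example above, and the sample string is an abbreviated copy of that output.

```python
import json


def liststatus_url(addr, path="/"):
    """Build the URL used in the curl example above (addr is host:port)."""
    return "http://{}/webhdfs/v1{}?op=liststatus".format(addr, path)


def parse_liststatus(body):
    """Return (type, pathSuffix, owner) tuples from a LISTSTATUS JSON body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["type"], s["pathSuffix"], s["owner"]) for s in statuses]


# Abbreviated copy of the response shown above.
SAMPLE = ('{"FileStatuses":{"FileStatus":['
          '{"fileId":16386,"owner":"theuser","type":"DIRECTORY",'
          '"group":"supergroup","pathSuffix":"user"}]}}')

print(liststatus_url("127.0.0.1:5978"))
for entry in parse_liststatus(SAMPLE):
    print(entry)
```

Against a running viewer, the body could instead be read with <<<urllib.request.urlopen(liststatus_url(addr)).read()>>> and passed to <<<parse_liststatus>>> unchanged.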
| |
| The Web processor now supports the following operations: |
| |
| * {{{./WebHDFS.html#List_a_Directory}LISTSTATUS}} |
| |
| * {{{./WebHDFS.html#Status_of_a_FileDirectory}GETFILESTATUS}} |
| |
| * {{{./WebHDFS.html#Get_ACL_Status}GETACLSTATUS}} |
| |
| ** XML Processor |
| |
| The XML processor is used to dump all the contents of the fsimage. Users can |
| specify the input and output files via the -i and -o command-line options. |
| |
| ---- |
| bash$ bin/hdfs oiv -p XML -i fsimage -o fsimage.xml |
| ---- |
| |
| This will create a file named fsimage.xml that contains all the information in |
| the fsimage. For very large image files, this process may take several |
| minutes. |
| |
| Running the Offline Image Viewer with the XML processor produces output |
| like the following: |
| |
| ---- |
| <?xml version="1.0"?> |
| <fsimage> |
| <NameSection> |
| <genstampV1>1000</genstampV1> |
| <genstampV2>1002</genstampV2> |
| <genstampV1Limit>0</genstampV1Limit> |
| <lastAllocatedBlockId>1073741826</lastAllocatedBlockId> |
| <txid>37</txid> |
| </NameSection> |
| <INodeSection> |
| <lastInodeId>16400</lastInodeId> |
| <inode> |
| <id>16385</id> |
| <type>DIRECTORY</type> |
| <name></name> |
| <mtime>1392772497282</mtime> |
| <permission>theuser:supergroup:rwxr-xr-x</permission> |
| <nsquota>9223372036854775807</nsquota> |
| <dsquota>-1</dsquota> |
| </inode> |
| ...remaining output omitted... |
| ---- |
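
Output in this shape can be scanned with standard XML tooling. The sketch below streams the <<<inode>>> elements with Python's <<<xml.etree.ElementTree.iterparse>>> so that a large document does not need to be held in memory as one tree. The function name <<<iter_inodes>>> is hypothetical, and the sample is a shortened, closed-off version of the excerpt above, not real tool output. Nested child elements, if an inode has any, are not flattened by this sketch.

```python
import io
import xml.etree.ElementTree as ET


def iter_inodes(source):
    """Yield one dict per <inode>, mapping child tag names to their text."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "inode":
            yield {child.tag: (child.text or "") for child in elem}
            elem.clear()  # release the children of the processed inode


# Hypothetical miniature document modeled on the excerpt above.
SAMPLE = b"""<?xml version="1.0"?>
<fsimage>
<INodeSection>
<lastInodeId>16400</lastInodeId>
<inode>
<id>16385</id>
<type>DIRECTORY</type>
<name></name>
<mtime>1392772497282</mtime>
<permission>theuser:supergroup:rwxr-xr-x</permission>
</inode>
</INodeSection>
</fsimage>
"""

for inode in iter_inodes(io.BytesIO(SAMPLE)):
    print(inode["id"], inode["type"], inode["permission"])
```

<<<iterparse>>> also accepts a file name, so a call such as <<<iter_inodes("fsimage.xml")>>> would read the file produced by the command above.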
| |
| * Options |
| |
| *-----------------------:-----------------------------------+ |
| | <<Flag>> | <<Description>> | |
| *-----------------------:-----------------------------------+ |
| | <<<-i>>>\|<<<--inputFile>>> <input file> | Specify the input fsimage file |
| | | to process. Required. |
| *-----------------------:-----------------------------------+ |
| | <<<-o>>>\|<<<--outputFile>>> <output file> | Specify the output filename, |
| | | if the specified output processor generates one. If |
| | | the specified file already exists, it is silently |
| | | overwritten. (output to stdout by default) |
| *-----------------------:-----------------------------------+ |
| | <<<-p>>>\|<<<--processor>>> <processor> | Specify the image processor to |
| | | apply against the image file. Currently valid options |
| | | are Web (default), XML and FileDistribution. |
| *-----------------------:-----------------------------------+ |
| <<<-addr>>> <address> | Specify the address (host:port) to listen on |
| | (localhost:5978 by default). This option is used with |
| | the Web processor. |
| *-----------------------:-----------------------------------+ |
| <<<-maxSize>>> <size> | Specify the range [0, maxSize] of file sizes to be |
| | analyzed in bytes (128GB by default). This option is |
| | used with the FileDistribution processor. |
| *-----------------------:-----------------------------------+ |
| <<<-step>>> <size> | Specify the granularity of the distribution in bytes |
| | (2MB by default). This option is used with |
| | the FileDistribution processor. |
| *-----------------------:-----------------------------------+ |
| | <<<-h>>>\|<<<--help>>>| Display the tool usage and help information and |
| | | exit. |
| *-----------------------:-----------------------------------+ |
| |
| * Analyzing Results |
| |
| The Offline Image Viewer makes it easy to gather large amounts of data |
| about the HDFS namespace. This information can then be used to explore |
| file system usage patterns or find specific files that match arbitrary |
| criteria, along with other types of namespace analysis. |
| |
| * oiv_legacy Command |
| |
| Due to the internal layout changes introduced by the ProtocolBuffer-based |
| fsimage ({{{https://issues.apache.org/jira/browse/HDFS-5698}HDFS-5698}}), |
| OfflineImageViewer consumes an excessive amount of memory and loses some |
| functions such as the Indented and Delimited processors. If you want to process |
| an image without a large amount of memory or use these processors, you can use |
| the <<<oiv_legacy>>> command (same as <<<oiv>>> in Hadoop 2.3). |
| |
| ** Usage |
| |
| 1. Set <<<dfs.namenode.legacy-oiv-image.dir>>> to an appropriate directory |
| to make the standby NameNode or SecondaryNameNode save its namespace in the |
| old fsimage format during checkpointing. |
| |
| 2. Use the <<<oiv_legacy>>> command to process the old-format fsimage. |
| |
| ---- |
| bash$ bin/hdfs oiv_legacy -i fsimage_old -o output |
| ---- |
| |
| ** Options |
| |
| *-----------------------:-----------------------------------+ |
| | <<Flag>> | <<Description>> | |
| *-----------------------:-----------------------------------+ |
| | <<<-i>>>\|<<<--inputFile>>> <input file> | Specify the input fsimage file to |
| | | process. Required. |
| *-----------------------:-----------------------------------+ |
| | <<<-o>>>\|<<<--outputFile>>> <output file> | Specify the output filename, if |
| | | the specified output processor generates one. If the |
| | | specified file already exists, it is silently |
| | | overwritten. Required. |
| *-----------------------:-----------------------------------+ |
| | <<<-p>>>\|<<<--processor>>> <processor> | Specify the image processor to |
| | | apply against the image file. Valid options are |
| | | Ls (default), XML, Delimited, Indented, and |
| | | FileDistribution. |
| *-----------------------:-----------------------------------+ |
| <<<-skipBlocks>>> | Do not enumerate individual blocks within files. This |
| | may save processing time and output file space on |
| | namespaces with very large files. The Ls processor |
| | reads the blocks to correctly determine file sizes |
| | and ignores this option. |
| *-----------------------:-----------------------------------+ |
| | <<<-printToScreen>>> | Pipe output of processor to console as well as |
| | | specified file. On extremely large namespaces, this |
| | | may increase processing time by an order of |
| | | magnitude. |
| *-----------------------:-----------------------------------+ |
| | <<<-delimiter>>> <arg>| When used in conjunction with the Delimited |
| | | processor, replaces the default tab delimiter with |
| | | the string specified by <arg>. |
| *-----------------------:-----------------------------------+ |
| | <<<-h>>>\|<<<--help>>>| Display the tool usage and help information and exit. |
| *-----------------------:-----------------------------------+ |