| <?xml version="1.0"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd"> |
| <document> |
| <header> |
| <title>Synthetic Load Generator Guide </title> |
| </header> |
| <body> |
| <section> |
| <title>Overview</title> |
| <p> |
| The synthetic load generator (SLG) is a tool for testing NameNode behavior |
| under different client loads. The user can generate different mixes |
| of read, write, and list requests by specifying the probabilities of |
| read and write. The user controls the intensity of the load by adjusting |
| parameters for the number of worker threads and the delay between |
| operations. While load generators are running, the user can profile and |
| monitor the NameNode. When a load generator exits, it prints NameNode |
| statistics such as the average execution time of each kind of operation |
| and the NameNode throughput. |
| </p> |
| </section> |
| |
| <section> |
| <title> Synopsis </title> |
| <p> |
| The synopsis of the command is: |
| </p> |
| <source>java LoadGenerator [options]</source> |
| <p> Options include:</p> |
| |
| <ul> |
| <li> |
| <code>-readProbability &lt;read probability&gt;</code><br/> |
| The probability of the read operation; default is 0.3333. |
| </li> |
| |
| <li> |
| <code>-writeProbability &lt;write probability&gt;</code><br/> |
| The probability of the write operation; default is 0.3333. |
| </li> |
| |
| <li> |
| <code>-root &lt;test space root&gt;</code><br/> |
| The root of the test space; default is /testLoadSpace. |
| </li> |
| |
| <li> |
| <code>-maxDelayBetweenOps &lt;maxDelayBetweenOpsInMillis&gt;</code><br/> |
| The maximum delay, in milliseconds, between two consecutive operations in a thread; default is 0, indicating no delay. |
| </li> |
| |
| <li> |
| <code>-numOfThreads &lt;numOfThreads&gt;</code><br/> |
| The number of threads to spawn; default is 200. |
| </li> |
| |
| <li> |
| <code>-elapsedTime &lt;elapsedTimeInSecs&gt;</code><br/> |
| The number of seconds that the program will run; a value of zero |
| indicates that the program runs forever. The default value is 0. |
| </li> |
| |
| <li> |
| <code>-startTime &lt;startTimeInMillis&gt;</code><br/> |
| The time at which all worker threads |
| start to run. By default it is 10 seconds after the main |
| program starts running. This creates a barrier if more than |
| one load generator is running. |
| </li> |
| |
| <li> |
| <code>-seed &lt;seed&gt;</code><br/> |
| The random generator seed for repeating |
| requests to NameNode when running with a single thread; |
| default is the current time. |
| </li> |
| |
| </ul> |
| |
| <p> |
| After command line argument parsing, the load generator traverses |
| the test space and builds a table of all directories and another table |
| of all files in the test space. It then waits until the start time to |
| spawn the number of worker threads as specified by the user. |
| |
| Each thread sends a stream of requests to NameNode. At each iteration, |
| it first decides if it is going to read a file, create a file, or |
| list a directory following the read and write probabilities specified |
| by the user. The listing probability is equal to |
| <em>1 - read probability - write probability</em>. When reading, |
| it randomly picks a file in the test space and reads the entire file. |
| When writing, it randomly picks a directory in the test space and |
| creates a file there. |
| </p> |
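| <p> |
| The selection step described above can be sketched in Java as follows. |
| This is only an illustration, not the LoadGenerator source: the class |
| <code>OperationChooser</code>, its <code>Op</code> enum, and the method |
| <code>choose</code> are hypothetical names. The sketch maps one uniform |
| random draw to read, write, or list, so that listing receives the |
| remaining probability. |
| </p> |

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of the actual tool. */
final class OperationChooser {
    enum Op { READ, WRITE, LIST }

    /**
     * Maps a uniform draw u in [0, 1) to an operation.
     * LIST receives the remaining mass: 1 - readProbability - writeProbability.
     */
    static Op choose(double u, double readProbability, double writeProbability) {
        if (readProbability < 0 || writeProbability < 0
                || readProbability + writeProbability > 1.0) {
            throw new IllegalArgumentException(
                "probabilities must be non-negative and sum to at most 1");
        }
        if (u < readProbability) {
            return Op.READ;
        }
        if (u < readProbability + writeProbability) {
            return Op.WRITE;
        }
        return Op.LIST;
    }

    public static void main(String[] args) {
        // Seed chosen arbitrarily so that the run is repeatable.
        Random random = new Random(42L);
        int[] counts = new int[Op.values().length];
        for (int i = 0; i < 10_000; i++) {
            Op op = choose(random.nextDouble(), 0.3333, 0.3333);
            counts[op.ordinal()]++;
        }
        for (Op op : Op.values()) {
            System.out.println(op + ": " + counts[op.ordinal()]);
        }
    }
}
```

| <p> |
| With the default probabilities of 0.3333 each, the three operations |
| should appear in roughly equal proportions. |
| </p> |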
| <p> |
| To prevent two threads, whether within the same load generator or in |
| two different load generators, from creating the same |
| file, the file name consists of the current machine's host name |
| and the thread id. The length of the file follows a Gaussian |
| distribution with a mean of 2 blocks and a standard |
| deviation of 1 block. The new file is filled with byte 'a'. To keep the test |
| space from growing indefinitely, the file is deleted immediately |
| after the file creation completes. When listing, it randomly picks |
| a directory in the test space and lists its content. |
| </p> |
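| <p> |
| A minimal Java sketch of how the write parameters described above could |
| be derived is shown below. The class <code>WriteParams</code>, the |
| underscore separator in the file name, and the policy of redrawing |
| non-positive sizes are assumptions of this sketch; the text above only |
| states that the name combines host name and thread id and that the length |
| is Gaussian with a mean of 2 blocks and a standard deviation of 1. |
| </p> |

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of the actual tool. */
final class WriteParams {

    /** Builds a name from host name and thread id (separator is assumed). */
    static String fileName(String hostName, long threadId) {
        return hostName + "_" + threadId;
    }

    /**
     * Draws a file size in blocks from a Gaussian distribution.
     * Non-positive draws are rejected and redrawn (an assumption here).
     */
    static double fileSizeInBlocks(Random random, double meanBlocks, double stdDevBlocks) {
        double size;
        do {
            size = meanBlocks + stdDevBlocks * random.nextGaussian();
        } while (size <= 0.0);
        return size;
    }

    public static void main(String[] args) {
        Random random = new Random(7L);
        // "host1" stands in for the machine's host name.
        String name = fileName("host1", Thread.currentThread().getId());
        double blocks = fileSizeInBlocks(random, 2.0, 1.0);
        System.out.println(name + " -> " + blocks + " blocks");
    }
}
```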
| <p> |
| After an operation completes, the thread pauses for a random |
| amount of time in the range of [0, maxDelayBetweenOps] if the |
| specified maximum delay is not zero. All threads are stopped when |
| the specified elapsed time has passed. Before exiting, the program |
| prints the average execution time for each kind of NameNode operation |
| and the number of requests served by the NameNode per second. |
| </p> |
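| <p> |
| The pause between operations can be sketched as follows. The class |
| <code>OpDelay</code> and the method <code>nextDelayMillis</code> are |
| hypothetical names, and the uniform distribution of the delay is an |
| assumption; the text above only gives the range |
| [0, maxDelayBetweenOps]. |
| </p> |

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of the actual tool. */
final class OpDelay {

    /**
     * Returns a delay in milliseconds drawn uniformly from
     * [0, maxDelayBetweenOps], or 0 when no delay is configured.
     */
    static long nextDelayMillis(Random random, int maxDelayBetweenOps) {
        if (maxDelayBetweenOps < 0 || maxDelayBetweenOps == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxDelayBetweenOps out of range");
        }
        if (maxDelayBetweenOps == 0) {
            return 0L; // default: no delay between operations
        }
        // nextInt(bound) excludes the bound, so add 1 to include the maximum.
        return random.nextInt(maxDelayBetweenOps + 1);
    }

    public static void main(String[] args) throws InterruptedException {
        Random random = new Random();
        long delay = nextDelayMillis(random, 100);
        Thread.sleep(delay);
        System.out.println("paused for " + delay + " ms");
    }
}
```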
| |
| </section> |
| |
| <section> |
| <title> Test Space Population </title> |
| <p> |
| The user needs to populate a test space before running a |
| load generator. The structure generator generates a random |
| test space structure, and the data generator creates the files |
| and directories of the test space in the Hadoop Distributed File System (HDFS). |
| </p> |
| |
| <section> |
| <title> Structure Generator </title> |
| <p> |
| This tool generates a random namespace structure with the |
| following constraints: |
| </p> |
| |
| <ol> |
| <li>The number of subdirectories that a directory can have is |
| a random number in [minWidth, maxWidth].</li> |
| <li>The maximum depth of each subdirectory is a random number in |
| [2*maxDepth/3, maxDepth].</li> |
| <li>Files are randomly placed in leaf directories. The size of |
| each file follows a Gaussian distribution with a mean |
| of 1 block and a standard deviation of 1 block.</li> |
| </ol> |
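| <p> |
| The first two constraints can be sketched in Java as shown below. The |
| class <code>StructureBounds</code> and its methods are hypothetical |
| names, and both the uniform draw and the use of integer division for |
| 2*maxDepth/3 are assumptions of this sketch rather than statements about |
| the actual StructureGenerator code. |
| </p> |

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of the actual tool. */
final class StructureBounds {

    /** Uniform integer in the closed range [low, high]. */
    static int uniformInclusive(Random random, int low, int high) {
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high");
        }
        return low + random.nextInt(high - low + 1);
    }

    /** Number of subdirectories for one directory: [minWidth, maxWidth]. */
    static int subdirectoryCount(Random random, int minWidth, int maxWidth) {
        return uniformInclusive(random, minWidth, maxWidth);
    }

    /** Maximum depth of one subtree: [2*maxDepth/3, maxDepth]. */
    static int subtreeDepth(Random random, int maxDepth) {
        // Integer division is assumed: with maxDepth = 5 the lower bound is 3.
        return uniformInclusive(random, 2 * maxDepth / 3, maxDepth);
    }

    public static void main(String[] args) {
        Random random = new Random(11L);
        // Defaults from the options list: minWidth 1, maxWidth 5, maxDepth 5.
        System.out.println("width: " + subdirectoryCount(random, 1, 5));
        System.out.println("depth: " + subtreeDepth(random, 5));
    }
}
```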
| <p> |
| The generated namespace structure is described by two files in |
| the output directory. Each line of the first file contains the |
| full name of a leaf directory. Each line of the second file |
| contains the full name of a file and its size, separated by a blank. |
| </p> |
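| <p> |
| As an illustration of the second file's line format, the following Java |
| sketch parses one line into a name and a size. The class |
| <code>FileEntry</code> is hypothetical, the sample path is made up, and |
| treating the size as a decimal number is an assumption; the text above |
| only states that name and size are separated by a blank. |
| </p> |

```java
/** Hypothetical holder for one line of the file list; for illustration only. */
final class FileEntry {
    final String path;
    final double size;

    FileEntry(String path, double size) {
        this.path = path;
        this.size = size;
    }

    /** Splits at the last blank so the size is always the final token. */
    static FileEntry parse(String line) {
        int blank = line.lastIndexOf(' ');
        if (blank <= 0 || blank == line.length() - 1) {
            throw new IllegalArgumentException("expected '<name> <size>': " + line);
        }
        String path = line.substring(0, blank);
        double size = Double.parseDouble(line.substring(blank + 1));
        return new FileEntry(path, size);
    }

    public static void main(String[] args) {
        // The path and size here are invented sample values.
        FileEntry entry = parse("/dir0/dir1/file3 1.5");
        System.out.println(entry.path + " has size " + entry.size);
    }
}
```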
| <p> |
| The synopsis of the command is: |
| </p> |
| <source>java StructureGenerator [options]</source> |
| |
| <p>Options include:</p> |
| <ul> |
| <li> |
| <code>-maxDepth &lt;maxDepth&gt;</code><br/> |
| Maximum depth of the directory tree; default is 5. |
| </li> |
| |
| <li> |
| <code>-minWidth &lt;minWidth&gt;</code><br/> |
| Minimum number of subdirectories per directory; default is 1. |
| </li> |
| |
| <li> |
| <code>-maxWidth &lt;maxWidth&gt;</code><br/> |
| Maximum number of subdirectories per directory; default is 5. |
| </li> |
| |
| <li> |
| <code>-numOfFiles &lt;#OfFiles&gt;</code><br/> |
| The total number of files in the test space; default is 10. |
| </li> |
| |
| <li> |
| <code>-avgFileSize &lt;avgFileSizeInBlocks&gt;</code><br/> |
| Average file size in blocks; default is 1. |
| </li> |
| |
| <li> |
| <code>-outDir &lt;outDir&gt;</code><br/> |
| Output directory; default is the current directory. |
| </li> |
| |
| <li> |
| <code>-seed &lt;seed&gt;</code><br/> |
| Random number generator seed; default is the current time. |
| </li> |
| </ul> |
| </section> |
| |
| <section> |
| <title>Data Generator </title> |
| <p> |
| This tool reads the directory structure and file structure from |
| the input directory and creates the namespace in the Hadoop Distributed |
| File System (HDFS). All files are filled with byte 'a'. |
| </p> |
| <p> |
| The synopsis of the command is: |
| </p> |
| <source>java DataGenerator [options]</source> |
| <p>Options include:</p> |
| <ul> |
| <li> |
| <code>-inDir &lt;inDir&gt;</code><br/> |
| Input directory in which the directory and file |
| structures are stored; default is the current directory. |
| </li> |
| <li> |
| <code>-root &lt;test space root&gt;</code><br/> |
| The name of the root directory under which the |
| new namespace is placed; |
| default is "/testLoadSpace". |
| </li> |
| </ul> |
| </section> |
| </section> |
| </body> |
| </document> |