MapReduce Commands Guide

Overview

All MapReduce commands are invoked by the bin/mapred script. Running the mapred script without any arguments prints the description for all commands.

Usage: mapred [SHELL_OPTIONS] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]

Hadoop has an option parsing framework that parses generic options and then runs the requested command class.

| FIELD | Description |
|:---- |:---- |
| SHELL_OPTIONS | The common set of shell options. These are documented on the Hadoop Commands Reference page. |
| GENERIC_OPTIONS | The common set of options supported by multiple commands. See the Hadoop Commands Reference for more information. |
| COMMAND_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands. |
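
As an illustration, generic options such as -conf are given before the command options; a hypothetical invocation that loads an extra configuration file before listing jobs might look like the following (the file path is a placeholder):

Example: mapred job -conf /path/to/custom-site.xml -list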

User Commands

Commands useful for users of a Hadoop cluster.

archive

Creates a Hadoop archive. More information can be found at the Hadoop Archives Guide.
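
The command accepts the option set described in that guide; a hypothetical invocation (archive name and paths are placeholders) might be:

Example: mapred archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/hadoop/archives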

archive-logs

A tool to combine YARN aggregated logs into Hadoop archives to reduce the number of files in HDFS. More information can be found at Hadoop Archive Logs Guide.
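
In its simplest form the tool can be run without options, in which case it processes eligible aggregated logs using its defaults; the options it accepts are described in the linked guide:

Example: mapred archive-logs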

classpath

Usage: mapred classpath [--glob |--jar <path> |-h |--help]

| COMMAND_OPTION | Description |
|:---- |:---- |
| --glob | expand wildcards |
| --jar *path* | write classpath as manifest in jar named *path* |
| -h, --help | print help |

Prints the class path needed to get the Hadoop jar and the required libraries. If called without arguments, then prints the classpath set up by the command scripts, which is likely to contain wildcards in the classpath entries. Additional options print the classpath after wildcard expansion or write the classpath into the manifest of a jar file. The latter is useful in environments where wildcards cannot be used and the expanded classpath exceeds the maximum supported command line length.
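
For example, to print the fully expanded classpath, or to write it into a jar manifest at a hypothetical path:

Example: mapred classpath --glob

Example: mapred classpath --jar /tmp/mr-classpath.jar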

distcp

Copies files or directories recursively. More information can be found at the Hadoop DistCp Guide.
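
A hypothetical invocation copying between two clusters (the NameNode hosts and paths are placeholders) might be:

Example: mapred distcp hdfs://nn1:8020/user/hadoop/source hdfs://nn2:8020/user/hadoop/destination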

job

Command to interact with MapReduce jobs.

Usage: mapred job | [GENERIC_OPTIONS] | [-submit <job-file>] | [-status <job-id>] | [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] | [-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <jobHistoryFile|jobId> [-outfile <file>] [-format <human|json>]] | [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] | [-set-priority <job-id> <priority>] | [-list-active-trackers] | [-list-blacklisted-trackers] | [-list-attempt-ids <job-id> <task-type> <task-state>] [-logs <job-id> <task-attempt-id>] [-config <job-id> <file>]

| COMMAND_OPTION | Description |
|:---- |:---- |
| -submit *job-file* | Submits the job. |
| -status *job-id* | Prints the map and reduce completion percentage and all job counters. |
| -counter *job-id* *group-name* *counter-name* | Prints the counter value. |
| -kill *job-id* | Kills the job. |
| -events *job-id* *from-event-#* *#-of-events* | Prints the events' details received by jobtracker for the given range. |
| -history [all] *jobHistoryFile\|jobId* [-outfile *file*] [-format *human\|json*] | Prints job details, failed and killed task details. More details about the job, such as successful tasks and task attempts made for each task, can be viewed by specifying the [all] option. |
| -list [all] | Displays jobs which are yet to complete. -list all displays all jobs. |
| -kill-task *task-id* | Kills the task. Killed tasks are NOT counted against failed attempts. |
| -fail-task *task-id* | Fails the task. Failed tasks are counted against failed attempts. |
| -set-priority *job-id* *priority* | Changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW. |
| -list-active-trackers | List all the active NodeManagers in the cluster. |
| -list-blacklisted-trackers | List the blacklisted task trackers in the cluster. This command is not supported in MRv2 based clusters. |
| -list-attempt-ids *job-id* *task-type* *task-state* | List the attempt-ids based on the task type and the state given. Valid values for *task-type* are REDUCE, MAP. Valid values for *task-state* are running, pending, completed, failed, killed. |
| -logs *job-id* *task-attempt-id* | Dump the container log for a job if *task-attempt-id* is not specified; otherwise dump the log for the task with the specified *task-attempt-id*. The logs are dumped to standard output. |
| -config *job-id* *file* | Download the job configuration file. |
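
For example, checking on and then killing a job might look like the following (the job ID is a placeholder):

Example: mapred job -status job_1234567890123_0001

Example: mapred job -kill job_1234567890123_0001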

pipes

Runs a pipes job.

Usage: mapred pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>] [-program <executable>] [-reduces <num>]

| COMMAND_OPTION | Description |
|:---- |:---- |
| -conf *path* | Configuration for job |
| -jobconf *key=value*, *key=value*, ... | Add/override configuration for job |
| -input *path* | Input directory |
| -output *path* | Output directory |
| -jar *jar file* | Jar filename |
| -inputformat *class* | InputFormat class |
| -map *class* | Java Map class |
| -partitioner *class* | Java Partitioner |
| -reduce *class* | Java Reduce class |
| -writer *class* | Java RecordWriter |
| -program *executable* | Executable URI |
| -reduces *num* | Number of reduces |
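
A hypothetical pipes invocation (all paths and the executable URI are placeholders) might be:

Example: mapred pipes -input /user/hadoop/input -output /user/hadoop/output -program hdfs://nn1:8020/user/hadoop/bin/wordcount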

queue

Command to interact with and view Job Queue information.

Usage: mapred queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]

| COMMAND_OPTION | Description |
|:---- |:---- |
| -list | Gets the list of Job Queues configured in the system, along with the scheduling information associated with them. |
| -info *job-queue-name* [-showJobs] | Displays the job queue information and associated scheduling information of the particular job queue. If the -showJobs option is present, a list of jobs submitted to the particular job queue is displayed. |
| -showacls | Displays the queue name and associated queue operations allowed for the current user. The list consists of only those queues to which the user has access. |
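
For example, listing all queues and then inspecting one (the queue name is a placeholder):

Example: mapred queue -list

Example: mapred queue -info default -showJobs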

version

Prints the version.

Usage: mapred version

envvars

Usage: mapred envvars

Display computed Hadoop environment variables.

Administration Commands

Commands useful for administrators of a Hadoop cluster.

historyserver

Starts the JobHistoryServer.

Usage: mapred historyserver
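
This runs the server in the foreground; with the Hadoop 3 shell scripts it can typically be started as a background daemon with the --daemon shell option instead:

Example: mapred --daemon start historyserver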

hsadmin

Runs a MapReduce hsadmin client to execute JobHistoryServer administrative commands.

Usage: mapred hsadmin [-refreshUserToGroupsMappings] | [-refreshSuperUserGroupsConfiguration] | [-refreshAdminAcls] | [-refreshLoadedJobCache] | [-refreshLogRetentionSettings] | [-refreshJobRetentionSettings] | [-getGroups [username]] | [-help [cmd]]

| COMMAND_OPTION | Description |
|:---- |:---- |
| -refreshUserToGroupsMappings | Refresh user-to-groups mappings |
| -refreshSuperUserGroupsConfiguration | Refresh superuser proxy groups mappings |
| -refreshAdminAcls | Refresh acls for administration of the Job history server |
| -refreshLoadedJobCache | Refresh the loaded job cache of the Job history server |
| -refreshJobRetentionSettings | Refresh the job history retention period and job cleaner settings |
| -refreshLogRetentionSettings | Refresh the log retention period and log retention check interval |
| -getGroups [username] | Get the groups the given user belongs to |
| -help [cmd] | Displays help for the given command, or all commands if none is specified. |
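
For example, refreshing the user-to-groups mappings, or querying group membership for a hypothetical user:

Example: mapred hsadmin -refreshUserToGroupsMappings

Example: mapred hsadmin -getGroups alice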

frameworkuploader

Collects framework jars and uploads them to HDFS as a tarball.

Usage: mapred frameworkuploader -target <target> [-fs <filesystem>] [-input <classpath>] [-blacklist <list>] [-whitelist <list>] [-initialReplication <num>] [-acceptableReplication <num>] [-finalReplication <num>] [-timeout <seconds>] [-nosymlink]

| COMMAND_OPTION | Description |
|:---- |:---- |
| -input *classpath* | The input classpath that is searched for jar files to be included in the tarball. |
| -fs *filesystem* | The target file system. Defaults to the default filesystem set by fs.defaultFS. |
| -target *target* | The target location of the framework tarball, optionally followed by a # with the localized alias. An example would be /usr/lib/framework.tar#framework. Make sure the target directory is readable by all users but not writable by anyone other than administrators, to protect cluster security. |
| -blacklist *list* | A comma-separated regex array used to filter out jar file names to exclude from the classpath. It can be used, for example, to exclude test jars or Hadoop services that are not necessary to localize. |
| -whitelist *list* | A comma-separated regex array used to include certain jar files. This can be used to provide additional security, so that no external source can include malicious code in the classpath when the tool runs. |
| -nosymlink | This flag can be used to exclude symlinks that point into the same directory. It is not widely used. For example, /a/foo.jar and a symlink /a/bar.jar that points to /a/foo.jar would normally add foo.jar and bar.jar to the tarball as separate files despite them actually being the same file. This flag makes the tool exclude /a/bar.jar so only one copy of the file is added. |
| -initialReplication *num* | The replication count that the framework tarball is created with. It is safe to leave this value at the default of 3. This is the tested scenario. |
| -finalReplication *num* | The uploader tool sets the replication once all blocks are collected and uploaded. If quick initial startup is required, then it is advised to set this to the commissioned node count divided by two, but not more than 512. |
| -acceptableReplication *num* | The tool will wait until the tarball has been replicated this number of times before exiting. This should be a replication count less than or equal to the value of finalReplication. It is typically 90% of the value of finalReplication, to accommodate failing nodes. |
| -timeout *seconds* | A timeout in seconds to wait to reach acceptableReplication before the tool exits. Otherwise the tool logs an error and returns. |
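
A hypothetical invocation (the target path and localized alias are placeholders) might be:

Example: mapred frameworkuploader -target hdfs:///mapred/framework/framework.tar.gz#mrframework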