Hadoop Commands Guide

Overview

All of the Hadoop commands and subprojects follow the same basic structure:

Usage: shellcommand [SHELL_OPTIONS] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]

| FIELD | Description |
|:---- |:---- |
| shellcommand | The command of the project being invoked. For example, Hadoop common uses hadoop, HDFS uses hdfs, and YARN uses yarn. |
| SHELL_OPTIONS | Options that the shell processes prior to executing Java. |
| COMMAND | Action to perform. |
| GENERIC_OPTIONS | The common set of options supported by multiple commands. |
| COMMAND_OPTIONS | Various command options are described in this document for the Hadoop common sub-project. HDFS and YARN commands are covered in other documents. |

Shell Options

All of the shell commands accept a common set of options. For some commands these options are ignored; for example, passing --hostnames on a command that only executes on a single host will be ignored. A usage sketch follows the table.

| SHELL_OPTION | Description |
|:---- |:---- |
| --buildpaths | Enables developer versions of jars. |
| --config confdir | Overwrites the default Configuration directory. Default is $HADOOP_HOME/etc/hadoop. |
| --daemon mode | If the command supports daemonization (e.g., hdfs namenode), execute in the appropriate mode. Supported modes are start to start the process in daemon mode, stop to stop the process, and status to determine the active status of the process. status will return an LSB-compliant result code. If no option is provided, commands that support daemonization will run in the foreground. For commands that do not support daemonization, this option is ignored. |
| --debug | Enables shell level configuration debugging information. |
| --help | Shell script usage information. |
| --hostnames | When --workers is used, override the workers file with a space delimited list of hostnames where to execute a multi-host subcommand. If --workers is not used, this option is ignored. |
| --hosts | When --workers is used, override the workers file with another file that contains a list of hostnames where to execute a multi-host subcommand. If --workers is not used, this option is ignored. |
| --loglevel loglevel | Overrides the log level. Valid log levels are FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. Default is INFO. |
| --workers | If possible, execute this command on all hosts in the workers file. |
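
A hedged sketch of how these options combine (host, config directory, and subcommands are illustrative; --daemon only applies to commands that support daemonization):

$ hdfs --config /opt/hadoop/conf --daemon start namenode
$ hdfs --daemon status namenode
$ hadoop --loglevel DEBUG version

The first two lines start a NameNode in daemon mode under a non-default configuration directory and then query its status; the last runs a foreground command with more verbose logging.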

Generic Options

Many subcommands honor a common set of configuration options to alter their behavior (a combined sketch follows the table):

| GENERIC_OPTION | Description |
|:---- |:---- |
| -archives <comma separated list of archives> | Specify comma separated archives to be unarchived on the compute machines. Applies only to job. |
| -conf <configuration file> | Specify an application configuration file. |
| -D <property>=<value> | Use value for given property. |
| -files <comma separated list of files> | Specify comma separated files to be copied to the map reduce cluster. Applies only to job. |
| -fs <file:///> or <hdfs://namenode:port> | Specify default filesystem URL to use. Overrides 'fs.defaultFS' property from configurations. |
| -jt <local> or <resourcemanager:port> | Specify a ResourceManager. Applies only to job. |
| -libjars <comma separated list of jars> | Specify comma separated jar files to include in the classpath. Applies only to job. |
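
A hedged sketch of passing generic options (property values, URLs, and file names are illustrative; the job-only options assume a main class that parses generic options, e.g. via ToolRunner):

$ hadoop fs -D dfs.replication=2 -ls /
$ hadoop fs -fs hdfs://namenode:9000 -ls /
$ yarn jar app.jar MyTool -files cache.txt -libjars lib1.jar,lib2.jar /in /out

Note that generic options must appear before the command-specific options.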

Hadoop Common Commands

All of these commands are executed from the hadoop shell command. They have been broken up into User Commands and Administration Commands.

User Commands

Commands useful for users of a hadoop cluster.

archive

Creates a hadoop archive. More information can be found in the Hadoop Archives Guide.

checknative

Usage: hadoop checknative [-a] [-h]

| COMMAND_OPTION | Description |
|:---- |:---- |
| -a | Check all libraries are available. |
| -h | print help |

This command checks the availability of the Hadoop native code. See Native Libraries for more information. By default, this command only checks the availability of libhadoop.

classpath

Usage: hadoop classpath [--glob |--jar <path> |-h |--help]

| COMMAND_OPTION | Description |
|:---- |:---- |
| --glob | expand wildcards |
| --jar path | write classpath as manifest in jar named path |
| -h, --help | print help |

Prints the class path needed to get the Hadoop jar and the required libraries. If called without arguments, prints the classpath set up by the command scripts, which is likely to contain wildcards in the classpath entries. Additional options print the classpath after wildcard expansion or write the classpath into the manifest of a jar file. The latter is useful in environments where wildcards cannot be used and the expanded classpath exceeds the maximum supported command line length.
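
For example (the jar path is illustrative):

$ hadoop classpath --glob
$ hadoop classpath --jar /tmp/hadoop-classpath.jar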

conftest

Usage: hadoop conftest [-conffile <path>]...

| COMMAND_OPTION | Description |
|:---- |:---- |
| -conffile | Path of a configuration file or directory to validate |
| -h, --help | print help |

Validates configuration XML files. If the -conffile option is not specified, the files in ${HADOOP_CONF_DIR} whose name ends with .xml will be verified. If specified, that path will be verified. You can specify either a file or a directory, and if a directory is specified, the files in that directory whose name ends with .xml will be verified. The -conffile option can be specified multiple times.

The validation is fairly minimal: the XML is parsed, and duplicate and empty property names are checked for. The command does not support XInclude; if you use that to pull in configuration items, it will declare the XML file invalid.
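
For example (paths are illustrative):

$ hadoop conftest
$ hadoop conftest -conffile etc/hadoop/core-site.xml -conffile etc/hadoop/

The first form validates every .xml file under ${HADOOP_CONF_DIR}; the second validates the named file plus every .xml file in the named directory.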

credential

Usage: hadoop credential <subcommand> [options]

| COMMAND_OPTION | Description |
|:---- |:---- |
| create alias [-provider provider-path] [-strict] [-value credential-value] | Prompts the user for a credential to be stored as the given alias. The hadoop.security.credential.provider.path within the core-site.xml file will be used unless a -provider is indicated. The -strict flag will cause the command to fail if the provider uses a default password. Use the -value flag to supply the credential value (a.k.a. the alias password) instead of being prompted. |
| delete alias [-provider provider-path] [-strict] [-f] | Deletes the credential with the provided alias. The hadoop.security.credential.provider.path within the core-site.xml file will be used unless a -provider is indicated. The -strict flag will cause the command to fail if the provider uses a default password. The command asks for confirmation unless -f is specified. |
| list [-provider provider-path] [-strict] | Lists all of the credential aliases. The hadoop.security.credential.provider.path within the core-site.xml file will be used unless a -provider is indicated. The -strict flag will cause the command to fail if the provider uses a default password. |
| check alias [-provider provider-path] [-strict] | Check the password for the given alias. The hadoop.security.credential.provider.path within the core-site.xml file will be used unless a -provider is indicated. The -strict flag will cause the command to fail if the provider uses a default password. |

Command to manage credentials, passwords and secrets within credential providers.

The CredentialProvider API in Hadoop allows for the separation of applications from how they store their required passwords/secrets. In order to indicate a particular provider type and location, the user must provide the hadoop.security.credential.provider.path configuration element in core-site.xml or use the command line option -provider on each of the following commands. This provider path is a comma-separated list of URLs that indicates the type and location of a list of providers that should be consulted. For example, the following path: user:///,jceks://file/tmp/test.jceks,jceks://hdfs@nn1.example.com/my/path/test.jceks

indicates that the current user's credentials file should be consulted through the User Provider, that the local file located at /tmp/test.jceks is a Java Keystore Provider, and that the file located within HDFS at nn1.example.com/my/path/test.jceks is also a store for a Java Keystore Provider.

When utilizing the credential command, it is often for provisioning a password or secret to a particular credential store provider. In order to explicitly indicate which provider store to use, the -provider option should be used. Otherwise, given a path of multiple providers, the first non-transient provider will be used. This may or may not be the one that you intended.

Providers frequently require that a password or other secret is supplied. If the provider requires a password and is unable to find one, it will use a default password and emit a warning message that the default password is being used. If the -strict flag is supplied, the warning message becomes an error message and the command returns immediately with an error status.

Example: hadoop credential list -provider jceks://file/tmp/test.jceks
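
Building on that, a hedged end-to-end sketch against the same local keystore (the alias and secret value are illustrative):

$ hadoop credential create mydb.password -provider jceks://file/tmp/test.jceks -value s3cret
$ hadoop credential list -provider jceks://file/tmp/test.jceks
$ hadoop credential delete mydb.password -provider jceks://file/tmp/test.jceks -f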

distch

Usage: hadoop distch [-f urilist_url] [-i] [-log logdir] path:owner:group:permissions

| COMMAND_OPTION | Description |
|:---- |:---- |
| -f | List of objects to change |
| -i | Ignore failures |
| -log | Directory to log output |

Change the ownership and permissions on many files at once.
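
A hedged sketch (paths, owners, groups, and modes are illustrative; an empty field is understood to leave that attribute unchanged):

$ hadoop distch /user/alice:alice:hadoop:750
$ hadoop distch /data/shared::hadoop: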

distcp

Copy files or directories recursively. More information can be found in the Hadoop DistCp Guide.

dtutil

Usage: hadoop dtutil [-keytab keytab_file -principal principal_name ] subcommand [-format (java|protobuf)] [-alias alias ] [-renewer renewer ] filename...

Utility to fetch and manage hadoop delegation tokens inside credentials files. It is intended to replace the simpler command fetchdt. There are multiple subcommands, each with their own flags and options. A usage sketch follows the table below.

For every subcommand that writes out a file, the -format option will specify the internal format to use. java is the legacy format that matches fetchdt. The default is protobuf.

For every subcommand that connects to a service, convenience flags are provided to specify the kerberos principal name and keytab file to use for auth.

| SUBCOMMAND | Description |
|:---- |:---- |
| print [-alias alias ] filename [ filename2 ...] | Print out the fields in the tokens contained in filename (and filename2 ...). If alias is specified, print only tokens matching alias. Otherwise, print all tokens. |
| get URL [-service scheme ] [-format (java\|protobuf)] [-alias alias ] [-renewer renewer ] filename | Fetch a token from service at URL and place it in filename. URL is required and must immediately follow get. URL is the service URL, e.g. hdfs://localhost:9000. alias will overwrite the service field in the token. It is intended for hosts that have external and internal names, e.g. firewall.com:14000. filename should come last and is the name of the token file. It will be created if it does not exist. Otherwise, token(s) are added to the existing file. The -service flag should only be used with a URL which starts with http or https. The following are equivalent: hdfs://localhost:9000/ vs. http://localhost:9000 -service hdfs |
| append [-format (java\|protobuf)] filename filename2 [ filename3 ...] | Append the contents of the first N filenames onto the last filename. When tokens with common service fields are present in multiple files, earlier files’ tokens are overwritten. That is, tokens present in the last file are always preserved. |
| remove -alias alias [-format (java\|protobuf)] filename [ filename2 ...] | From each file specified, remove the tokens matching alias and write out each file using the specified format. alias must be specified. |
| cancel -alias alias [-format (java\|protobuf)] filename [ filename2 ...] | Just like remove, except the tokens are also cancelled using the service specified in the token object. alias must be specified. |
| renew -alias alias [-format (java\|protobuf)] filename [ filename2 ...] | For each file specified, renew the tokens matching alias and write out each file using the specified format. alias must be specified. |
| import base64 [-alias alias ] filename | Import a token from a base64 token. alias will overwrite the service field in the token. |
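
A hedged sketch of a token round trip (the URL and file name are illustrative; the -alias values assume the token's service field is localhost:9000, which can be confirmed with print first):

$ hadoop dtutil get hdfs://localhost:9000 -renewer yarn token.bin
$ hadoop dtutil print token.bin
$ hadoop dtutil renew -alias localhost:9000 token.bin
$ hadoop dtutil cancel -alias localhost:9000 token.bin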

fs

This command is documented in the File System Shell Guide. It is a synonym for hdfs dfs when HDFS is in use.

gridmix

Gridmix is a benchmark tool for Hadoop clusters. More information can be found in the Gridmix Guide.

jar

Usage: hadoop jar <jar> [mainClass] args...

Runs a jar file.

Use yarn jar to launch YARN applications instead.
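
A hedged sketch (the jar and class names are hypothetical); the first form relies on the main class recorded in the jar's manifest, the second names it explicitly:

$ hadoop jar myapp.jar /input /output
$ hadoop jar myapp.jar com.example.MyTool /input /output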

jnipath

Usage: hadoop jnipath

Print the computed java.library.path.

kerbname

Usage: hadoop kerbname principal

Convert the named principal to the Hadoop user name via the auth_to_local rules.

Example: hadoop kerbname user@EXAMPLE.COM

kdiag

Usage: hadoop kdiag

Diagnose Kerberos problems.

key

Usage: hadoop key <subcommand> [options]

| COMMAND_OPTION | Description |
|:---- |:---- |
| create keyname [-cipher cipher] [-size size] [-description description] [-attr attribute=value] [-provider provider] [-strict] [-help] | Creates a new key for the name specified by the keyname argument within the provider specified by the -provider argument. The -strict flag will cause the command to fail if the provider uses a default password. You may specify a cipher with the -cipher argument. The default cipher is currently “AES/CTR/NoPadding”. The default keysize is 128. You may specify the requested key length using the -size argument. Arbitrary attribute=value style attributes may be specified using the -attr argument. -attr may be specified multiple times, once per attribute. |
| roll keyname [-provider provider] [-strict] [-help] | Creates a new version for the specified key within the provider indicated using the -provider argument. The -strict flag will cause the command to fail if the provider uses a default password. |
| delete keyname [-provider provider] [-strict] [-f] [-help] | Deletes all versions of the key specified by the keyname argument from within the provider specified by -provider. The -strict flag will cause the command to fail if the provider uses a default password. The command asks for user confirmation unless -f is specified. |
| list [-provider provider] [-strict] [-metadata] [-help] | Displays the keynames contained within a particular provider as configured in core-site.xml or specified with the -provider argument. The -strict flag will cause the command to fail if the provider uses a default password. -metadata displays the metadata. |
| check keyname [-provider provider] [-strict] [-help] | Check password of the keyname contained within a particular provider as configured in core-site.xml or specified with the -provider argument. The -strict flag will cause the command to fail if the provider uses a default password. |
| -help | Prints usage of this command. |

Manage keys via the KeyProvider. For details on KeyProviders, see the Transparent Encryption Guide.

Providers frequently require that a password or other secret is supplied. If the provider requires a password and is unable to find one, it will use a default password and emit a warning message that the default password is being used. If the -strict flag is supplied, the warning message becomes an error message and the command returns immediately with an error status.

NOTE: Some KeyProviders (e.g. org.apache.hadoop.crypto.key.JavaKeyStoreProvider) do not support uppercase key names.

NOTE: Some KeyProviders do not directly execute a key deletion (e.g. they perform a soft delete instead, or delay the actual deletion to prevent mistakes). In these cases, one may encounter errors when creating/deleting a key with the same name after deleting it. Please check the underlying KeyProvider for details.
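
A hedged sketch against a local JavaKeyStore provider (key name, size, and keystore path are illustrative; the key name is lowercase per the first note above):

$ hadoop key create mykey -size 256 -provider jceks://file/tmp/keystore.jceks
$ hadoop key list -metadata -provider jceks://file/tmp/keystore.jceks
$ hadoop key roll mykey -provider jceks://file/tmp/keystore.jceks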

kms

Usage: hadoop kms

Run KMS, the Key Management Server.

version

Usage: hadoop version

Prints the version.

CLASSNAME

Usage: hadoop CLASSNAME

Runs the class named CLASSNAME. The class must be part of a package.

envvars

Usage: hadoop envvars

Display computed Hadoop environment variables.

Administration Commands

Commands useful for administrators of a hadoop cluster.

daemonlog

Usage:

hadoop daemonlog -getlevel <host:port> <classname> [-protocol (http|https)]
hadoop daemonlog -setlevel <host:port> <classname> <level> [-protocol (http|https)]
| COMMAND_OPTION | Description |
|:---- |:---- |
| -getlevel host:port classname [-protocol (http\|https)] | Prints the log level of the log identified by a qualified classname, in the daemon running at host:port. The -protocol flag specifies the protocol for connection. |
| -setlevel host:port classname level [-protocol (http\|https)] | Sets the log level of the log identified by a qualified classname, in the daemon running at host:port. The -protocol flag specifies the protocol for connection. |

Get/set the log level for a log identified by a qualified class name in the daemon dynamically. By default, the command sends an HTTP request, but this can be overridden by using the argument -protocol https to send an HTTPS request.

Example:

$ bin/hadoop daemonlog -setlevel 127.0.0.1:9870 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG
$ bin/hadoop daemonlog -getlevel 127.0.0.1:9871 org.apache.hadoop.hdfs.server.namenode.NameNode -protocol https

Note that the setting is not permanent and will be reset when the daemon is restarted. This command works by sending an HTTP/HTTPS request to the daemon's internal Jetty servlet, so it supports the following daemons:

  • Common
    • Key Management Server
  • HDFS
    • NameNode
    • Secondary NameNode
    • DataNode
    • JournalNode
    • HttpFS server
  • YARN
    • ResourceManager
    • NodeManager
    • TimelineServer

Files

etc/hadoop/hadoop-env.sh

This file stores the global settings used by all Hadoop shell commands.

etc/hadoop/hadoop-user-functions.sh

This file allows for advanced users to override some shell functionality. An illustrative override follows.
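
For instance, a hedged sketch of replacing one stock function (hadoop_error normally prints messages to stderr; the timestamped body is purely illustrative):

function hadoop_error
{
  # Overridden: prefix error messages with a timestamp
  echo "$(date +%Y-%m-%dT%H:%M:%S) $*" 1>&2
}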

~/.hadooprc

This stores the personal environment for an individual user. It is processed after the hadoop-env.sh and hadoop-user-functions.sh files and can contain the same settings.
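
A minimal sketch of such a file (the jar path is illustrative; hadoop_add_classpath is one of the Hadoop shell library functions):

# ~/.hadooprc
hadoop_add_classpath /home/alice/lib/mytools.jar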