hadoop mfs
The hadoop mfs command displays directory information and contents; creates symbolic links and hard links; sets, gets, and removes Access Control Expressions (ACEs) on files and directories; and sets compression and chunk size on a directory.
Syntax
hadoop mfs
[ -count <path> ]
[ -delace [-R] <path> ]
[ -getace [-R] <path> ]
[ -help <command> ]
[ -ln <target> <symlink> ]
[ -lnh <target> <hardlink> ]
[ -ls <path> ]
[ -lsd <path> ]
[ -lsf <path> ]
[ -lso <path> ]
[ -lsor <path> ]
[ -lsr <path> ]
[ -Lsr <path> ]
[ -lsrv <path> ]
[ -lss <path> ]
[ -offload <file_path> [-v] ]
[ -recall <file_path> [-v] ]
[ -rmr <path> ]
[ -setace [-R]
[-readfile <ace>] [-writefile <ace>] [-executefile <ace>]
[-addchild <ace>] [-deletechild <ace>] [-lookupdir <ace>] [-readdir <ace>]
[-aces "[rf:<ace>],[wf:<ace>],[ef:<ace>],[ac:<ace>],[dc:<ace>],[rd:<ace>],[ld:<ace>]"]
[-preservemodebits <true|false>] [-setinherit <true|false>] <path> ]
[ -setaudit on|off <dir|file|table> ]
[ -setcompression on|off|lzf|lz4|zlib <dir|table> ]
[ -setchunksize <size> <dir> ]
[ -setnetworkencryption on|off <target> ]
[ -stat <path> ]
[ -tierstatus <file_path> [-v] ]
[ -addsecuritypolicytag [-R] <comma-separated list of security policy tags> <path> ]
[ -getsecuritypolicytag [-R] <path> ]
[ -removesecuritypolicytag [-R] <comma-separated list of security policy tags> <path> ]
[ -removeallsecuritypolicytag [-R] <path> ]
[ -setsecuritypolicytag [-R] <comma-separated list of security policy tags> <path> ]
Parameters
The normal command syntax is to specify a single option from the following table, along with its corresponding arguments. If you do not set compression and chunk size for a given directory, the values are inherited from the parent directory.
Parameter | Description |
---|---|
-count <path> |
Recursively counts and returns the number of directories and files (regular files, symbolic links, volume links, kvstores, and device files) in the specified path. |
-delace [-R] <path> |
Deletes all ACEs associated with the specified file or directory and resets the ACEs for that file or directory to the default value, which is the empty string.
You cannot delete specific access types with this parameter. Instead, if necessary, reset the value for the specific access type to an empty string using the -setace parameter. If you use an empty string to deny a specific type of access, then that type of access is denied to all users. To deny specific types of access to specific users only, use the negation operator (!). |
-getace [-R] <path> |
Returns the permissions (POSIX mode bits and ACEs) for the given file or, if -R is specified, recursively for the given directory. If -R is not specified, this parameter returns the permissions only for the given directory. Here:
If one or more ACEs are available for the file or directory, a plus sign ( |
-help <command> |
Displays help for the hadoop mfs command. |
-ln <target> <symlink> |
Creates a symbolic link <symlink> that points to the target path <target>, similar to the standard Linux ln -s command. |
-lnh <target> <hardlink> |
Creates a hard link that associates a new name or file path with an existing file. You must specify the existing file as <target> and the new name or file path as <hardlink>. |
-ls <path> |
Lists files in the directory specified by <path>. The hadoop mfs -ls command corresponds to the standard hadoop fs -ls command, but provides additional information, which is described under Output. |
-lsd <path> |
Lists files in the directory specified by <path>, and also provides information about the specified directory itself. |
-lsf <path> |
Lists only the file ID (fid) and the file name for each file present in the specified path. The output is not sorted. Use this option when a directory contains millions of files. In this scenario, this option gives the fastest listing, because only the fids and the file names are returned; all other file attributes are ignored. |
-lso <path> |
Lists files in the directory specified by <path>. The hadoop mfs -lso command corresponds to the standard hadoop fs -ls command, but provides additional information, which is described under Output. It is faster than hadoop fs -ls because it uses an optimized printing method to write data to the screen. |
-lsor <path> |
Recursively lists files in the directory specified by <path>. This command is the recursive variant of the hadoop mfs -lso command. |
-lsr <path> |
Recursively lists files in the directory and subdirectories specified by <path>. The hadoop mfs -lsr command corresponds to the standard hadoop fs -lsr command, but provides additional information, which is described under Output. |
-Lsr <path> |
Equivalent to -lsr, but additionally dereferences symbolic links. |
-lsrv <path> |
Lists all paths recursively without crossing volume links. |
-lss <path> |
Lists files in the directory specified by <path>, with an additional column that displays the number of disk blocks per file. Disk blocks are 8192 bytes. |
-offload <file_path> [-v] |
Offloads the specified file to the storage tier. This is a blocking operation; control is not returned until the operation is complete and the file has been offloaded. Use -v (for verbose) to view the status of the ongoing offload operation. |
-recall <file_path> [-v] |
Recalls the specified file from the storage tier. This is a blocking operation; control is not returned until the operation is complete and the file has been recalled. Use -v (for verbose) to view the status of the ongoing recall operation. |
-rmr <path> |
Recursively deletes files and directories in the specified path. This is a highly optimized version of the generic hadoop fs -rmr command and is 10X faster for large directories. This option is useful when one or more directories in the specified path contain many (millions of) files. |
-setace [-R] [-readfile <ace>] [-writefile <ace>] [-executefile <ace>] [-addchild <ace>] [-deletechild <ace>] [-lookupdir <ace>] [-readdir <ace>] [-aces "[rf:<ace>],[wf:<ace>],[ef:<ace>],[ac:<ace>],[dc:<ace>],[rd:<ace>],[ld:<ace>]"] [-preservemodebits <true|false>] [-setinherit <true|false>] <path> |
Sets or modifies the read, write, and execute permissions for files or directories. This argument will:
Specify the ACEs immediately after the
|
-setaudit on|off <dir|file|table> |
Enables or disables auditing of the specified directory, file, or HPE Ezmeral Data Fabric Database table. Enabling auditing of a directory does not enable auditing of files and subdirectories that already exist in the directory; you must enable auditing on those existing files and subdirectories yourself. However, new files and subdirectories that you create in that directory are automatically enabled for auditing. See How Does Auditing Work?. For operations on the object to be logged, auditing must also be enabled on the cluster and on the volume in which the object is located. See Managing Auditing for details. |
-setchunksize <size> <dir> |
Sets the chunk size in bytes for the directory specified in <dir>. The <size> parameter must be a multiple of 65536. |
-setcompression on|off|lzf|lz4|zlib <dir|table> |
Turns compression on or off on the directory specified in <dir> or on the specified table. If compression is not turned off, the compression type can be set to lzf, lz4, or zlib. |
-setnetworkencryption on|off <target> |
Sets network encryption on or off for the filesystem object defined in <target>. The cluster encrypts network traffic to or from a file, directory, stream, or data-fabric table with network security enabled. |
-stat <path> |
Displays the statistics for the given file. Only the root user and the MAPR_USER user (the user name under which data-fabric services run) have permission to run this command. The path is required and specifies the file on which to run the command. The output fields for this command are described under Output. |
-tierstatus <file_path> [-v] |
Displays the status of the offload or recall of the given file. If -v (for verbose) is also specified, the command reports, as the final line of output, whether the data of the given file is local or offloaded. |
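The -setchunksize rule that <size> must be a multiple of 65536 can be checked before the command is run. The following is a minimal sketch rather than part of hadoop mfs; the function name is_valid_chunk_size and the /dir path in the comment are hypothetical, and the sketch assumes the size is written as a plain decimal number without leading zeros.

```shell
# Hypothetical helper (not part of hadoop mfs): succeeds only when the
# argument is a decimal number that is a multiple of 65536.
is_valid_chunk_size() {
    size=$1
    # Reject empty or non-numeric input before doing arithmetic.
    case $size in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ $(($size % 65536)) -eq 0 ]
}

# Possible use (not run here; /dir is a placeholder path):
# is_valid_chunk_size 268435456 && hadoop mfs -setchunksize 268435456 /dir
```

The value 268435456 used in the comment is the chunk size that appears in the listing examples on this page.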
Output
When used with the -ls, -lsd, -lso, -lsor, -lsr, or -lss options, hadoop mfs displays information about files and directories. For each file or directory, hadoop mfs displays a line of basic information followed by lines listing the chunks that make up the file, in the following format:
{mode} {compression} {encryption} {audit} {diskFlush} {replication} {owner} {group} {size} {date} {chunk size} {name}
{chunk} {fid} {host} [{host}...]
{chunk} {fid} {host} [{host}...]
...
Volume links are displayed as follows:
{mode} {compression} {encryption} {audit} {diskFlush} {replication} {owner} {group} {size} {date} {chunk size} {name}
{chunk} {target volume name} {writability} {fid} -> {fid} [{host}...]
The following table describes the values:
mode |
A text string indicating the read, write, and execute permissions for the owner, group, and other permissions. See also Managing Permissions. |
compression |
|
encryption | U: unencrypted; E: encrypted |
audit | U: disabled; A: enabled |
disk flush | U: disabled; F: enabled |
replication |
The replication factor of the file (directories display a dash instead) |
owner |
The owner of the file or directory |
group |
The group of the file or directory |
size |
The size of the file or directory |
date |
The date the file or directory was last modified |
chunk size |
The chunk size of the file or directory |
name |
The name of the file or directory |
chunk |
The chunk number. The first chunk is a primary chunk labeled "p". |
fid |
The chunk's file ID, which consists of three parts:
For volume links, the first |
host |
The host on which the chunk resides. When several hosts are listed, the first host is the first copy of the chunk, while subsequent hosts are replicas. |
target volume name |
The name of the volume pointed to by a volume link. |
writability |
Displays whether the volume is writable. |
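As an illustration of the basic information line described above, the following sketch splits one such line into its flag fields with awk. The helper name describe_flags is hypothetical, and the sample line is copied from the -ls example on this page. The sketch assumes that fields are separated by whitespace and that the name is the last field and contains no spaces. Only the encryption, audit, and disk flush letters defined in the table are interpreted; the compression letter is printed as-is.

```shell
# Hypothetical helper: interpret the flag columns of one basic
# information line from hadoop mfs -ls style output.
describe_flags() {
    printf '%s\n' "$1" | awk '{
        # Field order: mode, compression, encryption, audit, diskFlush, ...
        enc   = ($3 == "E") ? "encrypted" : "unencrypted"
        audit = ($4 == "A") ? "enabled" : "disabled"
        flush = ($5 == "F") ? "enabled" : "disabled"
        printf "%s compression=%s encryption=%s audit=%s diskflush=%s\n",
               $NF, $2, enc, audit, flush
    }'
}

# Sample line taken from the -ls example on this page.
line='vrwxr-xr-x Z E U U 3 mapr mapr 121 2018-08-10 01:07 268435456 /.rw'
describe_flags "$line"
# prints: /.rw compression=Z encryption=encrypted audit=disabled diskflush=disabled
```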
When used with the -lsf <path> option, hadoop mfs displays only the file ID (fid) and the file name of each file in the path.
When used with the -stat <path> option, hadoop mfs displays statistics for the given file. For each file, it displays the following:
Output field | Description |
---|---|
uid |
The user ID of the owner. |
atime |
The last access time. |
mtime |
The last modified time. |
nlink |
The number of hard links. |
type |
The type of the file. Value can be one of:
|
size |
The size of the file or directory. Depending on the type of file, it can be the actual size or the number of entries. |
mode |
The UNIX style permission mode bits for the file/directory. |
networkencryption |
The network encryption setting. Determines whether network encryption is enabled for this file. |
subtype |
The subtype for the specified type. The following subtypes are supported
for some of the types:
For all other types, subtypes are not valid. |
gid |
The group ID. |
compression |
The compression setting. |
When used with the -tierstatus option, the output varies based on whether data is local, was offloaded, or was recalled. The output looks similar to the following if:
- Data was completely offloaded:
File does not have local data
- Data could not be completely offloaded or data was recalled:
File has local data
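A script that needs to act on the tier status can match on the two messages shown above. This is a minimal sketch that assumes the message text is exactly as shown on this page; the function name has_local_data and its return codes are hypothetical.

```shell
# Hypothetical helper: classify one line of -tierstatus output.
# Returns 0 if the file has local data, 1 if it does not,
# and 2 if the text matches neither message.
has_local_data() {
    case $1 in
        *"File does not have local data"*) return 1 ;;
        *"File has local data"*)           return 0 ;;
        *)                                  return 2 ;;
    esac
}

# Possible use (not run here):
# has_local_data "$(hadoop mfs -tierstatus /vol1/file2)" && echo "local"
```

The "does not have" pattern is tested first so that the order of the checks documents the intent, although the two messages do not overlap.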
Examples
View File Information
The hadoop mfs command is used to view file information. You can use this command to check whether compression is turned off in a directory or mounted volume. For example:
# hadoop mfs -ls /
Found 121 items
vrwxr-xr-x Z E U U 3 mapr mapr 121 2018-08-10 01:07 268435456 /.rw
p mapr.cluster.root writeable 2049.50.131362 -> 2049.16.2 physical19.qa.lab:5660 physical20.qa.lab:5660 physical23.qa.lab:5660
vrwxr-xr-x Z E U U 3 root root 1 2018-08-09 19:26 268435456 /ATS-VOL1533867958
p ATS-VOL1533867958 default 2049.138.131538 -> 2322.16.2 physical20.qa.lab:5660 physical19.qa.lab:5660 physical22.qa.lab:5660
vrwxr-xr-x Z E U U 3 root root 1 2018-08-09 21:31 268435456 /ATS-VOL1533875473
p ATS-VOL1533875473 default 2049.190.131642 -> 2685.16.2 physical21.qa.lab:5660 physical27.qa.lab:5660 physical23.qa.lab:5660
drwxr-xr-x Z E U U - root root 1 2018-08-09 18:15 268435456 /ATS-VOLUME-1533863729955
p 2049.102.131466 physical19.qa.lab:5660 physical20.qa.lab:5660 physical23.qa.lab:5660
...
In the preceding example, the letter Z in the compression column indicates LZ4 compression on the directory; a letter U in that column would indicate that the directory is uncompressed.
In the following example, the listed item is compressed (Z) and encrypted (E); the two U flags indicate that auditing and disk flush are disabled.
[root@node1-302 ~]# hadoop mfs -ls /hbase
Found 10 items
drwxr-xr-x Z E U U - root root 1 2018-08-09 19:26 268435456 /ATS-VOL1533867958/data1533867963
p 2322.32.131374 physical20.qa.lab:5660 physical19.qa.lab:5660 physical22.qa.lab:5660
...
The following example demonstrates the usage of the -lsf option:
[root@vm5 logs]# hadoop mfs -lsf /tmp/
2050.33.262504 /tmp/hosts1
2050.32.262502 /tmp/hosts2
2050.35.393704 /tmp/hosts3
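Because -lsf prints only a fid and a name per line, its output is easy to post-process. The following sketch extracts just the names; the function name lsf_names is hypothetical, and the sketch assumes the fid and the name are separated by whitespace, as in the sample output above.

```shell
# Hypothetical helper: read -lsf output on stdin and print only the names.
# The first whitespace-delimited field (the fid) is removed; the rest of
# the line is kept, so names containing spaces survive.
lsf_names() {
    sed 's/^[^[:space:]]*[[:space:]]*//'
}

# Sample output copied from the example above.
sample='2050.33.262504 /tmp/hosts1
2050.32.262502 /tmp/hosts2
2050.35.393704 /tmp/hosts3'

printf '%s\n' "$sample" | lsf_names

# Possible use against a live cluster (not run here):
# hadoop mfs -lsf /tmp/ | lsf_names
```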
Set ACEs
Example 1: The following command shows how to set separate read, write, and execute permissions (using ACE) on a file:
hadoop mfs -setace -readfile p -writefile 'g:group1&!u:user1' -executefile p /file
- Read access is set for owner, owning group, and others.
- Write access is set for none.
- Execute access is set for owner, owning group, and others.
hadoop mfs -setace -aces "rf:u:root,wf:group1&!user1,ef:p,rd:u:m7user1" -setinherit true /dir
- Read access is set to owner/user.
- Write access is set to none.
- Execute access is set for others.
hadoop mfs -setace -R -aces "rf:p,wf:g:group1&!u:user1,ef:p" -preservemodebits true /dir
When the command shown above runs, the POSIX mode bits are not modified to match the ACE setting.
The following command resets the ACE for writefile, which was set in the first example above, without removing all other access types associated with the file. The empty string used in the following example denies write access to all users, roles, and groups.
hadoop mfs -setace -writefile "" -preservemodebits false /file
When the command shown above runs, the POSIX mode bit for writing to the file is set to 0.
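The value passed to -aces in the examples above is a comma-separated list of <type>:<ace> pairs. When that string is assembled in a script, a small helper keeps the quoting in one place. This is only a sketch; the function name build_aces is hypothetical, and it does not validate the access-type abbreviations or the ACE syntax.

```shell
# Hypothetical helper: join "type ace" argument pairs into the
# comma-separated form expected by -aces, for example rf:p,ef:p.
build_aces() {
    out=
    while [ "$#" -ge 2 ]; do
        # Add a comma only when something has already been collected.
        out="${out:+$out,}$1:$2"
        shift 2
    done
    printf '%s\n' "$out"
}

build_aces rf p wf 'g:group1&!u:user1' ef p
# prints: rf:p,wf:g:group1&!u:user1,ef:p

# Possible use (not run here):
# hadoop mfs -setace -R -aces "$(build_aces rf p ef p)" /dir
```

Single quotes around each ACE keep the shell from interpreting characters such as & and !.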
Get ACEs
hadoop mfs -getace /m7user1/file1.txt
Output
Path: /m7user1/file1.txt
readfile: !u:m7user1
writefile: !u:m7user1
executefile: !u:m7user1
mode: ---------
Delete ACEs
hadoop mfs -delace /file
hadoop mfs -delace /dir
hadoop mfs -delace -R /dir
Create a Hard Link to File
# hadoop mfs -lnh /madvol1/file1 /madvol1/file2
Creating Hardlink: /madvol1/file2 -> /madvol1/file1
Retrieve the Number of Hard Links
# hadoop mfs -stat /vol1/file1
Path: /vol1/file1
fid: 23185.32.131232
uid: root
gid: root
atime: 2016-06-29 18:49:03
mtime: 2016-07-01 18:01:54
nlink: 2
type: FTRegular
subtype: FSTInval
size: 1024000000
blocksize: 268435456
mode: 644
networkencryption: false
compression: off
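To use the hard link count in a script, the nlink line can be picked out of the -stat output. The sketch below assumes the "name: value" layout shown above; the function name nlink_of is hypothetical.

```shell
# Hypothetical helper: read -stat output on stdin and print the value
# of the nlink field.
nlink_of() {
    awk '$1 == "nlink:" { print $2; exit }'
}

# Sample output copied from the example above.
sample='Path: /vol1/file1
fid: 23185.32.131232
uid: root
gid: root
atime: 2016-06-29 18:49:03
mtime: 2016-07-01 18:01:54
nlink: 2
type: FTRegular'

printf '%s\n' "$sample" | nlink_of
# prints: 2

# Possible use (not run here):
# hadoop mfs -stat /vol1/file1 | nlink_of
```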
# hadoop mfs -tierstatus /vol1/file2
File has local data.
# hadoop mfs -tierstatus /vol1/test1 -v
FID Has Local Data
2154.109.1049824 Yes
2172.143.524906 Yes
2172.153.524926 Yes
2172.166.524952 Yes
2172.167.524954 Yes
File has local data.
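The verbose output above lists one row per FID, so a script can summarize it by counting rows. The following sketch counts the rows marked Yes; the function name count_local_fids is hypothetical, and the sketch assumes each data row starts with a fid made of three dot-separated numbers followed by the Yes marker, as in the sample.

```shell
# Hypothetical helper: read verbose -tierstatus output on stdin and
# print how many FID rows are marked "Yes" (has local data).
count_local_fids() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+$/ && $2 == "Yes" { n++ }
         END { print n + 0 }'
}

# Sample output copied from the example above.
sample='FID Has Local Data
2154.109.1049824 Yes
2172.143.524906 Yes
2172.153.524926 Yes
2172.166.524952 Yes
2172.167.524954 Yes
File has local data.'

printf '%s\n' "$sample" | count_local_fids
# prints: 5
```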
Tag a file with a security policy:
The following command tags the file /user/root/javax.servlet-3.0.jar with three security policies, namely pci, hippa, and new:
hadoop mfs -setsecuritypolicytag pci,hippa,new /user/root/javax.servlet-3.0.jar
Retrieve security policy tags from a file:
hadoop mfs -getsecuritypolicytag /user/root/javax.servlet-3.0.jar
[hippa, new, pci]
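The bracketed list returned by -getsecuritypolicytag can be split into one tag per line for further processing. This sketch assumes the output has the "[tag, tag, ...]" form shown above; the function name tags_to_lines is hypothetical.

```shell
# Hypothetical helper: read a "[a, b, c]" style tag list on stdin and
# print one tag per line.
tags_to_lines() {
    # Remove the surrounding brackets, split on commas, trim spaces.
    sed 's/^ *\[//; s/] *$//' | tr ',' '\n' | sed 's/^ *//; s/ *$//'
}

printf '%s\n' '[hippa, new, pci]' | tags_to_lines

# Possible use (not run here):
# hadoop mfs -getsecuritypolicytag /user/root/javax.servlet-3.0.jar | tags_to_lines
```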
Remove all security policy tags from a file:
hadoop mfs -removeallsecuritypolicytag /user/root/javax.servlet-3.0.jar