Using the webhdfs:// Protocol
This section describes how to copy data from an HDFS cluster to a Data Fabric cluster using the webhdfs:// protocol.
About this task
Before you can copy data from an HDFS cluster to a Data Fabric cluster using the webhdfs://
protocol, you must configure the Data Fabric cluster
to access the HDFS cluster. To do this, complete the steps listed in Configuring a Data Fabric Cluster to Access an HDFS Cluster for the
security scenario that best describes your HDFS and Data Fabric clusters and then complete the steps listed
under Verifying Access to an HDFS Cluster.
To copy data from HDFS to the file system using the webhdfs:// protocol, complete the following steps:
Procedure
-
The HDFS cluster must have WebHDFS enabled. Verify that the following parameter exists in the hdfs-site.xml file and that its value is set to true:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
You also need the following information:
<NameNode>: the IP address or hostname of the NameNode in the HDFS cluster
<NameNode HTTP Port>: the HTTP port on the NameNode in the HDFS cluster
<HDFS path>: the path to the HDFS directory from which you plan to copy data
<MapR-FS path>: the path in the Data Fabric cluster to which you plan to copy HDFS data
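If you want to confirm that the NameNode's WebHDFS interface is reachable before running the copy, one option is to query the WebHDFS REST endpoint from a Data Fabric node. The following is a minimal sketch, assuming curl is available, that the cluster uses plain HTTP (not HTTPS/SWebHDFS), and that you substitute your own <NameNode>, <NameNode HTTP Port>, and <HDFS path> values; a successful call returns HTTP 200 and a JSON listing of the directory:
# Optional reachability check; replace the placeholders with your own values
curl -i "http://<NameNode>:<NameNode HTTP Port>/webhdfs/v1/<HDFS path>?op=LISTSTATUS"
If the request fails or returns an error, recheck the dfs.webhdfs.enabled setting and the HTTP port before running the distcp command in the next step.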
-
Run the following command from a node in the Data Fabric cluster to copy data from HDFS to the file system using webhdfs://:
hadoop distcp webhdfs://<NameNode>:<NameNode HTTP Port>/<HDFS path> maprfs:///<MapR-FS path>
For example:
hadoop distcp webhdfs://nn2:50070/user/sara maprfs:///user/sara
Note the required triple slashes in maprfs:///.
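After the distcp job finishes, you can optionally confirm that the data arrived on the Data Fabric cluster. The following is a minimal sketch that reuses the example path from above; substitute the <MapR-FS path> you actually copied to:
# List the copied data on the Data Fabric cluster (example path assumed)
hadoop fs -ls maprfs:///user/sara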