Using the hdfs:// Protocol
This section describes how to copy data from an HDFS cluster to a MapR cluster using the hdfs:// protocol.
About this task
Before you can copy data from an HDFS cluster to a MapR cluster using the hdfs:// protocol, you must configure the MapR cluster to access the HDFS cluster. To do this, complete the steps listed in Configuring a MapR Cluster to Access an HDFS Cluster for the security scenario that best describes your HDFS and MapR clusters, and then complete the steps listed under Verifying Access to an HDFS Cluster.
You also need the following information:
<NameNode>: the IP address or hostname of the NameNode in the HDFS cluster
<NameNode Port>: the port for connecting to the NameNode in the HDFS cluster
<HDFS path>: the path to the HDFS directory from which you plan to copy data
<MapR-FS path>: the path in the MapR cluster to which you plan to copy HDFS data
<file>: a file in the HDFS path
To copy data from HDFS to the MapR file system using the hdfs:// protocol, complete the following steps:
Procedure
- Run the following hadoop command to determine if the MapR cluster can read the contents of a file in a specified directory on the HDFS cluster:
  hadoop fs -cat hdfs://<NameNode>:<NameNode Port>/<HDFS path>/<file>
  For example:
  hadoop fs -cat hdfs://nn1:8020/user/sara/contents.xml
- If the MapR cluster can read the contents of the file, run the distcp command to copy the data from the HDFS cluster to the MapR cluster:
  hadoop distcp hdfs://<NameNode>:<NameNode Port>/<HDFS path> maprfs://<MapR-FS path>
  For example:
  hadoop distcp hdfs://nn1:8020/user/sara maprfs:///user/sara
  Note the required triple slashes in maprfs:///
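
The two steps in this procedure could be combined in a small shell script so that the copy runs only when the read check succeeds. The following is a minimal sketch, not part of the MapR tooling: the function names hdfs_uri, maprfs_uri, and copy_if_readable are hypothetical, and the sketch assumes that the hadoop command is on the PATH of a node in a MapR cluster that is already configured to access the HDFS cluster.

```shell
#!/bin/sh
# Illustrative helpers only; the function names are hypothetical.
# The hadoop fs -cat and hadoop distcp commands are the ones from the procedure.

# Build hdfs://<NameNode>:<NameNode Port>/<HDFS path>
# $1 = NameNode host, $2 = NameNode port, $3 = HDFS path
hdfs_uri() {
    # Drop leading slashes so the path is joined with exactly one slash.
    path=$(printf '%s' "$3" | sed 's|^/*||')
    printf 'hdfs://%s:%s/%s\n' "$1" "$2" "$path"
}

# Build maprfs:///<MapR-FS path> (note the triple slash)
# $1 = path in the MapR cluster
maprfs_uri() {
    path=$(printf '%s' "$1" | sed 's|^/*||')
    printf 'maprfs:///%s\n' "$path"
}

# Read a test file from HDFS, then copy the directory only if the read works.
# $1 = NameNode, $2 = port, $3 = HDFS path, $4 = MapR-FS path, $5 = test file
copy_if_readable() {
    src=$(hdfs_uri "$1" "$2" "$3")
    dst=$(maprfs_uri "$4")
    if hadoop fs -cat "$src/$5" > /dev/null; then
        hadoop distcp "$src" "$dst"
    else
        echo "Cannot read $src/$5; verify access to the HDFS cluster first." >&2
        return 1
    fi
}
```

With the values from the examples above, the call would be: copy_if_readable nn1 8020 /user/sara /user/sara contents.xml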