Using ADLS for Data Input or Output
You can use Azure Data Lake Store (ADLS) as a source or destination for your application data.
Prerequisites
For general information about the features of ADLS, refer to the Azure Data Lake Store documentation.
For information about configuring ADLS as storage for a Hadoop cluster, refer to the official Apache documentation.
The ADLS access path syntax is:

```
adl://<Account Name>.azuredatalakestore.net/
```
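Before `adl://` paths resolve, the cluster must be able to authenticate to ADLS. A minimal `core-site.xml` sketch using service-to-service OAuth2 is shown below; the tenant ID, application ID, and client secret are placeholders you obtain from Azure Active Directory, and the exact setup steps are covered in the Apache documentation referenced above.

```xml
<!-- core-site.xml: service-to-service OAuth2 authentication for ADLS -->
<property>
  <name>fs.adl.oauth2.access.token.provider.type</name>
  <value>ClientCredential</value>
</property>
<property>
  <name>fs.adl.oauth2.refresh.url</name>
  <value>https://login.microsoftonline.com/&lt;tenant-id&gt;/oauth2/token</value>
</property>
<property>
  <name>fs.adl.oauth2.client.id</name>
  <value>&lt;application-id&gt;</value>
</property>
<property>
  <name>fs.adl.oauth2.credential</name>
  <value>&lt;client-secret&gt;</value>
</property>
```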
You can use ADLS the same way as you use the file system, substituting the `adl` scheme for `maprfs`, `hdfs`, `webhdfs`, and so on.

Procedure
- Create a directory and list it:

  ```
  [mapr@node4 ~]$ hadoop fs -mkdir adl://<username>.azuredatalakestore.net/testdir
  [mapr@node4 ~]$ hadoop fs -ls adl://<username>.azuredatalakestore.net/
  Found 1 items
  drwxr-xr-x   - 9d3f4f74-8337-4dae-ad77-f63459438553 331c9f66-6875-4e13-a74f-458dd23e4bde          0 2018-04-16 09:09 adl://<username>.azuredatalakestore.net/testdir
  ```
- Put data into ADLS from your local file system:

  ```
  [mapr@node4 ~]$ hadoop fs -put testfile adl://<username>.azuredatalakestore.net/testdir
  [mapr@node4 ~]$ hadoop fs -ls adl://<username>.azuredatalakestore.net/testdir
  Found 1 items
  -rw-r--r--   1 9d3f4f74-8337-4dae-ad77-f63459438553 331c9f66-6875-4e13-a74f-458dd23e4bde          0 2018-04-16 09:10 adl://<username>.azuredatalakestore.net/testdir/testfile
  ```
- Delete data from ADLS:

  ```
  [mapr@node4 ~]$ hadoop fs -rm -r adl://<username>.azuredatalakestore.net/testdir
  [mapr@node4 ~]$ hadoop fs -ls adl://<username>.azuredatalakestore.net/
  ```
- Run YARN jobs with your input and output stored in ADLS:

  ```
  yarn jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0-mapr-1710-SNAPSHOT.jar wordcount \
      adl://<username>.azuredatalakestore.net/testdir/testfile \
      adl://<username>.azuredatalakestore.net/wordcountout
  ```
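Because every step above uses the same `adl://<account>.azuredatalakestore.net<path>` pattern, it can help to build the URI once and reuse it. The sketch below defines a hypothetical `adl_uri` helper (the function name and the `myaccount` account name are illustrative, not from the original examples):

```shell
#!/bin/sh
# Hypothetical helper: build an adl:// URI from an ADLS account name and a
# path, mirroring the access path syntax used in the steps above.
adl_uri() {
    account="$1"
    path="$2"
    printf 'adl://%s.azuredatalakestore.net%s\n' "$account" "$path"
}

adl_uri myaccount /wordcountout
# → adl://myaccount.azuredatalakestore.net/wordcountout
```

The resulting URI can then be passed to any `hadoop fs` subcommand, for example to inspect the wordcount job's output: `hadoop fs -cat "$(adl_uri myaccount /wordcountout/part-r-00000)"`.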