Application Migration
Before you migrate your applications to the Data Fabric Hadoop distribution, consider testing your applications using a small subset of data.
About this task
In this phase, you will migrate your applications to the Data Fabric cluster test environment. The goal of this phase is to get your applications running smoothly on the Data Fabric cluster using a subset of data. Once you have confirmed that all applications and components are running as expected, you can begin migrating your data.
Migrating your applications from HDFS to Data Fabric is relatively easy. Data Fabric Hadoop is 100% plug-and-play compatible with Apache Hadoop, so you do not need to make changes to your applications to run them on a Data Fabric cluster.
- Data Fabric Libraries: Ensure that your applications can find the libraries and configuration files they expect. Make sure the Java classpath includes the path to maprfs.jar and that java.library.path includes the directory containing libMapRClient.so. A sketch of these settings follows this list.
- Data Fabric Storage: Every application must point to the file system (maprfs:///) rather than HDFS (hdfs://). If your application uses fs.default.name, it works automatically. If you have hardcoded HDFS links in your applications, you must redirect those links so that they point to the file system. Setting a default path of maprfs:/// tells your applications to use the cluster specified in the first line of mapr-clusters.conf. You can also point to a particular cluster with maprfs:///mapr/<cluster name>/.
- Permissions: The distcp command does not copy permissions; permissions defined in HDFS do not transfer automatically to the file system. Data Fabric uses a combination of access control lists (ACLs), which set cluster- and volume-level permissions, and file permissions, which control directory and file access. You must define these permissions in Data Fabric when you migrate your customized components, applications, and data. For more information, see Managing Permissions. A sketch of re-applying file permissions also follows this list.
- Memory: Remove explicit memory settings defined in your applications. If memory is set explicitly in the application, jobs may fail after migration to Data Fabric.
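As a minimal sketch of the library and classpath settings for a standalone Java client, you might launch your application as shown below. The /opt/mapr paths, the myapp.jar file, the com.example.MyApp class, and the input path are assumptions for illustration only; verify the actual library locations on your nodes.
$ export CLASSPATH=/opt/mapr/lib/maprfs.jar:myapp.jar:$CLASSPATH
$ java -Djava.library.path=/opt/mapr/lib com.example.MyApp maprfs:///user/alice/input
For applications that rely on fs.default.name, the default file system can be set in core-site.xml so that relative paths resolve against the Data Fabric cluster:
<property>
  <name>fs.default.name</name>
  <value>maprfs:///</value>
</property>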
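Because distcp does not carry permissions over, a typical follow-up is to re-apply ownership and mode bits on the Data Fabric side after the copy, and to define any cluster- or volume-level ACLs separately. The user, group, mode, and target path below are placeholders for your own values:
$ hadoop fs -chown -R alice:analytics maprfs:///bar
$ hadoop fs -chmod -R 750 maprfs:///bar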
Generally, the best approach to migrating your applications to Data Fabric is to import a small subset of data into a test environment, then test and tune your application against that data before you import your production data.
The following procedure offers a simple roadmap for migrating and running your applications in a Data Fabric cluster test environment.
Procedure
- Copy a small amount of data to the Data Fabric cluster. Use the hadoop distcp command with an hftp:// source to copy a small number of files:
$ hadoop distcp hftp://namenode1:50070/foo maprfs:///bar
You must specify the namenode hostname or IP address, port number, and source directory on the HDFS cluster. For more information, see Copying Data from Apache Hadoop.
- Run the application. (A sample invocation is sketched after this procedure.)
- Add more data and test again.
- When the application is running to your satisfaction, use the same process to test and tune another application.
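As a rough sketch of the "run the application" step, a MapReduce job can usually be submitted unchanged once the test data is in place; only the input and output URIs change from hdfs:// to maprfs:/// (or remain relative if fs.default.name already points at the Data Fabric cluster). The jar name, driver class, and paths below are placeholders:
$ hadoop jar myapp.jar com.example.MyJob maprfs:///bar/input maprfs:///bar/output
Compare the output with the results produced on your HDFS cluster before moving on to larger data sets.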