Application Migration

Before you migrate your applications to the Data Fabric Hadoop distribution, consider testing your applications using a small subset of data.

About this task

In this phase, you migrate your applications to the Data Fabric cluster test environment. The goal of this phase is to get your applications running smoothly on the Data Fabric cluster using a subset of data. Once you have confirmed that all applications and components are running as expected, you can begin migrating your data.

Migrating your applications from HDFS to Data Fabric is relatively straightforward. Data Fabric Hadoop is plug-and-play compatible with Apache Hadoop, so in most cases you do not need to change your applications to run them on a Data Fabric cluster.

Application Migration Guidelines: Keep the following considerations in mind when you migrate your applications:
  • Data Fabric Libraries: Ensure that your applications can find the libraries and configuration files they expect. Make sure the Java classpath includes the path to maprfs.jar and that java.library.path includes the directory containing libMapRClient.so (see the launch-script sketch after this list).
  • Data Fabric Storage: Every application must point to the Data Fabric file system (maprfs:///) rather than HDFS (hdfs://). If your application reads the default file system from fs.default.name, it will work without changes. If you have hardcoded HDFS URIs in your applications, you must redirect them so they point to the Data Fabric file system. Setting a default path of maprfs:/// tells your applications to use the cluster specified in the first line of mapr-clusters.conf. You can also address a specific cluster with maprfs:///mapr/<cluster name>/ (see the example after this list).
  • Permissions: The distcp command does not copy permissions; permissions defined in HDFS do not transfer automatically to the Data Fabric file system. Data Fabric uses access control lists (ACLs) for cluster- and volume-level permissions, and standard file permissions for directory and file access. You must define these permissions in Data Fabric when you migrate your customized components, applications, and data (see the sketch after this list). For more information, see Managing Permissions.
  • Memory: Remove explicit memory settings from your applications; jobs with hardcoded memory settings may fail after migration to Data Fabric.
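
For example, a launch script for a Java application might look like the following. This is a minimal sketch: the /opt/mapr paths assume a default Data Fabric client installation, and myapp.jar and com.example.MyApp are placeholders for your own application.

    # Put the Data Fabric file system JAR on the Java classpath.
    # The exact path to maprfs.jar depends on your client installation.
    export CLASSPATH=$CLASSPATH:/opt/mapr/lib/maprfs.jar

    # Point java.library.path at the directory that contains libMapRClient.so,
    # then launch the application (JAR and class names are placeholders).
    java -cp "$CLASSPATH:myapp.jar" \
         -Djava.library.path=/opt/mapr/lib \
         com.example.MyApp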
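
As a quick check that maprfs:// URIs resolve, assuming a configured Data Fabric client, you can list a directory through both the default URI and an explicit cluster URI (the cluster name below is a placeholder):

    $ hadoop fs -ls maprfs:///user
    $ hadoop fs -ls maprfs:///mapr/my.cluster.com/user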
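
As a sketch of redefining permissions after a copy, assuming a volume named app-data mounted at /app/data and a user named analyst (all placeholders), you might combine a volume-level ACL with standard file permissions:

    # Grant the user full control (fc) on the volume; names are placeholders.
    $ maprcli acl edit -type volume -name app-data -user analyst:fc
    # Reapply the ownership and modes that distcp did not carry over.
    $ hadoop fs -chown -R analyst:analysts /app/data
    $ hadoop fs -chmod -R 750 /app/data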

Generally, the best approach to migrating your applications to Data Fabric is to import a small subset of data, then test and tune your applications against that data in a test environment before you import your production data.

The following procedure offers a simple roadmap for migrating and running your applications in a Data Fabric cluster test environment.

Procedure

  1. Copy a small amount of data over to the Data Fabric cluster. Use the hadoop distcp command with an hftp:// source to copy a small number of files:
    $ hadoop distcp hftp://namenode1:50070/foo maprfs:///bar

    You must specify the namenode hostname or IP address, the namenode HTTP port (50070 by default), and the source directory on the HDFS cluster. For more information, see Copying Data from Apache Hadoop.
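
    Once the copy completes, you can verify that the files arrived, for example:

    $ hadoop fs -ls maprfs:///bar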

  2. Run the application.
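    For example, a MapReduce application packaged as a JAR might be launched as follows; the JAR name, main class, and paths are placeholders:

    $ hadoop jar myapp.jar com.example.MyApp maprfs:///bar maprfs:///bar-output
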
  3. Add more data and test again.
  4. When the application is running to your satisfaction, use the same process to test and tune another application.