Planning and Initial Deployment
There are several considerations to take into account before migrating from Apache Hadoop to Data Fabric Hadoop.
The first phase of the migration is planning. In this phase you identify the requirements and goals of the migration, anticipate potential issues, and define a migration strategy.
The requirements and goals of the migration depend on a number of factors:
- Data migration: can you move your datasets individually, or must the data be moved all at once?
- Downtime: can you tolerate downtime, or is it important to complete the migration with no interruption in service?
- Customization: what custom patches or applications are running on the cluster?
- Storage: is there enough space to store the data during the migration? (A capacity check is sketched after this list.)
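For the storage question, one quick way to gauge headroom is to compare the size of a source dataset against the remaining capacity of the destination file system. The following sketch uses the standard Hadoop FileSystem API; the command-line arguments, the maprfs:/// destination URI, and the 3x replication factor are illustrative assumptions, not prescribed values.

```java
// Hypothetical pre-migration check: compare the size of a source dataset
// against the remaining capacity of the destination file system.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class StorageHeadroomCheck {
    public static void main(String[] args) throws Exception {
        // Example invocation (arguments are illustrative):
        //   hadoop jar headroom.jar StorageHeadroomCheck /data/warehouse maprfs:///
        Path sourceData = new Path(args[0]);
        Configuration conf = new Configuration();

        // Total size of the dataset to be migrated.
        FileSystem srcFs = sourceData.getFileSystem(conf);
        long datasetBytes = srcFs.getContentSummary(sourceData).getLength();

        // Remaining capacity on the destination file system.
        FileSystem dstFs = FileSystem.get(java.net.URI.create(args[1]), conf);
        FsStatus status = dstFs.getStatus();
        long remainingBytes = status.getRemaining();

        System.out.printf("Dataset size:     %,d bytes%n", datasetBytes);
        System.out.printf("Destination free: %,d bytes%n", remainingBytes);

        // Leave room for replication and temporary copy overhead; the 3x
        // factor is an assumption and should match your replication settings.
        if (remainingBytes < datasetBytes * 3) {
            System.err.println("WARNING: destination may not have enough space for this dataset.");
        }
    }
}
```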
The Data Fabric Hadoop distribution is 100% plug-and-play compatible with Apache Hadoop, so you do not need to make changes to your applications to run them on a Data Fabric cluster. Data Fabric Hadoop automatically configures compression and memory settings, task heap sizes, and local volumes for shuffle data.
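As an illustration of that compatibility, the driver below is an ordinary Apache Hadoop MapReduce word-count job with no Data Fabric-specific code. In principle the same jar can be submitted to either cluster, with the target determined by the client-side configuration rather than by the application itself; the class and path names here are illustrative.

```java
// Unmodified Apache Hadoop MapReduce job: nothing in this code refers to the
// underlying distribution, so it can run against whichever cluster the client
// configuration points at.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Emits (word, 1) for every token in the input.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```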
Initial Deployment
The initial Data Fabric deployment phase consists of installing, configuring, and testing the Data Fabric cluster and any ecosystem components (such as Hive or Pig) on an initial set of nodes. Once you have the Data Fabric cluster deployed, you will be able to begin migrating data and applications.
To deploy the Data Fabric cluster on the selected nodes, see Installing Core and Ecosystem Components.
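Once the initial nodes are installed and configured, a simple way to exercise the new cluster before migrating any data is a write/read round trip through the file system client. The sketch below uses the standard Hadoop FileSystem API and assumes a configured client on the node; the test path is a hypothetical placeholder.

```java
// Hypothetical post-install smoke test: write a small file to the new cluster's
// file system, read it back, and clean up.
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ClusterSmokeTest {
    public static void main(String[] args) throws Exception {
        // Picks up the default file system from the client's core-site.xml.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path testFile = new Path("/tmp/migration-smoke-test.txt");

        // Write a marker file, overwriting any previous run.
        try (FSDataOutputStream out = fs.create(testFile, true)) {
            out.write("data fabric smoke test".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back and print the contents.
        try (FSDataInputStream in = fs.open(testFile)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
            System.out.println();
        }

        fs.delete(testFile, false);
        System.out.println("Round trip succeeded against " + fs.getUri());
    }
}
```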