Using the File System JAR to Connect to the Cluster
The file system JAR file includes the Data Fabric client libraries required to connect to the cluster. Although this approach is strongly discouraged, application developers can bundle the file system JAR file in Data Fabric file system, HPE Ezmeral Data Fabric Database, and HPE Ezmeral Data Fabric Streams applications instead of installing the Data Fabric client on the edge node (the node that runs the application). Do not bundle the file system JAR file unless the application meets all of the requirements listed below.
Requirements
Bundle the file system JAR file (maprfs-<version>-mapr.jar) only with applications that meet all of the following requirements:
- The application communicates directly with the file system, HPE Ezmeral Data Fabric Database, or HPE Ezmeral Data Fabric Streams.
- The application does not run as a MapReduce or YARN job/application on the cluster.
- The application does not include file system JARs on the local machine in its classpath.
- The application accesses a cluster that is not secure.
Configuring the Cluster Connection
When you include the file system JAR in an application instead of installing the Data Fabric client on the edge node, you must create and configure a mapr-clusters.conf file on the node that runs the application.
- Set a MAPR_HOME environment variable to a location such as /opt/mapr.
- Create the mapr-clusters.conf file in the $MAPR_HOME/conf directory.
- Configure the mapr-clusters.conf file with the cluster name and the list of CLDB nodes. For example, the mapr-clusters.conf file on an edge node would contain the following content if it were connecting to a cluster named my.cluster with CLDB nodes on centos765, centos234, and centos123:

    my.cluster secure=false centos765 centos234 centos123

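The configuration steps above can be sanity-checked from the application side before it tries to connect. The following is a minimal sketch, not part of any Data Fabric library: the class `ClusterConfCheck` and its methods are hypothetical names, and the parser assumes that each entry is a single whitespace-separated line in which the first token is the cluster name, tokens of the form `key=value` are options, and all remaining tokens are CLDB nodes, as in the example entry above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not a Data Fabric API.
final class ClusterConfCheck {

    /** Parsed form of one mapr-clusters.conf entry. */
    static final class Entry {
        final String clusterName;
        final Map<String, String> options = new LinkedHashMap<>();
        final List<String> cldbNodes = new ArrayList<>();

        Entry(String clusterName) {
            this.clusterName = clusterName;
        }
    }

    /** Resolves <maprHome>/conf/mapr-clusters.conf. */
    static Path confPath(String maprHome) {
        return Paths.get(maprHome, "conf", "mapr-clusters.conf");
    }

    /**
     * Parses one entry. Assumed layout: cluster name first, then any
     * key=value options, then one or more CLDB nodes.
     */
    static Entry parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("Empty entry");
        }
        Entry entry = new Entry(tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            int eq = tokens[i].indexOf('=');
            if (eq > 0) {
                entry.options.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
            } else {
                entry.cldbNodes.add(tokens[i]);
            }
        }
        if (entry.cldbNodes.isEmpty()) {
            throw new IllegalArgumentException("No CLDB nodes listed: " + line);
        }
        return entry;
    }

    public static void main(String[] args) throws IOException {
        String maprHome = System.getenv("MAPR_HOME");
        if (maprHome == null || maprHome.isEmpty()) {
            System.err.println("MAPR_HOME is not set");
            return;
        }
        Path conf = confPath(maprHome);
        if (!Files.isRegularFile(conf)) {
            System.err.println("Missing file: " + conf);
            return;
        }
        for (String line : Files.readAllLines(conf)) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Entry e = parseLine(line);
            System.out.println(e.clusterName + " -> CLDB nodes " + e.cldbNodes);
            if ("true".equals(e.options.get("secure"))) {
                // Bundling the JAR is only described for clusters that are not secure.
                System.err.println("Warning: " + e.clusterName + " is marked secure=true");
            }
        }
    }
}
```

Running the check at application startup gives a clear message when MAPR_HOME or the configuration file is missing, rather than a connection failure later on. It does not verify that the listed CLDB nodes are reachable.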
For more information about how to configure mapr-clusters.conf, see mapr-clusters.conf.
For more information about how the Data Fabric client connects to the cluster, see How Data Fabric Clients Connect to the Cluster.
Using Maven to Include the File System JAR as a Dependency
If you use Maven to bundle the file system JAR file with an application and you plan to run the application on a Data Fabric cluster where a patch has been applied, ensure that you specify both a system scope and a local system path to the file.
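For example, the pom.xml file may include the following:

    ...
    <dependency>
      <groupId>com.mapr.hadoop</groupId>
      <artifactId>maprfs</artifactId>
      <version>${mapr.core.version}</version>
      <!-- system scope: use the JAR installed on this node, not one from a repository -->
      <scope>system</scope>
      <systemPath>/opt/mapr/lib/maprfs-5.2.0-mapr.jar</systemPath>
    </dependency>
    ...
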
By default, the Data Fabric Maven repository includes JAR files from https://repository.mapr.com/maven/. This default Maven repository includes JAR files associated with the GA packages for each Data Fabric release. Therefore, when a patch has been applied to the cluster, failure to specify a system scope may result in errors due to a binary mismatch between the file system JAR files used by the application and the cluster.
Known Issues
- The version of the file system JAR included in the application differs from the version that is available on the cluster.
  - This may occur when a patch was applied to some, but not all, of the nodes in the cluster. It can also occur when Maven bundles the GA version of the JAR file but the cluster expects a newer, patched version.
- Two versions of the JAR are available on the node.
  - For YARN applications, the NodeManager nodes that run the tasks or containers store local copies of the dependencies included with the application. In this scenario, because both the cluster's file system JAR and the version included in the application are available on the node, it is not possible to predict which JAR will be used when the application is processed.
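One way to spot the second issue during troubleshooting is to list the file system JARs that are visible on the application's classpath. The following is a minimal sketch under stated assumptions: `MaprfsJarScan` is a hypothetical class, it only inspects the `java.class.path` system property (it cannot see JARs loaded through custom class loaders), and it assumes that the `<version>` part of maprfs-<version>-mapr.jar starts with a digit.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not a Data Fabric API.
final class MaprfsJarScan {

    // File name pattern maprfs-<version>-mapr.jar; the leading digit is an assumption.
    private static final Pattern JAR_NAME = Pattern.compile("maprfs-(\\d.*)-mapr\\.jar");

    /** Returns the classpath entries whose file name matches maprfs-<version>-mapr.jar. */
    static List<String> findMaprfsJars(String classpath, String separator) {
        List<String> hits = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (JAR_NAME.matcher(new File(entry).getName()).matches()) {
                hits.add(entry);
            }
        }
        return hits;
    }

    /** Extracts the version part of the file name, or returns null if the name does not match. */
    static String versionOf(String entry) {
        Matcher m = JAR_NAME.matcher(new File(entry).getName());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        List<String> jars = findMaprfsJars(System.getProperty("java.class.path"), File.pathSeparator);
        for (String jar : jars) {
            System.out.println(versionOf(jar) + "  " + jar);
        }
        if (jars.size() > 1) {
            System.err.println("Warning: more than one file system JAR is on the classpath");
        }
    }
}
```

If the scan reports more than one entry, or a version that differs from the JAR installed on the cluster nodes, the application is exposed to the mismatches described above.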