Accessing HPE Ezmeral Data Fabric File Store in Java Applications
As a high-performance file system, portions of the HPE Ezmeral Data Fabric File Store file client are based on a native maprfs library. When developing an application, specifying dependence on the JAR file that includes the maprfs library enables you to build applications without having to manage platform-specific dependencies.
The following sections describe how to access the HPE Ezmeral Data Fabric File Store in a Java program.
Writing a Java Application
In your Java application, you use a Configuration object to interface with the file system. When you instantiate a Configuration object, it is created with values from the Hadoop configuration files.
If the program is built with JAR files from the Data Fabric installation, the Hadoop 1 configuration files are in the $MAPR_HOME/hadoop/hadoop-<version>/conf directory, and the Hadoop 2 configuration files are in the $HADOOP_HOME/etc/hadoop directory. This Hadoop configuration directory is in the hadoop classpath that you include when you compile and run the Java program.
- Sample Code
- The following sample code shows how to interface with the Data Fabric file system using Java. The example creates a directory, writes a file, and then reads the contents of the file.
/* Copyright (c) 2009 & onwards. MapR Tech, Inc., All rights reserved */

//package com.mapr.fs;

import java.net.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;

/**
 * Assumes mapr installed in /opt/mapr
 *
 * Compilation:
 *   javac -cp $(hadoop classpath) MapRTest.java
 *
 * Run:
 *   java -cp .:$(hadoop classpath) MapRTest /test
 */

public class MapRTest {
    public static void main(String args[]) throws Exception {
        byte buf[] = new byte[65 * 1024];
        int ac = 0;

        if (args.length != 1) {
            System.out.println("usage: MapRTest pathname");
            return;
        }

        // maprfs:///              -> uses the first entry in /opt/mapr/conf/mapr-clusters.conf
        // maprfs:///mapr/my.cluster.com/
        // /mapr/my.cluster.com/
        // String uri = "maprfs:///";

        String dirname = args[ac++];

        Configuration conf = new Configuration();
        // FileSystem fs = FileSystem.get(URI.create(uri), conf); // if wanting to use a different cluster
        FileSystem fs = FileSystem.get(conf);

        Path dirpath = new Path(dirname + "/dir");
        Path wfilepath = new Path(dirname + "/file.w");
        // Path rfilepath = new Path(dirname + "/file.r");
        Path rfilepath = wfilepath;

        // try mkdir
        boolean res = fs.mkdirs(dirpath);
        if (!res) {
            System.out.println("mkdir failed, path: " + dirpath);
            return;
        }
        System.out.println("mkdir(" + dirpath + ") went ok, now writing file");

        // create wfile
        FSDataOutputStream ostr = fs.create(wfilepath,
                true,                      // overwrite
                512,                       // buffersize
                (short) 1,                 // replication
                (long) (64 * 1024 * 1024)  // chunksize
        );
        ostr.write(buf);
        ostr.close();
        System.out.println("write(" + wfilepath + ") went ok");

        // read rfile
        System.out.println("reading file: " + rfilepath);
        FSDataInputStream istr = fs.open(rfilepath);
        int bb = istr.readInt();
        istr.close();
        System.out.println("Read ok");
    }
}
Compiling and Running a Java Application
- Using JARs from the Maven Repository
- Maven artifacts from version 2.1.2 onward are published to https://repository.mapr.com/maven/. When compiling for Data Fabric core version 6.1, add the following dependency to the pom.xml file for your project:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.0-mapr-1808</version>
</dependency>

This dependency pulls in the artifacts from the MapR Maven repository the next time you run mvn clean install. The JAR that includes the maprfs library is a dependency of the hadoop-common artifact.
- Using JARs from the Data Fabric Installation
- The maprfs library is included in the hadoop classpath. Add the hadoop classpath to the Java classpath when you compile and run the Java application.
  - To compile the sample code, use the following command:

    javac -cp $(hadoop classpath) MapRTest.java

  - To run the sample code, use the following command:

    java -cp .:$(hadoop classpath) MapRTest /test
Loading the Data Fabric Native Library
By default, the native library is loaded by the root class loader, which allows all child class loaders to see and access it. If the native library is instead loaded by a child class loader, other classes cannot access the library. To allow applications and their associated child classes to access the symbols and variables in the native library, we recommend loading the native library via the root loader.
Loading the native library via the root class loader is accomplished by injecting code into the root loader. If Data Fabric runs inside an application server (such as Tomcat) where it does not have access to the root class loader, the native library is not loaded. Child classes that try to access the symbols under the assumption that the root class loader successfully loaded the native library will fail.
The parameter -Dmapr.library.flatclass, when specified with Java, disables the injection of code via the root class loader, thus disabling the loading of the native library by the root class loader. Instead, the application trying to access the symbols can load the native library itself. However, because the native library can be loaded only once and can be seen only by the application that loads it, ensure that only one application within the JVM attempts to load and access the native library.
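When -Dmapr.library.flatclass is in effect, the once-only constraint above is the application's responsibility. The sketch below shows one way to guard a library load so it runs exactly once per JVM; the class name NativeLoader is a hypothetical helper, and the library name "MapRClient" in the comment is an assumption for illustration, not part of the Data Fabric API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class NativeLoader {
    private static final AtomicBoolean loaded = new AtomicBoolean(false);

    /**
     * Runs the given loader exactly once per JVM, no matter how many
     * callers race to invoke it. Returns true only for the caller
     * that actually performed the load.
     */
    public static boolean loadOnce(Runnable loader) {
        if (loaded.compareAndSet(false, true)) {
            loader.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // In a real application the loader would call something like
        // System.loadLibrary("MapRClient"); the library name is an assumption.
        // Here a counter stands in for the load so the sketch is runnable anywhere.
        int[] calls = new int[1];
        Runnable fakeLoad = () -> calls[0]++;

        NativeLoader.loadOnce(fakeLoad);  // performs the "load"
        NativeLoader.loadOnce(fakeLoad);  // no-op: already loaded
        System.out.println(calls[0]);     // prints 1
    }
}
```

The guard matters because a second System.loadLibrary of the same native library from a different class loader throws an UnsatisfiedLinkError.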
Garbage Collection in Data Fabric
The garbage collection (GC) algorithms in Java provide opportunities for performance optimizations for your application. Java provides the following GC algorithms:
- Serial GC. This algorithm is typically used in client-style applications that don't require low pause times. Specify -XX:+UseSerialGC to use this algorithm.
- Parallel GC, which is optimized to maximize throughput. Specify -XX:+UseParallelGC to use this algorithm.
- Mostly-Concurrent or Concurrent Mark-Sweep GC, which is optimized to minimize latency. Specify -XX:+UseConcMarkSweepGC to use this algorithm.
- Garbage First (G1) GC, a newer algorithm intended to replace Concurrent Mark-Sweep GC. Specify -XX:+UseG1GC to use this algorithm.
- Flags for GC Debugging
- Set the following flags in Java to log the GC algorithm's behavior for later analysis:

  -verbose:gc
  -Xloggc:<filename>
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -XX:+PrintTenuringDistribution
  -XX:+PrintGCApplicationConcurrentTime
  -XX:+PrintGCApplicationStoppedTime

  For more information, see the Java Garbage Collection Tuning document or the Java Garbage Collection links.
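To confirm which collector a running JVM actually selected, you can query the standard java.lang.management API from inside the application; this is a small sketch using only JDK classes, independent of any Data Fabric API. When -XX:+UseG1GC is set, for example, the printed bean names include the G1 collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcInfo {
    public static void main(String[] args) {
        // Each MXBean represents one collector the JVM selected,
        // e.g. "G1 Young Generation" / "G1 Old Generation" under G1.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
    }
}
```

Running the class with different -XX:+Use...GC flags is a quick way to verify that the flag took effect before analyzing the GC logs.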
Converting fid and volid
The following file system APIs are available in com.mapr.fs.MapRFileSystem for converting a fid to a file path and a volid to a volume name:
public String getMountPathFidCached(String fidStr) throws IOException
public String getVolumeNameCached(int volId) throws IOException
public String getVolumeName(int volId) throws IOException
public String getMountPathFid(String fidStr) throws IOException
- Converting fid to File Path
- The getMountPathFid(String) and getMountPathFidCached(String) APIs can be used to convert a file ID to the full path of the file. The getMountPathFid() API makes a call to CLDB and the file system to get the file path from the fid. Because this API does not cache or store this information locally, it might make repeated requests to CLDB and the file system for the same fid, which might result in many RPCs to both. The getMountPathFidCached() API makes a call to CLDB and the file system one time and stores the information locally in the shared library of the client. For subsequent calls, it uses the locally stored information to retrieve the file path from the fid. However, if there are many files in the volume, there might still be a large number of calls to CLDB and the file system to determine the file path for each fid in the volume. The caching is useful if the API attempts to determine the file path for the same fid repeatedly. The cache is purged after 15 seconds; if the file name changes before the cache is purged, you see the old name for the file until the cache expires. You can use these APIs to convert the fid to the file path. For example, the sample consumer application and the sample uncached consumer application for consuming audit logs as stream messages use these methods as shown below.
- Sample Cached Consumer
{
    String token = st1.nextToken();
    /* If the field has a fid, expand it using the cached API */
    if (token.endsWith("Fid")) {
        String lfidStr = st1.nextToken();
        String path = null;
        try {
            path = fs.getMountPathFidCached(lfidStr); // Expand FID to path
        } catch (IOException e) {
        }
        lfidPath = "\"FidPath\":\"" + path + "\",";
        // System.out.println("\nPath for fid " + lfidStr + " is " + path);
    }
}
- Sample Uncached Consumer
{
    String token = st1.nextToken();
    if (token.endsWith("Fid")) {
        String lfidStr = st1.nextToken();
        String path = null;
        try {
            path = fs.getMountPathFid(lfidStr); // Expand FID to path
        } catch (IOException e) {
        }
        lfidPath = "\"FidPath\":\"" + path + "\",";
        // System.out.println("\nPath for fid " + lfidStr + " is " + path);
    }
}
- Converting volid to Volume Name
- The getVolumeName() and getVolumeNameCached() APIs can be used to convert a volume ID to a volume name. The getVolumeName() API makes a call to CLDB every time to get the volume name from the volid, which may result in many RPCs to CLDB. The getVolumeNameCached() API makes a call to CLDB one time and stores the information locally in the shared library of the client. For subsequent calls, it uses the locally stored information to retrieve the volume name from the volid. The cache is purged after 15 seconds. You can use these APIs to convert the volid to the volume name. For example, the sample consumer application and the sample uncached consumer application for consuming audit logs as stream messages use these methods as shown below.
- Sample Cached Consumer

if (token.endsWith("volumeId")) {
    String volid = st1.nextToken();
    String name = null;
    try {
        int volumeId = Integer.parseInt(volid);
        // Cached API to convert volume ID to volume name
        name = fs.getVolumeNameCached(volumeId);
    } catch (IOException e) {
    }
    lvolName = "\"VolumeName\":\"" + name + "\",";
    // System.out.println("\nVolume name for volid " + volid + " is " + name);
}
- Sample Uncached Consumer

if (token.endsWith("volumeId")) {
    String volid = st1.nextToken();
    String name = null;
    try {
        int volumeId = Integer.parseInt(volid);
        // API to convert volume ID to volume name
        name = fs.getVolumeName(volumeId);
    } catch (IOException e) {
    }
    lvolName = "\"VolumeName\":\"" + name + "\",";
    // System.out.println("\nVolume name for volid " + volid + " is " + name);
}
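The cached variants above behave like a lookup cache with a short time-to-live. The sketch below illustrates that general idea with a generic TTL cache; it is an illustration of the purge-after-N-seconds behavior described in this section (shortened to 50 ms so the demo runs quickly), not MapR's actual client-side implementation, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal TTL cache: entries expire after ttlMillis, so a stale name
// (e.g. a renamed file or volume) is served only until the entry is purged.
public class TtlCache<K, V> {
    private static class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public V get(K key, Function<K, V> loader) {
        long now = System.currentTimeMillis();
        Entry<V> e = map.get(key);
        if (e == null || now >= e.expiresAt) {
            V v = loader.apply(key);                    // miss or expired: do the expensive lookup
            map.put(key, new Entry<>(v, now + ttlMillis));
            return v;
        }
        return e.value;                                 // hit: no RPC needed
    }

    public static void main(String[] args) throws InterruptedException {
        int[] rpcs = new int[1];                        // counts "RPCs" the loader performs
        TtlCache<Integer, String> cache = new TtlCache<>(50);
        Function<Integer, String> lookup = id -> { rpcs[0]++; return "vol-" + id; };

        cache.get(7, lookup);                           // first call: loader runs
        cache.get(7, lookup);                           // served from cache
        Thread.sleep(60);                               // wait past the TTL
        cache.get(7, lookup);                           // entry purged: loader runs again
        System.out.println(rpcs[0]);                    // prints 2
    }
}
```

This is why the cached APIs cut RPC volume for repeated lookups of the same fid or volid, while still picking up renames within the purge interval.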