Accessing DataTaps in Kubernetes Pods

Describes the generic process for configuring Kubernetes pods to access DataTaps, including considerations and steps for Hadoop 2.x and Hadoop 3.x applications.

About this task

The hpecp-agent observes pod creation. If the pod includes the hpecp.hpe.com/dtap label, the following occurs:

  • hpecp-agent adds a sidecar container that implements the DataTaps. The hpecp-agent creates an emptyDir volume named dtap-shared-vol. This volume is mounted to the /opt/bdfs directory of the sidecar container and the application container.

  • On startup, the sidecar container prepares the bluedata-dtap.jar file that matches the Hadoop version of the application and places it in the /opt/bdfs directory.

  • The /opt/bdfs directory in both the sidecar DataTap container and the application container is mounted from the same volume, dtap-shared-vol. As a result, the application container can directly access the bluedata-dtap.jar file in the /opt/bdfs directory.

The following procedure is a generic example only.

  • KubeDirector applications included with HPE Ezmeral Runtime Enterprise are preconfigured to be able to access DataTaps, and you need only set the pod label. See Accessing DataTaps in KubeDirector Applications.
  • Spark Operator applications must be configured for DataTap access as described in Tutorial: Spark Configuration and Execution on Kubernetes.
  • If a pod has the label hpecp.hpe.com/dtap: hadoop2 or hpecp.hpe.com/dtap: hadoop3, the DataTap sidecar container runs until the pod is deleted. In some scenarios, such as when a user submits a Spark Operator application, the application container exits automatically after the application completes. If the DataTap sidecar container is still running after the application container exits, the pod cannot enter the completed state and continues to hold resources that could otherwise be released for use by other pods.

    To ensure that the DataTap sidecar container also exits automatically after the application container exits, use one of the following labels:

    • If the application is Hadoop 2.x, add the label:
      hpecp.hpe.com/dtap: hadoop2-job
    • If the application is Hadoop 3.x, add the label:
      hpecp.hpe.com/dtap: hadoop3-job
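
    As an illustration of where such a label goes, the following is a minimal sketch of a pod manifest for a Hadoop 3.x application that should release its resources when it finishes. Only the hpecp.hpe.com/dtap label comes from this document. The pod name, container name, and image are hypothetical placeholders, and restartPolicy: Never is an assumption chosen so that the pod can finish instead of being restarted.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: dtap-job-example              # hypothetical pod name
      labels:
        # The -job variant asks the DataTap sidecar to exit after the application container exits.
        hpecp.hpe.com/dtap: hadoop3-job
    spec:
      restartPolicy: Never                # assumption: run-to-completion workload
      containers:
        - name: app                       # hypothetical container name
          image: example.com/my-hadoop3-app:latest   # hypothetical image
    ```

    The sidecar container and the dtap-shared-vol volume are not declared here because, as described above, hpecp-agent adds them when the pod is created.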

Procedure

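  1. Add one of the following labels to the YAML file of the pod. A pod can carry only one value for the hpecp.hpe.com/dtap label key, so choose the -job variant if the DataTap sidecar container must exit when the application container exits:
    • If the application is Hadoop 2.x, add one of the following labels:
      hpecp.hpe.com/dtap: hadoop2
      hpecp.hpe.com/dtap: hadoop2-job
    • If the application is Hadoop 3.x, add one of the following labels:
      hpecp.hpe.com/dtap: hadoop3
      hpecp.hpe.com/dtap: hadoop3-job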
  2. In the application container, add bluedata-dtap.jar to the classpath, and then modify the Hadoop core-site.xml file.

    The following example adds the fs.dtap.impl, fs.AbstractFileSystem.dtap.impl, and fs.dtap.impl.disable.cache properties to the core-site.xml file:

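        <property>
          <name>fs.dtap.impl</name>
          <value>com.bluedata.hadoop.bdfs.Bdfs</value>
        </property>

        <property>
          <name>fs.AbstractFileSystem.dtap.impl</name>
          <value>com.bluedata.hadoop.bdfs.BdAbstractFS</value>
        </property>

        <property>
          <name>fs.dtap.impl.disable.cache</name>
          <value>false</value>
        </property>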
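
    Step 2 does not prescribe how to put bluedata-dtap.jar on the classpath. One possible approach, sketched below, is to set the standard Hadoop HADOOP_CLASSPATH environment variable on the application container. This is an assumption, not the documented method: it only helps if the application is launched through the Hadoop scripts that read HADOOP_CLASSPATH, and other frameworks use their own classpath settings. The pod name, container name, and image are hypothetical.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: dtap-classpath-example        # hypothetical pod name
      labels:
        hpecp.hpe.com/dtap: hadoop2       # use a hadoop3 label for a Hadoop 3.x application
    spec:
      containers:
        - name: app                       # hypothetical container name
          image: example.com/my-hadoop2-app:latest   # hypothetical image
          env:
            # Assumption: the image starts the application through Hadoop's launcher scripts,
            # which append HADOOP_CLASSPATH to the Java classpath.
            - name: HADOOP_CLASSPATH
              value: /opt/bdfs/bluedata-dtap.jar
    ```

    The path /opt/bdfs/bluedata-dtap.jar follows from the shared dtap-shared-vol mount described in About this task. The core-site.xml changes shown above are still required in the application container.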