Configuring Spark Applications to Write and View Logs

This section describes how to configure your Spark Application custom resources (CRs) to write logs to the event directory, and how to view Spark Application details in the Spark History Server web UI.

Configuring Spark Applications to Write Logs

Perform the following steps to configure the Spark Application CR to write logs to a PersistentVolumeClaim (PVC):

Configure the volumes option under the spec section of the SparkApplication as follows:
```
volumes:
  - name: <some-name>
    persistentVolumeClaim:
      claimName: <same-volume-name-as-in-history-server>
```
For example:
```
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: spark-pvc
```
Ensure that claimName matches the ExistingClaimName value in the values.yaml file of the Helm chart.
Configure the volumeMounts option under the Driver and Executor pods as follows:
```
volumeMounts:
  - name: <some-name>
    mountPath: "<same-path-as-event-directory-on-history-server>"
```
For example:
```
volumeMounts:
  - name: data
    mountPath: "/mnt/hs-logs"
```
Ensure that mountPath matches the eventsDir path in the values.yaml file of the Helm chart.

Configure the sparkConf options of the SparkApplication for the Spark event log service as follows:

"spark.eventLog.enabled": "true"
"spark.eventLog.dir": "<same-path-as-event-directory-on-history-server>"

For example:

"spark.eventLog.enabled": "true"
"spark.eventLog.dir": "file:/mnt/hs-logs"

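Taken together, the volumes, volumeMounts, and event log settings above might sit in one spec as in the following sketch. It writes only the logging-related fragment (not a complete CR) to a scratch file and checks that the event log directory points at the mounted path. The values are the examples from this section; the scratch file name is hypothetical, and the placement of volumeMounts under spec.driver and spec.executor and of the event log keys under spec.sparkConf assumes the usual SparkApplication CR layout.

```shell
# Scratch file name is hypothetical; adjust it to wherever you keep your CR.
FRAGMENT="${TMPDIR:-/tmp}/spark-app-logging-fragment.yaml"

# Write the logging-related part of the spec (this is not a complete CR).
cat > "$FRAGMENT" <<'EOF'
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: spark-pvc
  driver:
    volumeMounts:
      - name: data
        mountPath: "/mnt/hs-logs"
  executor:
    volumeMounts:
      - name: data
        mountPath: "/mnt/hs-logs"
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt/hs-logs"
EOF

# Extract the distinct mount path(s) and the event log directory (without "file:").
MOUNT_PATH=$(sed -n 's/^ *mountPath: "\(.*\)"$/\1/p' "$FRAGMENT" | sort -u)
EVENT_DIR=$(sed -n 's/^ *"spark.eventLog.dir": "file:\(.*\)"$/\1/p' "$FRAGMENT")

# The driver and executors can only write event logs if both values agree.
if [ "$MOUNT_PATH" = "$EVENT_DIR" ]; then
  echo "ok: event log dir matches mount path $MOUNT_PATH"
else
  echo "mismatch: mountPath=$MOUNT_PATH eventLog.dir=$EVENT_DIR" >&2
fi
```

The check is a plain text comparison, so it only works for a fragment laid out like this one; it is meant as a quick guard against typos, not as YAML validation.
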
Run the following command to submit the Spark Application:
```
kubectl apply -f <path-to-example-spark-application-CRs> 
```

Viewing Application Details Using Web UI

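You can view application details for Completed, Failed (completed but failed), or Running Spark Applications in the Spark History Server web UI.

Run the following export commands to get the node IP and node port of the Spark History Server web UI. The {{ ... }} expressions are Helm template placeholders; replace them with the namespace and service name of your release.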

export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "spark-hs-chart.fullname" . }})

export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")

echo http://$NODE_IP:$NODE_PORT
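
The lookup above can be wrapped in small shell functions so that the namespace and service name are passed as arguments rather than edited into the command. This is a minimal sketch: hs_base_url and lookup_hs_url are hypothetical helper names, the jsonpath expressions are the ones from the commands above, and the namespace flag is left off the nodes query because nodes are cluster-scoped.

```shell
# Compose the web UI base URL from a node address and a node port.
hs_base_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Look up the node port of the History Server service and the first address
# of the first node, then print the resulting URL.
# Arguments: $1 = namespace, $2 = service name (both supplied by you).
lookup_hs_url() {
  node_port=$(kubectl get --namespace "$1" \
    -o jsonpath="{.spec.ports[0].nodePort}" services "$2") || return 1
  node_ip=$(kubectl get nodes \
    -o jsonpath="{.items[0].status.addresses[0].address}") || return 1
  hs_base_url "$node_ip" "$node_port"
}
```

For example, `lookup_hs_url <namespace> <service-name>` prints the URL, or returns a non-zero status if either kubectl query fails.
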

Access the Spark History Server web UI using the following URL:

http://<NODE_IP>:<NODE_PORT>

The default node port is 18080.

Monitor the status of all applications using the following URL:

http://<NODE_IP>:<NODE_PORT>/api/v1/applications

View the details of a single application using the following URL:

http://<NODE_IP>:<NODE_PORT>/api/v1/applications/<spark-job-id>

See REST API list for Spark History Server.
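
The two REST endpoints above can also be queried from a shell. The following sketch assumes curl is installed and that NODE_IP and NODE_PORT are set as shown earlier; hs_api_url and hs_get are hypothetical helper names, not part of the product.

```shell
# Build a History Server REST URL.
# Arguments: $1 = node IP, $2 = node port, $3 = application ID (optional).
hs_api_url() {
  if [ -n "${3:-}" ]; then
    printf 'http://%s:%s/api/v1/applications/%s\n' "$1" "$2" "$3"
  else
    printf 'http://%s:%s/api/v1/applications\n' "$1" "$2"
  fi
}

# Fetch the JSON response for the URL built from the same arguments.
# --fail makes curl return a non-zero status on HTTP errors.
hs_get() {
  curl --silent --show-error --fail "$(hs_api_url "$@")"
}
```

For example, `hs_get "$NODE_IP" "$NODE_PORT"` lists all applications, and `hs_get "$NODE_IP" "$NODE_PORT" <spark-job-id>` returns the details of one application.
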

NOTE

There is a limitation related to Spark History Server with Amazon S3. See Spark Limitations.

HPE Ezmeral Runtime Enterprise 5.6 Documentation
Abstract	HPE Ezmeral Container Platform is a unified container platform built on open source Kubernetes and designed for both cloud-native applications and non-cloud-native applications running on any infrastructure either on-premises, in multiple public clouds, in a hybrid model, or at the edge.
Published	July 2024
Edition	5.6.0