Configuring Spark Applications to Write and View Logs

This section guides you through configuring your Spark Application CRs to write logs to the event directory and viewing the Spark Application details in the Spark History Server web UI.

Configuring Spark Applications to Write Logs

Perform the following steps to configure the Spark Application CR to write logs to a PVC:
  1. Configure the volumes option under the spec section of the SparkApplication as follows:
    volumes:
      - name: <some-name>
        persistentVolumeClaim:
          claimName: <same-volume-name-as-in-history-server>

    For example:

    volumes:
      - name: data
        persistentVolumeClaim:
          claimName: spark-pvc
    

    Ensure that claimName matches the ExistingClaimName value in the values.yaml file of the Helm chart.

  2. Configure the volumeMounts option for the Driver and Executor pods as follows:
    volumeMounts:
      - name: <some-name>
        mountPath: "<same-path-as-event-directory-on-history-server>"
    
    For example:
    volumeMounts:
      - name: data
        mountPath: "/mnt/hs-logs"
    

    Ensure that mountPath matches the eventsDir path in the values.yaml file of the Helm chart.

  3. Configure the sparkConf option of the SparkApplication for the Spark Event Log Service as follows:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "<same-path-as-event-directory-on-history-server>" 
    
    For example:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt/hs-logs"
    
  4. Run the following command to submit the Spark Application:
    kubectl apply -f <path-to-example-spark-application-CRs> 
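
The three configuration steps above can be combined in one manifest. The following is a minimal sketch, not a complete CR: it writes a partial SparkApplication manifest to a file, reusing the example values data, spark-pvc, and /mnt/hs-logs from the steps. The apiVersion, the metadata name eventlog-example, and the file name spark-app-eventlog.yaml are assumptions for illustration only; the other fields that a SparkApplication requires are omitted and must come from your own application.

```shell
# Write a partial SparkApplication manifest that combines steps 1 to 3.
# Assumptions: apiVersion, metadata.name, and the output file name are
# illustrative and may differ in your environment.
cat > spark-app-eventlog.yaml <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2   # assumption: use your operator's API group
kind: SparkApplication
metadata:
  name: eventlog-example                   # hypothetical name
spec:
  # Other required fields (type, image, main application file, and so on)
  # are intentionally omitted from this sketch.
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt/hs-logs"
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: spark-pvc               # must match ExistingClaimName in values.yaml
  driver:
    volumeMounts:
      - name: data
        mountPath: "/mnt/hs-logs"          # must match eventsDir in values.yaml
  executor:
    volumeMounts:
      - name: data
        mountPath: "/mnt/hs-logs"
EOF
```

After you add the remaining fields that your application requires, submit the file with kubectl apply -f spark-app-eventlog.yaml, as in step 4.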

Viewing Application Details Using Web UI

You can view the application details for Completed, Failed (completed but failed), or Running Spark Applications using the Spark History Server web UI.

Figure 1. Spark History Server Web UI


Run the following commands to get the node IP and node port of the Spark History Server web UI. Replace <namespace> with the namespace of the Helm release and <history-server-service-name> with the name of the Spark History Server service:
export NODE_PORT=$(kubectl get --namespace <namespace> -o jsonpath="{.spec.ports[0].nodePort}" services <history-server-service-name>)
export NODE_IP=$(kubectl get nodes --namespace <namespace> -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT

Access the Spark History Server web UI using the following URL:
http://<NODE_IP>:<NODE_PORT>

The default node port is 18080.

Monitor the status of all applications using the following URL:
http://<NODE_IP>:<NODE_PORT>/api/v1/applications
View the details of a single application using the following URL:
http://<NODE_IP>:<NODE_PORT>/api/v1/applications/<spark-job-id>

See REST API list for Spark History Server.
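
The two REST URLs above can also be built and queried from the command line. The following is a minimal sketch that assumes NODE_IP and NODE_PORT are already exported as shown earlier; hs_api_url is a hypothetical helper name and is not part of Spark or the Helm chart.

```shell
# Hypothetical helper: print a Spark History Server REST URL.
# With no argument it prints the applications list URL; with an
# application ID as the first argument it prints that application's URL.
hs_api_url() {
  base="http://${NODE_IP}:${NODE_PORT}/api/v1/applications"
  if [ -n "${1:-}" ]; then
    printf '%s/%s\n' "$base" "$1"
  else
    printf '%s\n' "$base"
  fi
}

# Example calls (these need network access to the node, so they are
# left commented out in this sketch):
# curl -s "$(hs_api_url)"
# curl -s "$(hs_api_url "<spark-job-id>")"
```

Both endpoints return JSON, so the output can be piped to a JSON formatter if one is installed.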

NOTE There is a limitation related to Spark History Server with Amazon S3. See Spark Limitations.