Using Amazon S3 to Store Logs

Amazon Web Services (AWS) offers Amazon Simple Storage Service (Amazon S3), which stores and retrieves objects through a web service interface.

Configure the Spark History Server with existing Amazon S3 storage buckets to store the event logs.

To store logs in Amazon S3 buckets:
  1. Set the following flags during Spark History Server installation. See Installing and Configuring Spark History Server.

    --set tenantIsUnsecure=true \
    --set eventlogstorage.kind=s3 \
    --set eventlogstorage.s3Endpoint=http://s3host:9000 \
    --set eventlogstorage.s3path=s3a://bucket/<path-to-folder> \
    --set eventlogstorage.s3AccessKey=<access-key> \
    --set eventlogstorage.s3SecretKey=<secret-key>
    Configuration options such as s3AccessKey and s3SecretKey are passed to the Spark History Server through a Kubernetes secret.
    You can also pass the Amazon S3 credentials securely by setting the sparkExtraConfigs option in the values.yaml file:
    sparkExtraConfigs: |
      spark.hadoop.fs.s3a.access.key [access_key]
      spark.hadoop.fs.s3a.secret.key [secret_key]
  2. Set the following options in the values.yaml file in a tenant namespace:
    # Space separated Java options for Spark HS (Will be added to SPARK_HISTORY_OPTS in spark-env.sh)
    HSJavaOpts: -Dcom.sun.net.ssl.checkRevocation=false -Dcom.amazonaws.sdk.disableCertChecking=true
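The sparkExtraConfigs fragment from step 1 can also be generated instead of typed by hand, which keeps the credentials out of shell history and version control. The following is a minimal sketch, not part of the product: the function name render_spark_extra_configs and the environment variable names S3_ACCESS_KEY and S3_SECRET_KEY are assumptions chosen for illustration. Only the sparkExtraConfigs key and the two spark.hadoop.fs.s3a.* property names come from the steps above.

```python
import os


def render_spark_extra_configs(access_key: str, secret_key: str) -> str:
    """Return a sparkExtraConfigs fragment in the layout shown in step 1.

    Hypothetical helper: it only formats text and does not contact
    Amazon S3 or Kubernetes.
    """
    if not access_key or not secret_key:
        raise ValueError("both the access key and the secret key are required")
    lines = [
        "sparkExtraConfigs: |",
        f"  spark.hadoop.fs.s3a.access.key {access_key}",
        f"  spark.hadoop.fs.s3a.secret.key {secret_key}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # S3_ACCESS_KEY and S3_SECRET_KEY are assumed variable names; the
    # placeholders are printed when the variables are not set.
    fragment = render_spark_extra_configs(
        os.environ.get("S3_ACCESS_KEY", "<access-key>"),
        os.environ.get("S3_SECRET_KEY", "<secret-key>"),
    )
    print(fragment, end="")
```

Run the script with both variables exported and paste the printed fragment into values.yaml. Because the output contains the secret key in plain text, treat the resulting file with the same care as the credentials themselves.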