Configuring Memory for Spark Applications

This topic describes how to set memory options for Spark applications.

You can configure the driver and executor memory options for Spark applications by using the HPE Ezmeral Runtime Enterprise new UI (see Creating Spark Applications) or by manually setting the following properties in the Spark application YAML file.

  • spark.driver.memory: Amount of memory allocated for the driver.
  • spark.executor.memory: Amount of memory allocated to each executor that runs tasks.
In addition to the configured memory, Spark allocates a memory overhead of 10% of the driver or executor memory, with a minimum of 384 MB. The overhead is applied to each driver and executor, so the total driver or executor memory is the configured memory plus the overhead.

Memory Overhead = max(0.1 * Driver or Executor Memory, 384 MB)

Total Driver or Executor Memory = Driver or Executor Memory + Memory Overhead
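
For example, with spark.executor.memory set to 5g (5120 MB), the overhead is 0.1 * 5120 MB = 512 MB, so the total executor memory is 5120 MB + 512 MB = 5632 MB. With spark.driver.memory set to 2g (2048 MB), 10% is only about 205 MB, so the 384 MB minimum applies and the total driver memory is 2048 MB + 384 MB = 2432 MB.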

Configuring Memory Overhead

You can configure the memory overhead for the driver and executors by using the Spark Operator, Livy, or the spark-submit script.
Spark Operator
Set the following configuration options in the Spark application YAML file. See Spark application YAML.
spark.driver.memoryOverhead
spark.executor.memoryOverhead
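
For example, the relevant fragment of a SparkApplication manifest might look like the following minimal sketch. The application name, memory values, and the sparkoperator.k8s.io/v1beta2 API version are illustrative assumptions; adjust them to your application:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: <spark-app-name>
spec:
  sparkConf:
    "spark.driver.memoryOverhead": "512m"     # overhead added to driver memory
    "spark.executor.memoryOverhead": "1g"     # overhead added to each executor
  driver:
    memory: "2g"
  executor:
    memory: "4g"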

If you are using the HPE Ezmeral Runtime Enterprise new UI, add these configuration options by clicking Edit YAML in the Review step or Edit YAML from the Actions menu on the Spark Applications screen. See Managing Spark Applications.

Livy

Using YAML:

Add the following configuration options to the spark-defaults.conf section within the extraConfigs section of the values.yaml file in a tenant namespace.
extraConfigs:
  spark-defaults.conf: |
    spark.driver.memoryOverhead <value-for-overhead>
    spark.executor.memoryOverhead <value-for-overhead>

Using REST APIs:

Add the following configuration options to the conf section when creating a Livy session.
{
  "name": "My interactive session",
  "executorMemory": "512m",
  "conf": {
    "spark.executor.memoryOverhead": "1g"
  }
}
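
For example, you can submit this payload to the Livy sessions endpoint (POST /sessions) as sketched below; the Livy URL is a placeholder for the Livy service endpoint in your tenant:

curl -k -X POST \
  -H "Content-Type: application/json" \
  -d '{"name": "My interactive session", "executorMemory": "512m", "conf": {"spark.executor.memoryOverhead": "1g"}}' \
  https://<livy-endpoint>/sessions
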
spark-submit Script
Specify the overhead configuration options by using the --conf flag to dynamically load the properties:
./bin/spark-submit --name "<spark-app-name>" --master <master-url> --conf spark.driver.memoryOverhead=<value> <application-jar>
./bin/spark-submit --name "<spark-app-name>" --master <master-url> --conf spark.executor.memoryOverhead=<value> <application-jar>
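
Both overhead options can also be passed in a single submission. The following is an illustrative sketch; the application name, master URL, and application JAR are placeholders, and the memory values are examples only:

./bin/spark-submit --name "<spark-app-name>" --master <master-url> \
  --conf spark.driver.memoryOverhead=512m \
  --conf spark.executor.memoryOverhead=1g \
  <application-jar>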

To learn more about driver or executor memory, memory overhead, and other properties, see Apache Spark 2.x.x and Apache Spark 3.x.x application properties.