Data Fabric MCP Server Settings

This topic describes the configuration settings required for the MCP server.


The Spark MCP endpoints operate in one of two modes: impersonation enabled or impersonation disabled.

Impersonation-enabled mode:

Impersonation-enabled is the default and recommended mode.

In this mode, each Spark session is impersonated using the credentials of the calling user. A separate session is created for each user, with a default idle timeout of 30 minutes. Because each session reserves cluster resources, the number of concurrent active sessions is limited; you can configure this limit.

Before accessing Spark for the first time, you must call the create_sql_session tool. Use the same tool to recreate expired sessions.
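As a sketch, invoking the tool from a generic MCP client over JSON-RPC might look like the following; the request shape follows the standard MCP tools/call method, and the empty arguments object is an assumption:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "create_sql_session",
    "arguments": {}
  }
}
```

On success, the server responds with a tool result, after which subsequent Spark SQL tool calls run in the user's session.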

Impersonation-disabled mode:

Impersonation-disabled mode is not recommended because it carries security risks: every user has access to all data that is available to the session-user.

In impersonation-disabled mode, sessions are pre-created (three, by default) under the user specified as session-user in the configuration. Because the sessions are pre-created, there is no session-creation latency, and all sessions in this mode are re-created automatically if they crash or expire.

To add the required Spark parameters, first create a Spark section in the /opt/mapr/data-access-gateway/conf/mcp-server.conf file.
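As an illustrative sketch, a minimal Spark section might look like the following; the exact syntax of mcp-server.conf (simple key/value properties are assumed here) may differ in your release:

```
# Enable the Spark MCP endpoint (assumed key=value syntax)
mcp.spark.enabled = true

# Keep the default, recommended impersonation-enabled mode
mcp.spark.impersonation-enabled = true

# Allow up to 3 concurrent impersonated sessions/users
mcp.spark.userpool.size = 3
```

The individual parameters and their defaults are described in the tables below.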

The following tables list the Spark parameters.
NOTE
To use any Spark endpoint, you must install the mapr-livy, mapr-spark, mapr-hive, and mapr-hivemetastore packages.
Table 1. MCP Spark Endpoint Options

mcp.spark.enabled
    Enables the Spark endpoint.

mcp.spark.session-user
    The user that creates sessions in impersonation-disabled mode.

mcp.spark.pool.size
    Number of sessions pre-created for impersonation-disabled mode.
    Default value: 3

mcp.spark.userpool.size
    Number of available concurrent sessions/users for impersonation-enabled mode.
    Default value: 3

mcp.spark.session.ttl
    TTL for Spark sessions in impersonation-disabled mode.
    Default value: 10

mcp.spark.session.userttl
    TTL for Spark sessions in impersonation-enabled mode.
    Default value: 30 minutes

mcp.spark.impersonation-enabled
    Enables or disables impersonation mode.
    Default value: true
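For example, an impersonation-disabled setup combining the options above might look like the following sketch (key/value properties syntax is assumed, and the mapr session-user is illustrative):

```
# Run all sessions under a single configured user (not recommended;
# every caller sees the data available to this user)
mcp.spark.enabled = true
mcp.spark.impersonation-enabled = false
mcp.spark.session-user = mapr

# Pre-create three sessions and recycle them on expiry
mcp.spark.pool.size = 3
mcp.spark.session.ttl = 10
```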

Table 2. Spark Options (these values are passed to the Spark engine)

mcp.spark.driver.memory
    Driver memory for the Spark session.
    Default value: 1 GB

mcp.spark.driver.cores
    Number of driver cores per Spark session.
    Default value: 1

mcp.spark.executor.memory
    Executor memory per Spark session.
    Default value: 1 GB

mcp.spark.executor.cores
    Number of executor cores per Spark session.
    Default value: 1

mcp.spark.executor.instances
    Number of executors per Spark session.
    Default value: 1
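A sketch of resource tuning with the Table 2 options follows; the values and the "2g"/"4g" memory notation are assumptions, so check the accepted memory-size format for your release:

```
# Give each session a larger driver and two 4 GB executors (illustrative values)
mcp.spark.driver.memory = 2g
mcp.spark.driver.cores = 2
mcp.spark.executor.memory = 4g
mcp.spark.executor.cores = 2
mcp.spark.executor.instances = 2
```

Because every session reserves these resources for its full TTL, size them against your cluster capacity multiplied by the configured pool or userpool size.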

Table 3. Iceberg Options

mcp.spark.iceberg.extensions
    Spark session extension to use for Iceberg integration.
    Default value: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

mcp.spark.iceberg.catalog
    Iceberg catalog implementation to use.
    Default value: org.apache.iceberg.spark.SparkCatalog

mcp.spark.iceberg.catalog.type
    Catalog type.
    Default value: hive

mcp.spark.iceberg.catalog.uri
    Catalog URI.
    Default value: thrift://localhost:9083

mcp.spark.iceberg.useBuiltIn
    Whether to use the built-in Iceberg library.
    Default value: true