EMR Serverless Configuration

This page discusses how to configure EMR Serverless as an external compute platform.

To run Spark jobs on EMR Serverless, provide the properties described below in a properties file. Pass this file with the `-c` or `--compute` option in the CLI, as described in the virtual graph materialization and entity resolution sections of external compute.

The following properties must be present in the properties file.

Mandatory properties:

| Property | Description | Example |
|---|---|---|
| `stardog.external.compute.platform` | Name of the external compute platform. | `emr-serverless` |
| `stardog.external.aws.region` | AWS region where EMR Studio and the application are hosted. | `us-east-1` |
| `stardog.external.aws.access.key` | AWS Access Key of the temporary credentials. | `ASIXXXXXXXXX4X` |
| `stardog.external.aws.secret.key` | AWS Secret Key of the temporary credentials. | `kgweBpwG/CS9j1yTm0AxY7KZ04wRRYrg+3pt8rek` |
| `stardog.external.aws.session.token` | AWS Session Token of the temporary credentials. | `FwoGZXIvYXdzEE4aDGJNU` |
| `stardog.external.emr-serverless.application.id` | Application Id of the EMR Application. | `00fa02fbl2qujv90` |
| `stardog.external.emr-serverless.execution.role.arn` | The role that has access to emr-serverless resources. This should be the same role that is used to generate the temporary credentials. | `arn:aws:iam::626720997297:role/emraccess` |
| `stardog.host.url` | Stardog URL to which EMR Serverless should connect back to write the results. The URL should point to the same Stardog server from which the external compute operation is triggered. | `https://myhost.stardog.cloud:5820` |
| `stardog.external.jar.path` | Path of the released stardog-spark-connector jar file. This value should point either to Stardog's public S3 bucket, where the latest released version is available, or to any other S3 bucket where the user has placed the jar.<br> Download the latest jar from this link | `s3://stardog-spark/stardog-spark-connector-3.3.0.jar` |
Any Spark-specific configurations can be set in the same properties file.

Optional properties:

| Property | Description | Default |
|---|---|---|
| `spark.dataset.repartition` | Refer to spark-docs. Set this value to override the default partition behavior. | |
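
As a rough illustration of how the mandatory properties fit together, the Python sketch below renders a properties file and checks that no mandatory key is missing. It assumes the file uses plain `key=value` lines. The helper names (`missing_mandatory`, `render_properties`) and the constants are hypothetical, and the values are the placeholder examples from the table of mandatory properties, not working credentials.

```python
from typing import Dict, List

# Mandatory keys, copied from the table of mandatory properties.
MANDATORY_KEYS = (
    "stardog.external.compute.platform",
    "stardog.external.aws.region",
    "stardog.external.aws.access.key",
    "stardog.external.aws.secret.key",
    "stardog.external.aws.session.token",
    "stardog.external.emr-serverless.application.id",
    "stardog.external.emr-serverless.execution.role.arn",
    "stardog.host.url",
    "stardog.external.jar.path",
)

# Placeholder values taken from the "Example" column; replace with real ones.
EXAMPLE_PROPERTIES: Dict[str, str] = {
    "stardog.external.compute.platform": "emr-serverless",
    "stardog.external.aws.region": "us-east-1",
    "stardog.external.aws.access.key": "ASIXXXXXXXXX4X",
    "stardog.external.aws.secret.key": "kgweBpwG/CS9j1yTm0AxY7KZ04wRRYrg+3pt8rek",
    "stardog.external.aws.session.token": "FwoGZXIvYXdzEE4aDGJNU",
    "stardog.external.emr-serverless.application.id": "00fa02fbl2qujv90",
    "stardog.external.emr-serverless.execution.role.arn": "arn:aws:iam::626720997297:role/emraccess",
    "stardog.host.url": "https://myhost.stardog.cloud:5820",
    "stardog.external.jar.path": "s3://stardog-spark/stardog-spark-connector-3.3.0.jar",
}


def missing_mandatory(props: Dict[str, str]) -> List[str]:
    """Return the mandatory keys that are absent or blank in props."""
    return [key for key in MANDATORY_KEYS if not props.get(key, "").strip()]


def render_properties(props: Dict[str, str]) -> str:
    """Render props as key=value lines (assumed format), one per line."""
    missing = missing_mandatory(props)
    if missing:
        raise ValueError("missing mandatory properties: " + ", ".join(missing))
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    print(render_properties(EXAMPLE_PROPERTIES), end="")
```

Writing the returned text to a file gives something that could then be passed with `-c` or `--compute`. Optional Spark settings can be added to the same dictionary before rendering.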

Configure a role that can access emr-serverless resources and the Elastic Container Registry (ECR), where the custom image is deployed. Create temporary credentials using this role and provide them through the properties in the table of mandatory properties above. These credentials consist of the access key, secret key, and session token, and can be generated programmatically using the AWS CLI or SDKs.

In addition, a custom image for Java 11 must be built and configured for the EMR application, as described in the AWS documentation.
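
One possible way to generate the temporary credentials with an SDK is to assume the role through AWS STS. The sketch below uses the `boto3` package (a third-party dependency that must be installed separately) and assumes the caller's own AWS identity is allowed to assume the role. The function names and the session name `stardog-external-compute` are hypothetical. This is only one approach and may not match how credentials are issued in a given AWS account.

```python
from typing import Any, Dict, Mapping


def credentials_to_properties(credentials: Mapping[str, Any]) -> Dict[str, str]:
    """Map an STS "Credentials" structure to the three credential properties."""
    return {
        "stardog.external.aws.access.key": credentials["AccessKeyId"],
        "stardog.external.aws.secret.key": credentials["SecretAccessKey"],
        "stardog.external.aws.session.token": credentials["SessionToken"],
    }


def fetch_temporary_credentials(
    role_arn: str,
    session_name: str = "stardog-external-compute",  # hypothetical session name
) -> Mapping[str, Any]:
    """Assume role_arn via STS and return the temporary credentials.

    Assumption: the default boto3 credential chain resolves to an identity
    that is permitted to call sts:AssumeRole on role_arn.
    """
    import boto3  # imported lazily so the mapping helper works without boto3

    sts = boto3.client("sts")
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    return response["Credentials"]


if __name__ == "__main__":
    # Offline demonstration with obviously fake values; no AWS call is made.
    fake = {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "example-secret",
        "SessionToken": "example-token",
    }
    for key, value in credentials_to_properties(fake).items():
        print(f"{key}={value}")
```

In practice, the result of `fetch_temporary_credentials` would be called with the same role ARN that is set in `stardog.external.emr-serverless.execution.role.arn`. Temporary credentials expire, so the properties file needs to be regenerated once they do.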