
Databricks Configuration

This page discusses how to configure Databricks as an external compute platform.


A Databricks data source added to Stardog with the `data-source add` CLI command or through Stardog Studio can be registered as an external compute platform. To configure it as one, add the following properties to the data source definition.
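Adding these properties amounts to merging a few keys into the existing definition. The Python sketch below shows that merge under stated assumptions: the definition is kept as simple `key=value` lines (as in a properties file), the helper names `parse_properties` and `enable_external_compute` are hypothetical rather than Stardog tooling, and the `<existing ...>` values stand in for whatever connection properties the data source already has. It uses the mandatory property names listed below.

```python
# Hypothetical helper, not part of Stardog: merge external compute
# properties into an existing Databricks data source definition.
# Simplification (assumption): only `key=value` lines and `#`/`!` comments
# are handled; full Java .properties syntax (escapes, continuation lines,
# `:` separators) is not supported.


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def enable_external_compute(definition: str, settings: dict[str, str]) -> str:
    """Return the definition with external compute settings merged in.

    Existing properties are kept, keys in `settings` override earlier
    values, and `external.compute` is always set to `true`.
    """
    props = parse_properties(definition)
    props.update(settings)
    props["external.compute"] = "true"
    return "\n".join(f"{k}={v}" for k, v in props.items()) + "\n"


if __name__ == "__main__":
    # Placeholder connection properties of an existing data source (hypothetical).
    existing = """
    # existing Databricks data source definition
    jdbc.url=<existing JDBC URL>
    jdbc.username=<existing user>
    """
    print(enable_external_compute(existing, {
        "external.compute.host.name": "adb-XXXXXXXXXXXXXX.XX.azuredatabricks.net",
        "databricks.cluster.id": "0704-XXXXXX-XXXXXdir",
        "stardog.host.url": "https://myhost.stardog.cloud:5820",
    }))
```

Keeping the merge additive means the data source's existing connection settings are never rewritten; only the external compute keys change.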

Mandatory properties:

| Property | Description | Example |
|---|---|---|
| `external.compute` | Boolean value specifying whether the data source is registered as an external compute platform. | `true` |
| `external.compute.host.name` | Host name of the Databricks workspace. | `adb-XXXXXXXXXXXXXX.XX.azuredatabricks.net` |
| `databricks.cluster.id` | ID of the Databricks compute cluster. | `0704-XXXXXX-XXXXXdir` |
| `stardog.host.url` | Stardog URL that Databricks connects back to in order to write the results. The URL must point to the same Stardog server from which the external compute operation is triggered. | `https://myhost.stardog.cloud:5820` |
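A quick pre-flight check of the mandatory properties can catch typos before the data source is added. The function below is a hypothetical Python sketch (`check_mandatory` is not part of Stardog). Requiring a bare host name for `external.compute.host.name` is an assumption based on the example value in the table, and the URL check validates only the form of `stardog.host.url`, not whether Databricks can actually reach it.

```python
from urllib.parse import urlsplit

# Mandatory properties from the table above.
MANDATORY = (
    "external.compute",
    "external.compute.host.name",
    "databricks.cluster.id",
    "stardog.host.url",
)


def check_mandatory(props: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = [f"missing {key}" for key in MANDATORY if not props.get(key, "").strip()]

    # The data source is registered for external compute only when this is true.
    flag = props.get("external.compute", "").strip().lower()
    if flag and flag != "true":
        problems.append("external.compute must be true to register the data source")

    # Assumption: the workspace is given as a bare host name, as in the example.
    host = props.get("external.compute.host.name", "").strip()
    if "://" in host or "/" in host:
        problems.append("external.compute.host.name should be a bare host name")

    # Databricks connects back to this URL to write results; only its form is checked here.
    url = props.get("stardog.host.url", "").strip()
    if url:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            problems.append("stardog.host.url must be an http or https URL with a host")

    return problems


if __name__ == "__main__":
    example = {
        "external.compute": "true",
        "external.compute.host.name": "adb-XXXXXXXXXXXXXX.XX.azuredatabricks.net",
        "databricks.cluster.id": "0704-XXXXXX-XXXXXdir",
        "stardog.host.url": "https://myhost.stardog.cloud:5820",
    }
    print(check_mandatory(example))  # []
```

Returning a list of problems rather than raising on the first one lets all issues be reported at once.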

Optional properties:

| Property | Description | Default |
|---|---|---|
| `stardog.external.jar.path` | Path to the released stardog-spark-connector jar file that is transferred to the Databricks cluster.<br/><br/>By default it points to Stardog's public S3 bucket, where the latest released version is available.<br/><br/>There are two options for overriding the default path:<br/><br/>1) The jar can be fetched from another S3 bucket. In this case, set the property to the custom S3 bucket path.<br/><br/>2) The jar can reside on the local file system of the machine where the Stardog server is running. In this case, set the property to the local file system path.<br/><br/>The Stardog server uploads the jar to the Databricks cluster if it is not already present at the path specified by the `stardog.external.jar.upload.path` property.<br/><br/>The latest released jar can be downloaded from the default location. | `s3://stardog-spark/stardog-spark-connector-3.3.0.jar` |
| `stardog.external.jar.upload.path` | Path of the stardog-spark-connector jar file on the Databricks DBFS file system.<br/><br/>Set this property whether the jar is uploaded manually by the user or automatically by Stardog. | `/FileStore/stardog/` |
| `stardog.external.mapping.upload.path` | Path on DBFS where Stardog and the Spark job write temporary files. For example, during a Virtual Graph Materialization operation, the mapping of the virtual graph is stored here. Stardog and the Spark job delete these temporary files once the process completes. | `/FileStore/stardog/` |
| `stardog.external.databricks.job.timeout` | Spark job timeout, in seconds. | `86400` |
| `stardog.external.databricks.task.timeout` | Spark task timeout, in seconds. | `86400` |
| `stardog.external.databricks.task.retry.count` | Number of retries to attempt before the Spark job fails. Set to `0` for a single attempt with no retries. | `3` |
| `stardog.external.databricks.task.retry.interval.millis` | Interval, in milliseconds, after which the Spark job makes a retry attempt following an error. | `2000` |
| `stardog.external.databricks.is.retry.timeout` | Boolean value specifying whether the Spark job makes a retry attempt after a timeout error. | `false` |
| `stardog.external.databricks.job.on.start.email.list` | Comma-separated list of email addresses to notify when the Spark job starts. | |
| `stardog.external.databricks.job.on.success.email.list` | Comma-separated list of email addresses to notify when the Spark job completes successfully. | |
| `stardog.external.databricks.job.on.failure.email.list` | Comma-separated list of email addresses to notify when the Spark job fails. | |
| `spark.dataset.repartition` | Set this value to override Spark's default partitioning behavior. Refer to the Spark documentation for details. | |
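How user overrides combine with the defaults in this table can be illustrated with a short Python sketch. The helper names (`effective_settings`, `jar_source`, `total_attempts`, `email_list`) and the local jar path `/opt/stardog/...` are hypothetical. The sketch assumes that any `stardog.external.jar.path` not starting with `s3://` is a local path on the Stardog server, and it derives the total number of attempts from the retry-count rule above (a count of `0` means a single attempt).

```python
# Hypothetical helpers for reasoning about the optional properties.
# Default values are copied from the table above; helper names are illustrative.
DEFAULTS = {
    "stardog.external.jar.path": "s3://stardog-spark/stardog-spark-connector-3.3.0.jar",
    "stardog.external.jar.upload.path": "/FileStore/stardog/",
    "stardog.external.mapping.upload.path": "/FileStore/stardog/",
    "stardog.external.databricks.job.timeout": "86400",  # seconds (24 hours)
    "stardog.external.databricks.task.timeout": "86400",  # seconds (24 hours)
    "stardog.external.databricks.task.retry.count": "3",
    "stardog.external.databricks.task.retry.interval.millis": "2000",
    "stardog.external.databricks.is.retry.timeout": "false",
}


def effective_settings(overrides: dict[str, str]) -> dict[str, str]:
    """Overlay user-supplied values on the documented defaults."""
    return {**DEFAULTS, **overrides}


def jar_source(path: str) -> str:
    """Assumption: s3:// paths are S3 buckets; anything else is a local path
    on the machine running the Stardog server."""
    return "s3" if path.startswith("s3://") else "local"


def total_attempts(settings: dict[str, str]) -> int:
    """A retry count of N allows up to N + 1 attempts; 0 means a single attempt."""
    retries = int(settings["stardog.external.databricks.task.retry.count"])
    if retries < 0:
        # This sketch only models non-negative retry counts.
        raise ValueError("retry count must be zero or greater")
    return retries + 1


def email_list(value: str) -> list[str]:
    """Split a comma-separated notification list, ignoring empty entries."""
    return [addr.strip() for addr in value.split(",") if addr.strip()]


if __name__ == "__main__":
    settings = effective_settings({
        "stardog.external.jar.path": "/opt/stardog/stardog-spark-connector-3.3.0.jar",
        "stardog.external.databricks.task.retry.count": "0",
        "stardog.external.databricks.job.on.failure.email.list": "ops@example.com, data@example.com",
    })
    print(jar_source(settings["stardog.external.jar.path"]))  # local
    print(total_attempts(settings))  # 1
    print(email_list(settings["stardog.external.databricks.job.on.failure.email.list"]))
```

With no overrides, the default retry count of `3` allows up to four attempts in total, and both timeouts default to 24 hours.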