This chapter discusses External Compute, a Stardog feature for pushing heavy workloads to compute platforms such as Databricks and EMR Serverless. This page describes what external compute is, how it works, and which operations are supported. See the Chapter Contents for a short description of what else is included in this chapter.
Stardog supports operations where the workload can be pushed to external compute platforms.
The supported compute platforms are:

- Databricks
- EMR Serverless

The supported operations are:

- virtual-import CLI command and SPARQL Update queries (add/copy)
- cache-create and cache-refresh CLI commands
- entity-resolution resolve CLI command

Stardog converts the supported operation into a Spark job, connects to the external compute platform, uploads the stardog-spark-connector.jar, and then
creates and triggers the Spark job. This Spark job does the required computation on the external compute platform and then connects back to the
Stardog server (using the user’s credentials that triggered the operation) to write the results to Stardog.
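The sequence described above can be modeled as a small simulation. This is only an illustrative sketch: `FakeComputePlatform` and `run_external_compute` are hypothetical names, not part of Stardog or of any compute platform's API, and the log strings simply mirror the steps in the text. The jar is uploaded only when it is not already present on the platform.

```python
from dataclasses import dataclass, field

CONNECTOR_JAR = "stardog-spark-connector.jar"


@dataclass
class FakeComputePlatform:
    """Hypothetical in-memory stand-in for Databricks or EMR Serverless."""
    libraries: set = field(default_factory=set)
    log: list = field(default_factory=list)


def run_external_compute(operation: str, user: str, platform: FakeComputePlatform) -> list:
    """Record the documented steps in order and return the event log."""
    # 1. Stardog converts the supported operation into a Spark job.
    platform.log.append(f"convert {operation} to Spark job")
    # 2. Stardog connects to the external compute platform.
    platform.log.append("connect to compute platform")
    # 3. The connector jar is uploaded only if it is missing.
    if CONNECTOR_JAR not in platform.libraries:
        platform.libraries.add(CONNECTOR_JAR)
        platform.log.append(f"upload {CONNECTOR_JAR}")
    # 4. Stardog creates and triggers the Spark job.
    platform.log.append("create and trigger Spark job")
    # 5. The job connects back with the triggering user's credentials.
    platform.log.append(f"write results to Stardog as {user}")
    return platform.log


if __name__ == "__main__":
    for step in run_external_compute("virtual-import", "alice", FakeComputePlatform()):
        print(step)
```

Running the function against a platform that already holds the jar produces the same log without the upload step.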
When a supported operation is triggered, the Stardog Platform uploads the latest compatible version of stardog-spark-connector.jar if the jar is not already on the compute platform. For more information on the configuration options for stardog-spark-connector.jar, refer
to [Configuring External Compute Datasource](configuring-external-compute-databricks#Optional properties).
If an older version of stardog-spark-connector.jar is installed on the compute platform, uninstall it. Refer to the compute platform's
documentation for how to uninstall libraries.
| Stardog Platform Versions | Compatible stardog-spark-connector Versions | Spark versions |
|---|---|---|
| 8.2.* | 2.0.0 | 3.2.2 |
| 9.0.* | 3.0.0 | 3.2.2 |
| 9.1.* | 3.1.0 | 3.2.2 |
| 10.*, 11.* | 3.2.0 | 3.5.0 |
| 11.* | 3.3.0 | 3.5.0 |
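The table can also be read as a lookup from a Stardog version to the connector and Spark versions listed for it. The sketch below copies the rows as data; `compatible_connectors` is a hypothetical helper written for illustration, not a Stardog API, and it assumes the `*` in the table behaves like a simple wildcard.

```python
from fnmatch import fnmatch

# (Stardog version patterns, connector version, Spark version), copied from the table.
COMPATIBILITY = [
    (("8.2.*",), "2.0.0", "3.2.2"),
    (("9.0.*",), "3.0.0", "3.2.2"),
    (("9.1.*",), "3.1.0", "3.2.2"),
    (("10.*", "11.*"), "3.2.0", "3.5.0"),
    (("11.*",), "3.3.0", "3.5.0"),
]


def compatible_connectors(stardog_version: str) -> list:
    """Return (connector_version, spark_version) pairs listed for a Stardog version."""
    return [
        (connector, spark)
        for patterns, connector, spark in COMPATIBILITY
        # Assumption: "*" in the table is treated as a shell-style wildcard.
        if any(fnmatch(stardog_version, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    print(compatible_connectors("9.1.4"))
    print(compatible_connectors("11.0.1"))
```

A version with no matching row, such as 9.2.0, returns an empty list, which here means only that the table does not list it.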
The high-level architecture for external compute is as shown:

External Compute supports OAuth for Identity Providers (IDPs) that are integrated with Stardog Launchpad, including Azure, Okta, and others. This section describes how authentication works when external compute jobs run on Databricks or EMR Serverless.
With IDP OAuth (e.g. Azure, Okta), the user is created in Stardog on first sign-in and roles assigned in the IDP are mapped to Stardog roles. When an external compute job is run, Stardog passes the triggering user’s username and a token (created for that user and their roles) to the Spark job; the job uses it to connect back to Stardog and perform the operation.
The user must provide the issuer configuration in the JWT YAML file, for example:

    https://${ip_address_where_stardog_is_running}:${port_number}:
      usernameField: stardog-username
      rolesClaimPath: stardogRoles.externalCompute

These OAuth credentials can be used to connect to Stardog and for the job to call back to Stardog. Pass-through authentication for the Virtual Graph is not supported in External Compute.
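To make the two issuer settings concrete, the sketch below shows one plausible way a username and a role list could be read from decoded token claims. It assumes that `usernameField` names a top-level claim and that `rolesClaimPath` is a dot-separated path into nested claims; that interpretation, the helper names, and the sample claims and role names are illustrative assumptions, not Stardog's documented behavior.

```python
# Issuer settings taken from the example configuration above.
ISSUER_CONFIG = {
    "usernameField": "stardog-username",
    "rolesClaimPath": "stardogRoles.externalCompute",
}


def resolve_claim_path(claims: dict, dotted_path: str):
    """Walk a dot-separated path through nested claims; return None if any key is missing."""
    node = claims
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_identity(claims: dict, config: dict = ISSUER_CONFIG):
    """Return (username, roles) as selected by the issuer settings (assumed semantics)."""
    username = claims.get(config["usernameField"])
    roles = resolve_claim_path(claims, config["rolesClaimPath"]) or []
    return username, list(roles)


if __name__ == "__main__":
    # Hypothetical decoded token payload; the role names are made up for illustration.
    sample_claims = {
        "stardog-username": "alice",
        "stardogRoles": {"externalCompute": ["reader", "writer"]},
    }
    print(extract_identity(sample_claims))
```

The sketch skips signature verification and token decoding entirely; it only illustrates how the two configured names relate to the shape of the claims.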