
External Catalogs

This page discusses importing and using metadata from external catalog systems.


Overview

The Knowledge Catalog can import metadata from external catalog systems to enable a single unified semantic layer over multiple catalogs.

External Credentials

Metadata providers that call external REST APIs cannot use Data Sources to hold credentials. The Catalog provides CLI commands to store encrypted usernames and passwords in the catalog credential store. A token is returned that is used in provider configurations. The token is exchanged for the stored credentials during server processing.

The catalog credential store is a user-configurable location for storing credential strings. The store can be configured to keep the strings either as keyless hashes or as hashes produced with an encryption key.

The following options can be used to override the defaults.

| Option | Description | Value | Default |
| --- | --- | --- | --- |
| catalog.key.store | The filepath of the key store. Ignored for database key store. | file path | $STARDOG_HOME |
| catalog.key.type | The type of hashing to use. | RSA or AES or XOR | RSA |
| catalog.key.password | Password for an AES key. | string | |
| catalog.credential.store | The type of key storage to use. | database or filepath | database |

By default, the credential store expects a DER-encoded RSA key pair located in the Stardog home directory. The keys must be named catalog.priv and catalog.pub.

The following example uses openssl to generate an RSA key pair:

  • Create an RSA PEM encoded file
openssl genrsa -out catalog.pem 2048
  • Extract the private key as a DER encoded file
openssl pkcs8 -topk8 -nocrypt -in catalog.pem -outform der -out catalog.priv
  • Extract the public key as a DER encoded file
openssl pkey -in catalog.pem -pubout -outform der -out catalog.pub
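
Since the store expects DER rather than PEM encoding, a quick check of the generated files can catch a key that was left in PEM form. The Python sketch below relies only on two general facts: PEM files start with a -----BEGIN line, and DER-encoded PKCS#8 and SubjectPublicKeyInfo structures start with the ASN.1 SEQUENCE tag byte 0x30. The function name looks_like_der is hypothetical, and passing the check does not prove that a file holds a valid key.

```python
from pathlib import Path


def looks_like_der(data: bytes) -> bool:
    """Heuristic: True if the bytes start like DER and not like PEM text."""
    if data.lstrip().startswith(b"-----BEGIN"):
        return False  # PEM armor, i.e. base64 text rather than binary DER
    return data[:1] == b"\x30"  # ASN.1 SEQUENCE tag opens both key structures


if __name__ == "__main__":
    # Assumes the script is run from the directory holding the generated keys.
    for name in ("catalog.priv", "catalog.pub"):
        path = Path(name)
        if path.exists():
            print(name, "looks like DER:", looks_like_der(path.read_bytes()))
        else:
            print(name, "not found in the current directory")
```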

To add credentials to the store and to list what is stored, use the stardog-admin CLI commands catalog credentials-add and catalog credentials-list.

Adding a credential

In the following example, the username my_user_name is added along with its password. The returned UUID, 04bee4c7-26cc-4c97-b817-dbb3298fa842, is the value to use in the accessKey property of any metadata provider that uses these credentials.

$ stardog-admin catalog credentials-add
Username: my_user_name
Password:
Description (optional): My External System Credentials

04bee4c7-26cc-4c97-b817-dbb3298fa842
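
Because the access key is a UUID, a provisioning script can sanity-check the value before writing it into a provider configuration. The following Python sketch is illustrative only: the helper name is_access_key is hypothetical and not part of Stardog, and the check confirms the format only, not that the key exists in the credential store.

```python
import uuid


def is_access_key(value: str) -> bool:
    """Return True if value is a canonical hyphenated UUID string."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and missing hyphens,
    # so compare against the canonical lowercase form.
    return str(parsed) == value.lower()


print(is_access_key("04bee4c7-26cc-4c97-b817-dbb3298fa842"))
print(is_access_key("My External System Credentials"))
```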

Listing existing stored credentials

In the following example, the credentials that were just added are listed. Only the UUID and the user-provided description are shown. Users cannot retrieve the original credential values; once stored, they can only be read by the catalog server.

$ stardog-admin catalog credentials-list

Catalog Stored Credentials
+--------------------------------------+--------------------------------+
|              Access Key              |          Description           |
+--------------------------------------+--------------------------------+
| 04bee4c7-26cc-4c97-b817-dbb3298fa842 | My External System Credentials |
+--------------------------------------+--------------------------------+

Databricks Unity Catalog

The Knowledge Catalog can be configured to import Unity Catalog metadata using a Databricks account. You can configure the import to occur on a customizable schedule. Databricks Unity metadata is written to the Stardog Catalog, where it can be queried in conjunction with your Stardog databases.

Configuration

To import Databricks Unity Catalog metadata, you insert a DatabricksProvider configuration into the Knowledge Catalog's stardog:catalog:providers named graph. The configuration describes how Databricks can be accessed and how often the Knowledge Catalog should refresh the metadata.

insert data {
    graph stardog:catalog:providers
    {
        <urn:myDBricksProvider> a <tag:stardog:api:catalog:DatabricksProvider> ;
            <tag:stardog:api:catalog:provider:dataSource> <DATA_SOURCE_HERE> ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
}

This table details the property values that need to be set for configuring a Databricks metadata provider.

| Property | Description | Values |
| --- | --- | --- |
| rdf:type | Databricks metadata provider class | tag:stardog:api:catalog:DatabricksProvider |
| tag:stardog:api:catalog:provider:dataSource | Data source to use for connecting to a Databricks account | The IRI of an existing Data Source, e.g. data-source://myDatasource |
| tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. 0 0 22 * * ? for every day at 10pm) |

After the configuration is inserted, a job is automatically created to run on the specified schedule. The job will import Databricks Unity metadata and load a general data model for viewing the metadata in Explorer.
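
When provider configurations are generated by a script, the update can be rendered from the three values in the table. The Python sketch below only builds the SPARQL text shown earlier; it does not submit it to Stardog. The function name and the sample IRIs are hypothetical.

```python
PROVIDER_NS = "tag:stardog:api:catalog:provider:"


def databricks_provider_update(provider_iri: str, data_source_iri: str, schedule: str) -> str:
    """Render the SPARQL update that registers a Databricks provider."""
    # Escape backslashes and double quotes so the schedule stays a valid literal.
    escaped = schedule.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "insert data {\n"
        "    graph stardog:catalog:providers\n"
        "    {\n"
        f"        <{provider_iri}> a <tag:stardog:api:catalog:DatabricksProvider> ;\n"
        f"            <{PROVIDER_NS}dataSource> <{data_source_iri}> ;\n"
        f'            <{PROVIDER_NS}schedule> "{escaped}" .\n'
        "    }\n"
        "}\n"
    )


# Sample values: an existing data source and a daily 10pm schedule.
print(databricks_provider_update(
    "urn:myDBricksProvider", "data-source://myDatasource", "0 0 22 * * ?"))
```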

Data Model

This table contains the classes used for modeling the Databricks metadata. The bricks prefix stands for the namespace tag:stardog:api:catalog:databricks:.

| Class | Property | Description |
| --- | --- | --- |
| bricks:Databricks | | The metadata from an external Databricks platform |
| bricks:DatabricksCatalog | | A Databricks catalog |
| | bricks:owner | The owner account |
| | bricks:catalogType | The catalog type |
| bricks:DatabricksSchema | | A Databricks schema |
| | bricks:owner | The owner account |
| | bricks:fullName | The full name of a schema |
| bricks:DatabricksTable | | A Databricks table |
| | bricks:tableType | The table type |
| | bricks:fullName | The full name of the table |
| | bricks:dataSourceFormat | The data source format |
| | bricks:owner | The owner account |
| bricks:DatabricksColumn | | A Databricks column |
| | bricks:position | The column position |
| | bricks:precision | The column precision |
| | bricks:nullable | If the column is nullable |
| | bricks:dataType | The column data type |
| | bricks:scale | The column scale |

Collibra

The Knowledge Catalog can be configured to import data from a Collibra Data Intelligence Cloud account. Collibra is a data catalog product that collects business glossary, data governance, lineage and compliance metadata.

Configuration

The configuration for a Collibra provider requires that Collibra credentials be stored in the catalog credential store. See storing credentials for details.

Collibra Data Intelligence Cloud uses HTTP Basic Auth with a username and password for authentication. Add your account username and password to the catalog credential store and use the returned access key to configure the Collibra provider.

insert data {
    graph stardog:catalog:providers
    {
        <urn:collibra> a <tag:stardog:api:catalog:CollibraProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
}

This table details the property values that need to be set for configuring a Collibra metadata provider.

| Property | Description | Values |
| --- | --- | --- |
| rdf:type | Collibra metadata provider class | tag:stardog:api:catalog:CollibraProvider |
| tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
| tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Collibra account | URL |
| tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. 0 0 22 * * ? for every day at 10pm) |
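
The schedule values are Quartz cron expressions, which differ from Unix cron: they have six or seven whitespace-separated fields (seconds first, an optional year last), and one of the day-of-month and day-of-week fields must be ?. The Python sketch below checks only that shape before a schedule is placed in a provider configuration. It is not a full Quartz parser, and the helper name is hypothetical.

```python
QUARTZ_FIELDS = ("seconds", "minutes", "hours", "day-of-month",
                 "month", "day-of-week", "year")


def check_quartz_shape(expression: str) -> list:
    """Return a list of structural problems; an empty list means none found."""
    fields = expression.split()
    if len(fields) not in (6, 7):
        return [f"expected 6 or 7 fields, found {len(fields)}"]
    named = dict(zip(QUARTZ_FIELDS, fields))
    if "?" not in (named["day-of-month"], named["day-of-week"]):
        return ["one of day-of-month or day-of-week must be '?'"]
    return []


# The first value is the schedule from the table; the second is a 5-field Unix cron line.
for expr in ("0 0 22 * * ?", "0 22 * * *"):
    print(repr(expr), check_quartz_shape(expr) or "shape looks fine")
```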

Data Model

This table contains the classes used for modeling the Collibra metadata. The collibra prefix stands for the namespace tag:stardog:api:catalog:collibra:.

| Class | Property | Description |
| --- | --- | --- |
| collibra:Collibra | | The metadata from an external Collibra platform |
| | collibra:community | A Collibra community |
| collibra:Asset | | A Collibra asset |
| | collibra:id | An asset ID |
| | collibra:name | An asset name |
| | collibra:domain | An asset domain |
| | collibra:assetType | An asset type |
| | collibra:tag | An asset tag |
| | collibra:collibraUrl | URL to Collibra asset page |
| collibra:AssetType | | An asset type |
| | collibra:childOf | A parent asset type |
| collibra:Domain | | A Collibra domain |
| | collibra:id | A domain ID |
| | collibra:name | A domain name |
| | collibra:community | A domain community |
| | collibra:domainType | A domain type |
| collibra:Community | | A Collibra community |
| | collibra:id | A community ID |
| | collibra:community | A parent community |
| | collibra:name | A community name |
| collibra:Relation | | A Collibra relation |
| | collibra:id | A relation ID |
| | collibra:relationType | A relation type |
| | collibra:targetAsset | A relation target asset |
| | collibra:sourceAsset | A relation source asset |
| collibra:DomainType | | A Collibra domain type |
| collibra:RelationType | | A Collibra relation type |
| | collibra:role | A relation role |
| | collibra:coRole | A relation co-role |
| collibra:Tag | | A Collibra tag |
| collibra:Attribute | | A Collibra attribute |
| | collibra:value | An attribute value |
| | collibra:class | An attribute class |
| | collibra:asset | An attribute asset |
| | collibra:attributeType | An attribute type |
| collibra:AttributeType | | A Collibra attribute type |
| | collibra:attributeKind | An attribute kind |
| | collibra:language | An attribute's language |
| | collibra:isInteger | If attribute is an integer |
| | collibra:allowedValues | The allowed values |

Microsoft Purview

The Knowledge Catalog can be configured both to import data from a Microsoft Purview application running on Azure and to export Stardog catalog data back into it. Purview is Microsoft’s data governance, cataloging and protection product.

Configuration

The configuration for a Purview provider requires that Purview credentials be stored in the catalog credential store. See storing credentials for details.

Microsoft Purview on Azure uses the OAuth client_credentials grant type for authorization. Add your Azure client id and application client secret to the catalog credential store and use the returned access key to configure the Purview provider.

insert data {
    graph stardog:catalog:providers
    {
        <urn:purview> a <tag:stardog:api:catalog:PurviewProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;
            <tag:stardog:api:catalog:provider:tenantId> "AZURE_TENANT_ID_HERE" ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
}

This table details the property values that need to be set for configuring a Purview metadata provider.

| Property | Description | Values |
| --- | --- | --- |
| rdf:type | Purview metadata provider class | tag:stardog:api:catalog:PurviewProvider |
| tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
| tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Purview application | URL |
| tag:stardog:api:catalog:provider:tenantId | Azure tenant ID for Purview application | UUID |
| tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. 0 0 22 * * ? for every day at 10pm) |
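
For context, the client_credentials grant mentioned above is the standard OAuth 2.0 flow in which an application exchanges its client ID and secret for a bearer token at the tenant's token endpoint, which is why the provider needs a tenant ID. The Python sketch below only builds such a request against the Microsoft identity platform token endpoint; it does not send it. It illustrates the grant type, not Stardog's internal implementation, and the default scope value and the helper name are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        scope: str = "https://purview.azure.net/.default") -> Request:
    """Build (but do not send) a client_credentials token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # assumed Purview scope; adjust for your environment
    }).encode("ascii")
    return Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values only; real values come from your Azure app registration.
request = build_token_request("AZURE_TENANT_ID_HERE", "CLIENT_ID_HERE", "CLIENT_SECRET_HERE")
print(request.get_method(), request.full_url)
```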

Exporting Stardog Metadata

The Purview provider exports data source metadata into the configured Purview server. There is no extra configuration required to export Stardog metadata. When the provider's scheduled job is run the export will automatically occur after the import is completed.

The following Purview asset types are added for the Stardog data source metadata:

| Type | Description |
| --- | --- |
| stardog_data_source | The asset type for Stardog data sources |
| stardog_database | The asset type for databases |
| stardog_schema | The asset type for schemas |
| stardog_table | The asset type for tables |
| stardog_column | The asset type for columns |
| stardog_concept | The asset type for mapped concepts |

After the scheduled provider job has run you can log into your Purview account and view the Stardog metadata. It is located in a custom stardogcatalog collection.

<a href="../../assets/images/query-stardog/knowledge-catalog/purview-check.png"> <img src="../../assets/images/query-stardog/knowledge-catalog/purview-check.png"> </a>

Data Model

This table contains the classes used for modeling the Purview metadata. The purview prefix stands for the namespace tag:stardog:api:catalog:purview: and the atlas prefix for the namespace tag:stardog:api:catalog:atlas:.

| Class | Property | Description |
| --- | --- | --- |
| purview:Purview | | The metadata from an external Purview platform |
| | purview:hasGlossary | A Purview glossary |
| | purview:hasCollection | A Purview collection |
| | purview:hasAsset | Has a Purview asset |
| purview:Glossary | | A glossary |
| | purview:hasTerm | Has a glossary term |
| purview:GlossaryTerm | | A glossary term |
| | purview:assignedTo | Asset assigned to term |
| purview:Collection | | A Collection of assets and source |
| | purview:hasSource | A data source |
| | purview:hasAsset | An asset |
| purview:Asset | | An asset |
| | purview:scanId | The Id of the last scan |
| | purview:lastScanned | The time of the last scan |
| | purview:attribute | An attribute |
| | purview:assetType | The asset type |
| | purview:sourceId | An Id of the source that generated this asset |
| purview:AssetType | | An asset type |
| purview:Source | | A data source |
| purview:Relationship | | An asset relationship |
| | purview:head | The head asset of a relationship |
| | purview:tail | The tail asset of a relationship |
| purview:Attribute | | An asset attribute |
| | purview:value | The data value |
| | purview:attributeType | The type of attribute |
| purview:AttributeType | | An attribute type |
| | purview:typeName | The type name |

JDBC

The Knowledge Catalog can be configured to use a JDBC driver to import database metadata into the Knowledge Catalog. Any JDBC-compliant driver should work. Be sure to first add the JAR file to the classpath of the Stardog server.

Configuration

The configuration for a JDBC provider requires that credentials be stored in the catalog credential store. See storing credentials for details.

Currently, the JDBC provider expects a username and password for authentication. Add your database username and password to the catalog credential store and use the returned access key to configure the JDBC provider. When the provider import job runs, the standard jdbc.username and jdbc.password properties are injected into the JDBC connection string.

insert data {
    graph stardog:catalog:providers
    {
        <urn:jdbcProvider> a <tag:stardog:api:catalog:JdbcProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:jdbcDriver> "DRIVER_CLASSNAME_HERE" ;
            <tag:stardog:api:catalog:provider:jdbcURL> "JDBC_CONNECTION_STRING_HERE" ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
}

This table details the property values that need to be set for configuring a JDBC metadata provider.

| Property | Description | Values |
| --- | --- | --- |
| rdf:type | JDBC metadata provider class | tag:stardog:api:catalog:JdbcProvider |
| tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
| tag:stardog:api:catalog:provider:jdbcDriver | JDBC driver class name | The full class name |
| tag:stardog:api:catalog:provider:jdbcURL | A valid JDBC connection string | A valid connection string |
| tag:stardog:api:catalog:provider:linkedDataSource | An optional IRI for a datasource to use in place of the access key for retrieving the credentials and connection string | e.g. data-source://my_datasource |
| tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. 0 0 22 * * ? for every day at 10pm) |

Data Model

This table contains the classes used for modeling the JDBC metadata. The jdbc prefix stands for the namespace tag:stardog:api:catalog:jdbc: and the catalog prefix for the namespace tag:stardog:api:catalog:.

| Class | Property | Description |
| --- | --- | --- |
| jdbc:DBMS | | A database management system |
| | jdbc:databaseName | The database name |
| | jdbc:databaseVersion | The database version |
| | jdbc:driverName | The driver name |
| | jdbc:driverVersion | The driver version |
| | jdbc:user | The accessing user |
| jdbc:PrimaryKey | | A database table column designated to uniquely identify each record |
| jdbc:ForeignKey | | A database table column used to link data between tables |
| catalog:DatabaseCatalog | | A database catalog. Not all systems have a catalog, their highest level object may be schema |
| catalog:DatabaseSchema | | Database schemas are containers for the tables of the database |
| catalog:Table | | A table within a database |
| | catalog:tableName | The name of a table within a database |
| | catalog:tableType | The type of table |
| catalog:Column | | A database table column |
| | catalog:name | The name of a column within a table |
| | catalog:columnType | The datatype of the column |

Atlan

The Knowledge Catalog can be configured to import data from Atlan. Atlan is a data catalog platform that provides active metadata management, business glossaries, column-level lineage and search capability.

Configuration

The configuration for an Atlan provider requires that Atlan credentials be stored in the catalog credential store. See storing credentials for details.

Atlan uses an Atlan API token for authentication. For the token to have access to Atlan assets, it must be linked to an Atlan persona whose data policies allow metadata read access. For more information on Atlan policies, see the Atlan documentation.

If you have an Atlan API token you can add it to the catalog credential store using the CLI or HTTP API as shown below.

curl -u admin:admin --location 'http://localhost:5820/admin/catalog/credentials' \
--header 'Content-Type: application/json' \
--data '{
    "token": "'"$ATLAN_TOKEN"'",
    "label": "Atlan API Token"
}'
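
The same HTTP call can be issued from a script. The Python sketch below mirrors the curl command using only the standard library: it builds a POST request with HTTP Basic authentication and a JSON body, and leaves sending it to the caller. The endpoint, field names and admin:admin credentials are taken from the curl example; the helper name is hypothetical, and the shape of the server's response is not described here, so the sketch does not parse it.

```python
import base64
import json
from urllib.request import Request


def build_credential_request(token: str, label: str,
                             base_url: str = "http://localhost:5820",
                             user: str = "admin", password: str = "admin") -> Request:
    """Build (but do not send) the request that stores an API token."""
    payload = json.dumps({"token": token, "label": label}).encode("utf-8")
    basic = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        f"{base_url}/admin/catalog/credentials",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {basic}",
        },
        method="POST",
    )


request = build_credential_request("ATLAN_TOKEN_HERE", "Atlan API Token")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```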

Once you have a Stardog catalog access key, use it to configure an Atlan metadata provider as shown.

insert data {
    graph stardog:catalog:providers
    {
        <urn:atlan> a <tag:stardog:api:catalog:AtlanProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;
            <tag:stardog:api:catalog:provider:schedule> "SCHEDULE_HERE"  .
    }
}

This table details the property values that need to be set for configuring an Atlan metadata provider.

| Property | Description | Values |
| --- | --- | --- |
| rdf:type | Atlan metadata provider class | tag:stardog:api:catalog:AtlanProvider |
| tag:stardog:api:catalog:provider:accessKey | Access key from credential store | UUID |
| tag:stardog:api:catalog:provider:serverAddress | Cloud URL for Atlan account | URL |
| tag:stardog:api:catalog:provider:schedule | Frequency of metadata imports | Quartz cron expression (e.g. 0 0 22 * * ? for every day at 10pm) |

The Atlan API token must be associated with at least one persona that has metadata policies in order to import any entity data. See the Atlan documentation for more details.

By default, the Atlan asset types Table, Column, View, Schema, Connection, Database and Purpose are imported into Stardog. To customize the list, use the <tag:stardog:api:catalog:provider:entityType> predicate. In the following example, only Table and Column entities are imported.

insert data {
    graph stardog:catalog:providers
    {
        <urn:atlan> a <tag:stardog:api:catalog:AtlanProvider> ;
            <tag:stardog:api:catalog:provider:accessKey> "ACCESS_KEY_HERE" ;
            <tag:stardog:api:catalog:provider:serverAddress> "CLOUD_URL_HERE" ;
            <tag:stardog:api:catalog:provider:entityType> "Table" ;
            <tag:stardog:api:catalog:provider:entityType> "Column" ;
            <tag:stardog:api:catalog:provider:schedule> "0 0 3/23 ? * * *" .
    }
}

Metadata imported to Stardog will be found in a graph named for the Atlan serverAddress.

select * {
    graph <tag:stardog:api:catalog:your_domain.atlan.com>
    {
        ?s ?p ?o .
    }
}
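
Judging from the example above, the graph name appears to be the tag:stardog:api:catalog: prefix followed by the host name of the serverAddress. The Python sketch below derives a graph IRI under that assumption. The exact naming rule (for example, how a scheme, port or path is handled) is not stated here, so verify the result against your catalog; the helper name is hypothetical.

```python
from urllib.parse import urlsplit

CATALOG_PREFIX = "tag:stardog:api:catalog:"


def atlan_graph_iri(server_address: str) -> str:
    """Guess the named graph IRI for an Atlan server address (assumed rule)."""
    # urlsplit only fills in the host when a scheme or leading "//" is present.
    target = server_address if "//" in server_address else f"//{server_address}"
    host = urlsplit(target).hostname
    if not host:
        raise ValueError(f"no host name found in {server_address!r}")
    return f"{CATALOG_PREFIX}{host}"


print(atlan_graph_iri("https://your_domain.atlan.com"))
```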

Exporting Stardog Metadata

The Atlan provider exports data source metadata into the configured Atlan server. There is no extra configuration required to export Stardog metadata. When the provider's scheduled job is run the export will automatically occur after the import is completed.

Assets imported into Atlan are added to a Stardog Knowledge Catalog purpose. By default, only the account attached to the API key has access to those entities' details. An administrator needs to attach data policies to the Stardog Knowledge Catalog purpose to give additional users access.

Datasource metadata from Stardog can be found in Atlan under the corresponding asset type.

The following Atlan asset types are used for the Stardog data source metadata:

| Type | Description |
| --- | --- |
| Connection | The asset type for Stardog data sources |
| Database | The asset type for databases |
| Schema | The asset type for schemas |
| Table | The asset type for tables |
| Column | The asset type for columns |

After the scheduled provider job has run you can log into your Atlan account and view the Stardog metadata. Stardog metadata is tagged with an Atlan Stardog tag which in turn is attached to a Stardog Knowledge Catalog purpose. You can use the tag and purpose to search and filter for Stardog assets in Atlan.

Data Model

The data model for Atlan assets is dynamic in that each entity type from Atlan is imported and mapped to a generated OWL class. Atlan entity properties are mapped to OWL object properties and OWL datatype properties. Atlan entities are prefixed with the namespace tag:stardog:api:catalog:atlan:.