Home Getting Started Installation & Setup Development & Integration Deployment & Operations Data Management Technical Support Platform Updates

DocsData ManagementVirtual Graphsdata source configuration

Data Source Configuration

This page covers the configuration options for the various types of Data Sources.

<details open markdown="block"> <summary> Page Contents </summary> 1. TOC </details>

Overview

The available configuration options for a Data Source depend on the type of the Data Source. Supported Data Source types include JDBC, MongoDB, Elasticsearch, Cassandra, and SPARQL.

Configuration options are stored in a Java properties file.

A typical properties file for a JDBC-type Data Source (in this case, MySQL) looks like:

jdbc.url=jdbc:mysql://localhost/dept
jdbc.username=MySqlUserName
jdbc.password=MyPassword
jdbc.driver=com.mysql.jdbc.Driver

Configuration Options

Options Common to All Data Source Types

The following option applies to Data Sources of all types. The options apply to all Virtual Graphs for a given Data Source.

`unique.key.sets`

| Default | | | Required | No | | Description | For data sources that do not express unique constraints in their metadata, either because unique constraints are not supported or because the data source did not include some or all of the valid constraints for reasons such as performance concerns, this property is used to define additional constraints manually. The property value is a comma-separated list of keys that define unique rows in a table. Each key is itself a comma-separated list of schema-qualified columns, enclosed in parentheses. For example, if table APP.CUSTOMERS has an ID column that serves as a primary key and a pair of columns, FNAME and LNAME, that together are a unique key, the value to express that is: (APP.CUSTOMERS.ID),(APP.CUSTOMERS.FNAME,APP.CUSTOMERS.LNAME) |

`batch.size`

| Default | 100 | | Required | Yes | | Description | The batch.size option determines how many solutions will be evaluated at a Virtual Graph per request of the ServiceJoin operator. A larger batch size can reduce the number of requests to a data source but can make the evaluation of each request more expensive. By default, the batch size is set to 100. The value can be adjusted in the range [0, 10000]. A value of 0 will disable batch requests to the data source. |

JDBC Options

The following properties are used for all relational data sources.

`jdbc.url`

`jdbc.username`

`jdbc.password`

`jdbc.driver`

`sql.dialect`

| Default | Inferred from supported JDBC drivers. ORACLE for unsupported drivers. | | Required | No | | Description | When using an unsupported JDBC driver, this option can be used to specify the format of the generated SQL. The options supported are ATHENA, BIGQUERY, DB2, DERBY, EXASOL, H2, HANA, HIVE, IMPALA, JIRA, MSSQL, MYSQL, ORACLE, POSTGRESQL, REDSHIFT, REST, SALESFORCE, SPARKSQL, SPLUNK, SYBASE, TERADATA |

`sql.default.catalog`

| Default | (taken from JDBC driver) | | Required | No | | Description | Overrides the default catalog for the connected user. Schemas in the default catalog may be referenced without qualification (myschema rather than mycatalog.myschema). For use with databases (like Databricks) that support 3-level (catalog/schema/table) identifiers. See also: Databases and Schemas |

`sql.catalogs`

| Default | | | Required | No | | Description | A comma-separated list of catalogs. This list plus the default catalog define the set of catalogs represented by a *.* wildcard for the sql.schemas option. An asterisk (*) can be used as a wildcard for this, the sql.catalogs option, meaning all catalogs accessible to the JDBC connection. For use with databases (like Databricks) that support 3-level (catalog/schema/table) identifiers. See also: Databases and Schemas |

`sql.default.schema`

| Default | (taken from JDBC driver) | | Required | No | | Description | Overrides the default schema for the connected user. Tables in the default schema may be referenced without qualification (mytable or "mytable" rather than myschema.mytable, "myschema"."mytable", or, for Databricks, mycatalog.myschema.mytable). See also: sql.default.catalog and Databases and Schemas |

`sql.schemas`

| Default | | | Required | No | | Description | A comma-separated list of schemas to append to the schema search path. This option allows SQL queries from the virtual graph mappings to reference tables that are outside of the default schema for the connected user. An asterisk can be used as a wildcard. A lone asterisk (*) means all schemas in the default catalog, mycatalog.* means all schemas in the mycatalog catalog, and *.* means all schemas in all the accessible catalogs. See also: sql.default.catalog and Databases and Schemas |

`sql.skip.validation`

| Default | false | | Required | No | | Description | Set this option to true to bypass the validation of catalog, schema, table, and column references against metadata from the backing database. These checks are made in order to catch configuration problems early. Should not be necessary with any supported database and driver but can be used with a JDBC driver or database that does not support returning schema metadata. |

`jdbc.row.count.estimate.strategy`

| Default | JDBC_GET_INDEX_INFO | | Required | No | | Description | Determines what methods for estimating a table's row count will be used and in what fallback order. Each dialect has a method for getting row count from database metadata, which typically involves querying indexes or system tables. This option controls whether a brute-force SELECT COUNT(*) query should be used. Note: The index/system-table method is considered a failure if it returns a row count estimate less than 2. Valid options are: <br>• JDBC_GET_INDEX_INFO: Use only JDBC metadata or dialect-specific system tables. Do not use SELECT COUNT(*). <br>• COUNT_FALLBACK: Try JDBC metadata first, fall back to SELECT COUNT(*) if the metadata method fails. <br>• COUNT_FIRST: Try SELECT COUNT(*) first, fall back to JDBC metadata if the query fails. |

`jdbc.table.sample.percentage`

| Default | 0.1 (varies by dialect) | | Required | No | | Description | When a brute-force SELECT COUNT(*) is used to determine a table's row count (see jdbc.row.count.estimate.strategy), and the dialect supports the SQL TABLESAMPLE keyword, this option determines the percentage of rows to include in the sample. Valid range is 0-100. A value of 0 disables sampling (performs a full table scan). For example, with a value of 10, the query would be SELECT COUNT(*) FROM table_name TABLESAMPLE SYSTEM (10 PERCENT). The sampled count is automatically scaled to estimate the full table size. Not all dialects support table sampling; for those that do not, this option is ignored. |

Tomcat Connection Pool Options

Additionally, connection pool properties for the built-in Tomcat connection pool are allowed. This set of additional allowed properties is listed in the Tomcat JDBC Connection Pool documentation. Stardog sets these connection pool defaults:

initialSize=3
testWhileIdle=true

# 4 hours
timeBetweenEvictionRunsMillis=14400000

validationQueryTimeout=10

To disable connection pooling, one can set the following connection pool properties:

testOnBorrow=true
timeBetweenEvictionRunsMillis=0
maxIdle=0
minIdle=0

This may be useful for troubleshooting some configurations.

Passing Options Directly to the JDBC Driver

Any options with the prefix ext. will be passed directly to the JDBC Driver. Tomcat connection pool properties do not need to be prefixed with ext.

Any unknown options will be ignored.

MongoDB Options

`mongodb.uri`

Elasticsearch Options

`elasticsearch.rest.urls`

`elasticsearch.indexes`

`elasticsearch.username`

`elasticsearch.password`

Cassandra Options

`cassandra.contact.point`

`cassandra.port`

`cassandra.keyspace`

`cassandra.username`

`cassandra.password`

`cassandra.allow.filtering`

SPARQL Engine/Service Options

`sparql.url`

`sparql.username`

`sparql.password`

`sparql.graphname`

`sparql.statsbasedoptimization`