This page covers the configuration options for the various types of Data Sources.
<details open markdown="block"> <summary> Page Contents </summary> 1. TOC </details>The available configuration options for a Data Source depend on the type of the Data Source. Supported Data Source types include JDBC, MongoDB, Elasticsearch, Cassandra, and SPARQL.
Configuration options are stored in a Java properties file.
A typical properties file for a JDBC-type Data Source (in this case, MySQL) looks like:
jdbc.url=jdbc:mysql://localhost/dept
jdbc.username=MySqlUserName
jdbc.password=MyPassword
jdbc.driver=com.mysql.jdbc.Driver
The following option applies to Data Sources of all types. The options apply to all Virtual Graphs for a given Data Source.
unique.key.sets| Default | |
| Required | No |
| Description | For data sources that do not express unique constraints in their metadata, either because unique constraints are not supported or because the data source did not include some or all of the valid constraints for reasons such as performance concerns, this property is used to define additional constraints manually. The property value is a comma-separated list of keys that define unique rows in a table. Each key is itself a comma-separated list of schema-qualified columns, enclosed in parentheses. For example, if table APP.CUSTOMERS has an ID column that serves as a primary key and a pair of columns, FNAME and LNAME, that together are a unique key, the value to express that is: (APP.CUSTOMERS.ID),(APP.CUSTOMERS.FNAME,APP.CUSTOMERS.LNAME) |
batch.size| Default | 100 |
| Required | Yes |
| Description | The batch.size option determines how many solutions will be evaluated at a Virtual Graph per request of the ServiceJoin operator. A larger batch size can reduce the number of requests to a data source but can make the evaluation of each request more expensive. By default, the batch size is set to 100. The value can be adjusted in the range [0, 10000]. A value of 0 will disable batch requests to the data source. |
The following properties are used for all relational data sources.
jdbc.url| Default | | | Required | Yes | | Description | The URL of the JDBC connection. |
jdbc.username| Default | | | Required | No | | Description | The username used to make the JDBC connection. |
jdbc.password| Default | | | Required | No | | Description | The password used to make the JDBC connection. |
jdbc.driver| Default | | | Required | No | | Description | The driver class name used to make the JDBC connection. |
sql.dialect| Default | Inferred from supported JDBC drivers. ORACLE for unsupported drivers. |
| Required | No |
| Description | When using an unsupported JDBC driver, this option can be used to specify the format of the generated SQL. The options supported are ATHENA, BIGQUERY, DB2, DERBY, EXASOL, H2, HANA, HIVE, IMPALA, JIRA, MSSQL, MYSQL, ORACLE, POSTGRESQL, REDSHIFT, REST, SALESFORCE, SPARKSQL, SPLUNK, SYBASE, TERADATA |
sql.default.catalog| Default | (taken from JDBC driver) |
| Required | No |
| Description | Overrides the default catalog for the connected user. Schemas in the default catalog may be referenced without qualification (myschema rather than mycatalog.myschema). For use with databases (like Databricks) that support 3-level (catalog/schema/table) identifiers. See also: Databases and Schemas |
sql.catalogs| Default | |
| Required | No |
| Description | A comma-separated list of catalogs. This list plus the default catalog define the set of catalogs represented by a *.* wildcard for the sql.schemas option. An asterisk (*) can be used as a wildcard for this, the sql.catalogs option, meaning all catalogs accessible to the JDBC connection. For use with databases (like Databricks) that support 3-level (catalog/schema/table) identifiers. See also: Databases and Schemas |
sql.default.schema| Default | (taken from JDBC driver) |
| Required | No |
| Description | Overrides the default schema for the connected user. Tables in the default schema may be referenced without qualification (mytable or "mytable" rather than myschema.mytable, "myschema"."mytable", or, for Databricks, mycatalog.myschema.mytable). See also: sql.default.catalog and Databases and Schemas |
sql.schemas| Default | |
| Required | No |
| Description | A comma-separated list of schemas to append to the schema search path. This option allows SQL queries from the virtual graph mappings to reference tables that are outside of the default schema for the connected user. An asterisk can be used as a wildcard. A lone asterisk (*) means all schemas in the default catalog, mycatalog.* means all schemas in the mycatalog catalog, and *.* means all schemas in all the accessible catalogs. See also: sql.default.catalog and Databases and Schemas |
sql.skip.validation| Default | false |
| Required | No |
| Description | Set this option to true to bypass the validation of catalog, schema, table, and column references against metadata from the backing database. These checks are made in order to catch configuration problems early. Should not be necessary with any supported database and driver but can be used with a JDBC driver or database that does not support returning schema metadata. |
jdbc.row.count.estimate.strategy| Default | JDBC_GET_INDEX_INFO |
| Required | No |
| Description | Determines what methods for estimating a table's row count will be used and in what fallback order. Each dialect has a method for getting row count from database metadata, which typically involves querying indexes or system tables. This option controls whether a brute-force SELECT COUNT(*) query should be used. Note: The index/system-table method is considered a failure if it returns a row count estimate less than 2. Valid options are: <br>• JDBC_GET_INDEX_INFO: Use only JDBC metadata or dialect-specific system tables. Do not use SELECT COUNT(*). <br>• COUNT_FALLBACK: Try JDBC metadata first, fall back to SELECT COUNT(*) if the metadata method fails. <br>• COUNT_FIRST: Try SELECT COUNT(*) first, fall back to JDBC metadata if the query fails. |
jdbc.table.sample.percentage| Default | 0.1 (varies by dialect) |
| Required | No |
| Description | When a brute-force SELECT COUNT(*) is used to determine a table's row count (see jdbc.row.count.estimate.strategy), and the dialect supports the SQL TABLESAMPLE keyword, this option determines the percentage of rows to include in the sample. Valid range is 0-100. A value of 0 disables sampling (performs a full table scan). For example, with a value of 10, the query would be SELECT COUNT(*) FROM table_name TABLESAMPLE SYSTEM (10 PERCENT). The sampled count is automatically scaled to estimate the full table size. Not all dialects support table sampling; for those that do not, this option is ignored. |
Additionally, connection pool properties for the built-in Tomcat connection pool are allowed. This set of additional allowed properties is listed in the Tomcat JDBC Connection Pool documentation. Stardog sets these connection pool defaults:
initialSize=3
testWhileIdle=true
# 4 hours
timeBetweenEvictionRunsMillis=14400000
validationQueryTimeout=10
To disable connection pooling, one can set the following connection pool properties:
testOnBorrow=true
timeBetweenEvictionRunsMillis=0
maxIdle=0
minIdle=0
This may be useful for troubleshooting some configurations.
Any options with the prefix ext. will be passed directly to the JDBC Driver. Tomcat connection pool properties do not need to be prefixed with ext.
Any unknown options will be ignored.
mongodb.uri| Default | |
| Required | Yes |
| Description | The URI for the MongoDB connection. Examples: mongodb://localhost/mydb or mongodb+srv://myUserName:myP4ssw0rd@cluster0-kgprod.company.com/mydb |
elasticsearch.rest.urls| Default | |
| Required | Yes |
| Description | Whitespace-delimited list of connection host/port values for Elasticsearch. Example: server1:9200 server2:9200 server3:9200 |
elasticsearch.indexes| Default | All accessible indexes available at elasticsearch.rest.urls |
| Required | No |
| Description | A comma-delimited list of indexes to make available from this data source. |
elasticsearch.username| Default | | | Required | No | | Description | Username for Elasticsearch connections. |
elasticsearch.password| Default | | | Required | No | | Description | Password for Elasticsearch connections. |
cassandra.contact.point| Default | |
| Required | Yes |
| Description | The address of the Cassandra node(s) that the driver uses to discover the cluster topology. Example: cassandra.abc.com |
cassandra.port| Default | 9042 | | Required | No | | Description | The port to use to connect to the Cassandra host. |
cassandra.keyspace| Default | | | Required | Yes | | Description | The Cassandra keyspace to use for this session. |
cassandra.username| Default | | | Required | No | | Description | The username for the Cassandra cluster. |
cassandra.password| Default | | | Required | No | | Description | The password for the Cassandra cluster. |
cassandra.allow.filtering| Default | false |
| Required | No |
| Description | Whether to include the ALLOW FILTERING clause at the end of Cassandra CQL queries. Not recommended for production use. |
sparql.url| Default | |
| Required | Yes |
| Description | SPARQL query endpoint/connection string with database specified, i.e. http://myhost:26023/testdb/query |
sparql.username| Default | | | Required | No | | Description | The username to access the SPARQL endpoint. |
sparql.password| Default | | | Required | No | | Description | The password to access the SPARQL endpoint. |
sparql.graphname| Default | | | Required | Yes | | Description | The graph name on the SPARQL endpoint to be mapped as virtual graph. |
sparql.statsbasedoptimization| Default | true |
| Required | No |
| Description | Whether to enable statistics-based optimization while accessing the SPARQL endpoint. |