This chapter provides information to developers about how to make sure Stardog Voicebox will work effectively and accurately.
Stardog Voicebox is available to use with your own data for Essentials and Enterprise customers.
In order to use Voicebox with your own data, follow these steps:
That's it! Once you follow these steps, you can go to the Voicebox tab in Stardog Cloud and start having a conversation.
When you are publishing your data model to a new database, make sure the Voicebox checkbox is selected as shown here:

The Voicebox option will only be available if 1) your Stardog installation has a license that allows Voicebox, and 2) you have not exceeded the number of Voicebox databases allowed in your license.
If you would like to use Voicebox with an existing database, you have to set the database option voicebox.enabled manually. See the Database Configuration section for more details on this topic.
We strongly recommend creating example queries manually. See the Stored Queries section for more details about adding example queries, and make sure you follow the guidelines outlined below when creating these examples.
Voicebox is developed to work with data models created by Designer, but you can bring your own data models created elsewhere as long as they conform to the guidelines outlined in this section. The key information in data models that Voicebox uses to generate SPARQL queries is:
- rdfs:label and rdfs:comment are used for labels and comments respectively. The label.properties does not have an effect on schema.
- rdfs:subClassOf.
- so:domainIncludes or so:rangeIncludes.
- _: prefix) may confuse the LLM.

Data models should be clear and as concise as possible for Voicebox to make sense of the model and successfully generate SPARQL queries. General data modeling best practices apply for Voicebox too:
- state property in an Address class should be an IRI and not a literal. IRIs allow Voicebox to link query results to graph elements, which is not possible with literals (see Stored Queries section for more discussion on this topic).
- xsd:date or xsd:integer where appropriate instead of xsd:string.
- rdfs:domain and rdfs:range, but you should avoid using these constructs in data models as much as possible. Assigning multiple domains and ranges with the RDFS vocabulary has unintended consequences for reasoning, so use so:domainIncludes or so:rangeIncludes instead.

Needless to say, the data model should reflect how the instance data is represented. Ideally, there are SHACL constraints associated with the data model so that the instance data can be validated with respect to these constraints. Minimally the domains and ranges of attributes and relationships should be validated.
Although only one schema can be active at a time, multiple schemas may coexist within a Voicebox database. Voicebox will reference only the schema selected in the settings when answering questions.
To achieve the best question-answering accuracy for your knowledge graph, we recommend adding stored example queries that demonstrate key usage patterns or important parts of your schema. They are also your opportunity to demonstrate quirks in the data. For example, a field may be typed as a string for legacy reasons even though its values are numeric; you can add a stored query that uses int or strdt to create an appropriately typed value for comparisons.
You can also codify answers of specific queries that drive demos, or key behaviors in applications built on Voicebox, by storing relevant, related queries.
We recommend starting with a few stored examples, no more than 5 to 10; in aggregate, your stored example queries should demonstrate usage of the entire data model. Variety is much more important than volume.
Add new queries as needed over time to address user feedback and new use cases. Stored queries can be created in Studio, Explorer, or the CLI. You typically will not need more than 50 stored queries to achieve high accuracy (>90%) for your use case.
For a stored query to be used by Voicebox, it should satisfy the following three conditions:
- system:voiceboxQuestion, where system is the namespace (http://system.stardog.com/), that associates one or more natural language questions with the query. The metadata field is automatically added by Explorer and Studio as explained below.

The database should have the option voicebox.enabled=true set before the queries are stored. If this option is enabled after stored queries have already been added, those queries should be re-added. See the next section, which discusses the details of indexing.
Studio and Explorer provide UI components to add the natural language questions:

The Voicebox metadata field can also be set from the command line:
stardog-admin stored import ListProductBrands.ttl
where ListProductBrands.ttl would look like this (with DBNAME and USERNAME set to appropriate values):
@prefix stardog: <tag:stardog:api:> .
@prefix system: <http://system.stardog.com/> .
system:QueryQ10000 a system:StoredQuery , system:SharedQuery ;
    system:queryName "ListProductBrands" ;
    system:queryDatabase "DBNAME" ;
    system:queryCreator "USERNAME" ;
    system:voiceboxQuestion "List all Product Brands" , "Find all the product brands" ;
    system:queryString """SELECT DISTINCT ?productBrand0
WHERE {
    ?productBrand0 a scm:Product_Brand .
}""" .
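When many stored queries need Voicebox questions, writing these Turtle files by hand gets tedious. The following is a minimal Python sketch that renders a file in the shape shown above. The function names `turtle_string` and `stored_query_turtle` are hypothetical, and deriving the subject IRI from the query name is an assumption made for illustration (the example above uses `system:QueryQ10000`).

```python
import re


def turtle_string(value: str) -> str:
    """Quote a Python string as a short (single-line) Turtle string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def stored_query_turtle(name, database, creator, questions, sparql) -> str:
    """Render one stored query, including its Voicebox questions, as Turtle."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError("query name must be usable as a Turtle local name")
    if not questions:
        raise ValueError("at least one natural language question is required")
    question_list = " , ".join(turtle_string(q) for q in questions)
    return "\n".join([
        "@prefix system: <http://system.stardog.com/> .",
        "",
        # Assumption: the subject IRI is built from the query name.
        f"system:Query{name} a system:StoredQuery , system:SharedQuery ;",
        f"    system:queryName {turtle_string(name)} ;",
        f"    system:queryDatabase {turtle_string(database)} ;",
        f"    system:queryCreator {turtle_string(creator)} ;",
        f"    system:voiceboxQuestion {question_list} ;",
        f"    system:queryString {turtle_string(sparql)} .",
        "",
    ])
```

The returned text can be written to a file such as ListProductBrands.ttl and imported with stardog-admin stored import as described above. The query string is emitted as a single-line literal with escaped newlines, which is equivalent in Turtle to the triple-quoted form.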
When a query matching the constraints listed in the previous section is stored in Stardog, the information about this stored query is automatically added to the named graph system:VoiceboxQuestions after several preprocessing steps.
This named graph is updated in a background thread, so if multiple stored queries are being added with stardog-admin stored import, it might take a few seconds or minutes for all the queries to appear in this named graph. Use stardog-admin ps list to check the background process.
The stored query information in this named graph will look like this:
system:QueryMyQuery a system:StoredQuery , system:SharedQuery ;
    system:queryName "MyQuery" ;
    system:voiceboxQuestion "First natural language question" ,
                            "Second natural language question" ;
    system:queryString """ ... SPARQL Query ... """ .
Whenever a stored query is updated, the information in this named graph will be updated automatically as well. Do not update this named graph directly, because the changes might be lost! Always update the stored query, and changes will be reflected in this named graph.
The SPARQL query string will look different from the stored query string. This is due to preprocessing steps applied automatically before the query info is copied to this named graph. These preprocessing steps make all stored queries have canonical formatting, and they automatically apply some of the guidelines described in the next section.
The preprocessing steps make changes to queries that may cause them to return a different set of results than the original version. These changes are meant to make queries more generic and reusable while fixing typical user mistakes. If queries are written by advanced users following the guidelines below, it is better to disable preprocessing by setting the database configuration option voicebox.preprocessors=noop, as explained below.
In Stardog 10, the following preprocessors will be used by default:
- stardog:label. Type or property IRIs will not be affected.
- rdfs:label and replace them with stardog:label.
- ?subjN, ?subj0, or ?obj0 with meaningful names based on types and properties used in the query.
- stardog:label and stardog:property:textMatch usage in queries are serialized using property function syntax and not the SERVICE syntax.

The preprocessing can be controlled by the database configuration option voicebox.preprocessors. The default value for this option will use all the preprocessors described above. Setting this option to noop will disable all preprocessing steps. This option can also be set to a comma-separated list of preprocessor names to choose specific preprocessors.
Voicebox preprocesses each stored query when it is created or updated, using the namespaces defined for the database at that time. These namespaces are stored as part of the preprocessed query. If a namespace that was referenced by a stored query is later removed or changed in the database, subsequent Voicebox operations may report missing or invalid namespace errors in diagnostic reports. To correct this, you can re-run preprocessing on all published queries using the following command (replace SERVER_URL and DB_NAME with the appropriate values):
stardog-admin --server SERVER_URL db optimize -o optimize.vacuum.data=false optimize.voicebox=true -- DB_NAME
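If several databases on the same server need their queries re-processed, the command above can be scripted. The following is a minimal Python sketch that assumes stardog-admin is on the PATH and already authenticated. The function names `optimize_command` and `rerun_preprocessing` are hypothetical; the arguments are copied from the command above.

```python
import subprocess


def optimize_command(server_url: str, db_name: str) -> list[str]:
    """Build the argument list for the Voicebox re-preprocessing command."""
    return [
        "stardog-admin", "--server", server_url,
        "db", "optimize",
        "-o", "optimize.vacuum.data=false", "optimize.voicebox=true",
        "--", db_name,
    ]


def rerun_preprocessing(server_url: str, db_names: list[str]) -> None:
    """Run the command for each database, stopping at the first failure."""
    for db_name in db_names:
        subprocess.run(optimize_command(server_url, db_name), check=True)
```

Passing the arguments as a list avoids shell quoting problems with database names, and check=True raises an exception if any invocation exits with a non-zero status.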
Follow these guidelines when creating stored queries:

SELECT DISTINCT ?movie ?director
WHERE {
    ?movie a so:Movie .
    ?movie so:director ?director .
    ?director stardog:label "Steven Spielberg" .
}
or like this in Explorer:

- stardog:label in queries instead of rdfs:label. stardog:label is a new service added in Stardog 10 that finds entities matching a label (or vice versa). It takes advantage of full-text search, so it can do fuzzy matching. It can also look up multiple label properties if the label.properties option is configured to do so.
- GROUP BY ?year expression, and the variable ?year should be returned. Conversely, if the query is not asking about a count explicitly, the query should return the entities and not a count.
- ?x, ?y, or unnamed variables like bnodes []. When queries are generated by Explorer, the auto-generated variable names like ?subj0 and ?obj0 are used, but Voicebox automatically renames those variables based on types and properties used in the query.

The following table summarizes the above advice:
| DOs | DON'Ts |
|---|---|
| Create SELECT queries | Create UPDATE queries |
| Use DISTINCT as much as possible | Have duplicates in your results |
| Run your queries to make sure they return the expected answers | -- |
| Write some of your questions as instructions | -- |
| Write questions about instance data | Write questions about schema |
| -- | Include conditions in your query that aren't required to answer the question |
| Return IRIs | Return labels |
| Select the variables mentioned in your question | Select variables not mentioned in your question |
| Find the entity your query is looking for | Use constants for entities |
| Use stardog:label | Use rdfs:label |
| Aggregate queries that ask for a count | Include other variables when you ask for the count |
| -- | Use ORDER BY, GROUP BY, or LIMIT if they are not required by the question |
| Use meaningful variable names | Use variable names like ?x or bnodes |
| Define prefixes in the DB | Define prefixes in the query |
| -- | Use FROM/FROM NAMED in queries |
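Some of the rules in the table are mechanical enough to check before a query is stored. The following is a rough Python sketch of such a check. It is not part of Stardog: the function name `check_stored_query` is hypothetical, and the regular expressions are heuristics that ignore single-quoted literals and other corner cases. Aggregate queries that legitimately omit DISTINCT will also trigger the DISTINCT warning, so treat the result as advisory.

```python
import re


def check_stored_query(sparql: str) -> list[str]:
    """Return guideline warnings for a stored query (empty list if none)."""
    # Blank out double-quoted literals and <IRIs> so text inside them is ignored.
    text = re.sub(r'""".*?"""|"(?:\\.|[^"\\])*"', '""', sparql, flags=re.S)
    text = re.sub(r"<[^<>\s]*>", "<>", text)

    warnings = []
    if not re.search(r"\bSELECT\b", text, re.I):
        warnings.append("not a SELECT query")
    elif not re.search(r"\bSELECT\s+DISTINCT\b", text, re.I):
        warnings.append("SELECT without DISTINCT")
    if re.search(r"\bFROM\b", text, re.I):
        warnings.append("uses FROM or FROM NAMED")
    if re.search(r"^\s*PREFIX\b", text, re.I | re.M):
        warnings.append("defines prefixes in the query")
    if re.search(r"\brdfs:label\b", text):
        warnings.append("uses rdfs:label instead of stardog:label")
    if re.search(r"\?[A-Za-z]\b", text):
        warnings.append("uses a single-letter variable name")
    if re.search(r"\[\s*\]", text):
        warnings.append("uses an anonymous blank node []")
    return warnings
```

Rules that depend on the meaning of the question, such as returning only the variables the question mentions, still need human review.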
The database configuration option voicebox.enabled needs to be set to true for Voicebox to work with a database. This option can be set at any time, but it is best practice to set this option at database creation time.
This option can be set to true only if the license for the Stardog server allows Voicebox to be enabled. Stardog licenses have a metadata field voicebox.count.limit that specifies the maximum number of databases that can have Voicebox enabled at any time. You can run stardog-admin license info to see this field for your license.
If a database is created with Stardog 9 or earlier, the default reasoning schema is set to tag:stardog:api:context:local, which is a built-in wildcard. This is not compatible with Voicebox. The default reasoning schema should be set to one or more specific named graphs before Voicebox can be enabled. Use the command stardog reasoning schema --list DB to see the list of schemas and their associated graphs and the command stardog reasoning schema --add to update the schema graphs.
If named graph security is enabled for the database, all users should be given read access to the named graph http://system.stardog.com/VoiceboxQuestions. If a user does not have access to this named graph, they will not be able to see the example queries, and Voicebox will not be able to answer their questions accurately. See Query Preprocessing for more details about this named graph.
Setting the option voicebox.enabled will automatically trigger various other database options to be configured as well. No further administrator action is required for the additional options to be set. Information about these additional options is provided below for completeness.
search.enabled=true
search.index.contexts.filter=tag:stardog:api:context:local
search.index.contexts.excluded=false
search.semantic.enabled=true
search.semantic.index.contexts.filter=http://system.stardog.com/VoiceboxQuestions, [all graphs of the default schema]
search.semantic.model=""
search.semantic.model="djl://ai.djl.huggingface.pytorch/sentence-transformers/paraphrase-albert-small-v2". You can see more about supported protocols and address variants here.
search.index.properties.excluded=http://system.stardog.com/queryString
search.index.compute.norm=true
label.properties=http://www.w3.org/2000/01/rdf-schema#label
reasoning.schema.versioning.enabled=true
reasoning.precompute.non_empty.predicates=false
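When troubleshooting, it can help to compare a database's current options against the fixed values listed above. The following is a minimal Python sketch of that comparison. It assumes the current values have already been retrieved by some means and passed in as a dictionary. The names `EXPECTED_OPTIONS`, `normalize`, and `option_mismatches` are hypothetical. Options whose values depend on the database, such as the semantic index graph filter and the model, are left out, and a deliberately customized option such as label.properties will show up as a mismatch.

```python
# Fixed-value options copied from the list above.
EXPECTED_OPTIONS = {
    "search.enabled": "true",
    "search.index.contexts.filter": "tag:stardog:api:context:local",
    "search.index.contexts.excluded": "false",
    "search.semantic.enabled": "true",
    "search.index.properties.excluded": "http://system.stardog.com/queryString",
    "search.index.compute.norm": "true",
    "label.properties": "http://www.w3.org/2000/01/rdf-schema#label",
    "reasoning.schema.versioning.enabled": "true",
    "reasoning.precompute.non_empty.predicates": "false",
}


def normalize(value) -> str:
    """Render booleans and strings the same way so True matches "true"."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value).strip()


def option_mismatches(actual: dict) -> dict:
    """Map each differing option name to an (expected, found) pair."""
    mismatches = {}
    for name, expected in EXPECTED_OPTIONS.items():
        found = normalize(actual[name]) if name in actual else None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

An empty result means every listed option has the value described above; a missing option is reported with None as the found value.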
Below are some common issues users run into when using Stardog Voicebox:
- schema and so for <https://schema.org/>.

A diagnostics tool is available to check for these issues. On the manage endpoints page, go to the three-dot menu actions for the endpoint being validated and select View Diagnostic Report. Scroll down to the Voicebox Report section, select the database and data model for validation, and click Generate Report.

API access to Voicebox is currently limited and not enabled for most users. Please contact Stardog for availability and access information.
Stardog Voicebox is available via an API as well. If your organization has been given Voicebox API access, you will see an option to "Manage API Keys" in the menu in the lower left corner:

On the API management page, first click "New Voicebox App" and choose an endpoint and a database for your app. You will also have the option to customize the connection details, such as the data graph and the schema that will be used by your app. Be sure to enable reasoning if you have inference rules in your database.

After the app is created, you can click "New App Key" to create your API key. You can choose an expiration date for your key. Once your API key is created, make sure you copy it before closing the window, because you will not be able to view the key again.
The Voicebox API is accessible via HTTP and is documented here. On the documentation page, you can click "Authorize" to enter your API key and test the endpoints directly within your browser.

Voicebox can be integrated with external AI assistants and developer tools through the Model Context Protocol (MCP), an open standard for connecting LLM applications to data sources and tools.
The Stardog Cloud MCP Server exposes Voicebox to any MCP-compatible host application, including Claude Desktop, Claude Code, Cursor, and custom MCP clients, so you can ask natural language questions of your Stardog knowledge graphs directly from those environments.