This chapter discusses Stardog Voicebox, our conversational AI chat interface for your enterprise data.
Stardog Voicebox is a conversational AI chat interface designed to enable anyone (any user persona) to ask any question (point, path, descriptive, predictive, geospatial) of any data (structured, semi-structured, unstructured), hallucination-free. It combines Large Language Models (LLMs) and knowledge graphs in an agentic architecture to enable natural language interactions and deliver actionable insights.
With Voicebox, you can ask questions and get answers in the domain-specific language of your use case and knowledge workers. Answers are grounded in enterprise knowledge (i.e., based on facts) and reflect the most up-to-date information, because Stardog’s data federation engine queries your enterprise sources dynamically. Voicebox provides traceability and explainability so users can trust the accuracy and reliability of each response.
The following steps describe the high-level flow of a user interaction with Voicebox:
<img src="../assets/images/voicebox/voicebox-architecture.png" alt="Voicebox architecture diagram" width="500">

LLMs are susceptible to generating factually incorrect outputs, referred to as "hallucinations." Stardog Voicebox mitigates this by using knowledge graphs as a core component of its question-answering process.
Voicebox translates natural language queries into SPARQL queries, which are then executed against the Stardog Knowledge Graph. This approach ensures that responses are derived from structured, verifiable data within the knowledge graph, grounding LLM outputs to reliable sources of truth.
The knowledge graph functions as a controlled vocabulary and structured representation of domain knowledge, enabling precise data retrieval and minimizing ambiguity. The use of SPARQL ensures that queries are executed against a defined data model, improving the accuracy and reliability of the responses.
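To make the grounding idea concrete, here is a minimal sketch in Python. It is not Stardog's engine or API: the tiny in-memory triple store, the `match` helper, and every identifier in `GRAPH` are invented for illustration. The point it shows is that a structured query can only return facts that are actually stored.

```python
# Toy illustration only: a tiny in-memory triple store, not Stardog's engine.
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical facts; every identifier here is invented for the example.
GRAPH: set[Triple] = {
    ("fund:GrowthA", "rdf:type", "MutualFund"),
    ("fund:GrowthA", "expenseRatio", "0.45"),
    ("fund:IncomeB", "rdf:type", "MutualFund"),
    ("fund:IncomeB", "expenseRatio", "0.80"),
}


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> list[Triple]:
    """Return stored triples matching the pattern; None acts as a variable."""
    return sorted(
        t for t in GRAPH
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly what "Which mutual funds are there?" could translate to in SPARQL:
#   SELECT ?fund WHERE { ?fund rdf:type :MutualFund }
funds = [s for s, _, _ in match(None, "rdf:type", "MutualFund")]

# A question about a class that is absent from the graph yields no rows,
# so there is nothing for a language model to embellish.
missing = match(None, "rdf:type", "HedgeFund")
```

In this sketch, `funds` contains only the two stored fund identifiers and `missing` is empty. A real deployment would send the generated SPARQL to the Stardog Knowledge Graph instead of matching patterns in Python.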
Voicebox distinguishes between information generated by RAG/code/external LLMs and data retrieved directly from the knowledge graph. Voicebox tracks the source of information, indicating whether it was derived from RAG, code execution, external LLMs, or direct queries to the knowledge graph. When a query cannot be executed against the data within the Knowledge Graph, Voicebox returns the response, "Cannot find an answer for this question," indicating that the requested information is not present within the accessible data sources or within the context specified in the given ontology.
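The source tracking and fallback behavior described above can be pictured with a short Python sketch. The `Source` enum, the `Answer` class, and `answer_from_rows` are hypothetical names, not part of any Voicebox API, and treating an empty result set as "cannot answer" is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    """Where a piece of an answer came from (categories taken from the text)."""
    KNOWLEDGE_GRAPH = "knowledge graph"
    RAG = "RAG"
    CODE = "code execution"
    EXTERNAL_LLM = "external LLM"


# Fallback message quoted in the text; exact punctuation is an assumption.
NO_ANSWER = "Cannot find an answer for this question"


@dataclass
class Answer:
    text: str
    source: Optional[Source]  # None when no answer could be produced


def answer_from_rows(rows: list[dict]) -> Answer:
    """Build an answer from query result rows, or fall back when there are none."""
    if not rows:
        return Answer(NO_ANSWER, None)
    text = "; ".join(
        ", ".join(f"{key}={value}" for key, value in row.items()) for row in rows
    )
    return Answer(text, Source.KNOWLEDGE_GRAPH)
```

For example, `answer_from_rows([])` returns the fallback message with no source attached, while a non-empty result is labeled as coming from the knowledge graph so the interface can display its origin.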
To immediately experience the capabilities of Voicebox, you can utilize our pre-configured Voicebox Knowledge Kits. These kits are designed to showcase Voicebox's functionalities with curated datasets, allowing you to explore and interact with data through natural language queries without any initial setup or configuration. Simply select a Knowledge Kit from the available options to begin your Voicebox experience.

Selecting a Knowledge Kit opens a conversational interface with example questions to guide your initial interactions, providing a straightforward way to start exploring the data and understand Voicebox's query handling.

From there, either click on one of our pre-selected queries or type your own. All queries are asked in plain English - no knowledge of SPARQL required!
Stardog Voicebox is also available to use with your own data for Cloud and On-Prem/Virtual Private Cloud customers.
To utilize Voicebox with your own data, you will need to set up an endpoint and configure your data sources as outlined in the Stardog Voicebox Developer Guide. The guide provides detailed instructions on how to connect your data, define your schema, and configure any necessary rulesets to ensure accurate and relevant responses.
The home panel is your starting point for interacting with Voicebox. It's the first screen you'll see and is designed to initiate your data exploration. This panel offers a selection of pre-defined example queries, often referred to as "spotlight questions," to help you get started. Voicebox also features a text input area where you can directly enter your own natural language questions to begin your conversation.

The chat interface panel is your primary area for interacting with Voicebox. It allows you to submit queries and view the system's responses. There are also a number of different action icons you may use:
Within the chat interface, you can view a list of suggested prompts to help you formulate your next query or explore related topics.
<img src="../assets/images/voicebox/suggested-prompts.png" alt="suggested prompts" width="400">

The chat history panel provides a record of your previous conversations with Voicebox, allowing you to review past interactions. Conversations are organized by database, with each database (e.g., "all_paws_wealth," "wealth-rag") listed as a distinct section. Within each section, you'll see a list of your conversations, along with the associated count of questions asked. This organization helps you quickly find and revisit your interactions within a specific data context. The panel also includes a search bar at the top, enabling you to search your history for specific keywords.
<img src="../assets/images/voicebox/chat-history.png" alt="chat history" width="400">

The knowledge panel provides a detailed view of the data associated with a user question or selected item. Knowledge panels can be opened in two ways. First, a knowledge panel opens automatically in response to a Voicebox answer, showing a more detailed response and, when applicable, structured data such as tables and charts. Second, you can select any data instance in the chat panel or in a knowledge panel to open a knowledge panel specific to that instance. An instance knowledge panel shows a summary that includes defined attributes and related concepts.
Users may use the knowledge panel to gain a deeper understanding of a given entity or set of entities, including detailed attributes and relationships that are hyperlinked for easy traversal across the connected data landscape. As you traverse through different information, a breadcrumb trail allows you to track the path and easily get back to the original state. The knowledge panel also highlights the lineage through the data sources icon. More details on lineage can be found in the following section.

Voicebox provides detailed lineage information, tracing the origin of data used to generate responses. This includes identifying data sources, both local and virtual. Understanding data lineage is essential for verifying the accuracy and reliability of Voicebox output.
The knowledge panel offers a granular view of the data sources involved in answering a user's query. This includes:

This level of detail allows users to trace the origin of information across systems, which is helpful for:
You can delve deeper into a specific Stardog Concept, such as the "Mutual Fund" class, by clicking on it. The knowledge panel will then update to show instances, attributes, and related concepts associated with that class.
Within Stardog Designer, Voicebox serves as an assistive tool for ontology development. It enables you to:
Voicebox aims to make the process of building and maintaining your ontology more intuitive and efficient.
<video autoplay loop muted playsinline width="700"> <source src="../../assets/videos/applications/voicebox/voiceboxindesigner.mp4" type="video/mp4"></source> <source src="../../assets/videos/applications/voicebox/voiceboxindesigner.webm" type="video/webm"></source> </video>

Stardog Explorer leverages Voicebox to provide a natural language interface for querying your knowledge graph. When you pose a question in plain language, Voicebox translates it into SPARQL and retrieves the relevant data from your graph. To explore a result, click the linked Voicebox response. This feature makes knowledge graph exploration more accessible and user-friendly.
<video autoplay loop muted playsinline width="700"> <source src="../../assets/videos/applications/voicebox/vbinexplorer.mp4" type="video/mp4"></source> <source src="../../assets/videos/applications/voicebox/vbinexplorer.webm" type="video/webm"></source> </video>

Voicebox in Stardog Studio responds to natural language questions with a SPARQL query. When you pose a question in plain language, Voicebox translates it into SPARQL, giving even the most technical users a way to accelerate their query writing. The returned SPARQL is written directly into Studio's workspace, so you can modify and expand it as needed.
<video autoplay loop muted playsinline width="700"> <source src="../../assets/videos/applications/voicebox/vbinstudio.mp4" type="video/mp4"></source> <source src="../../assets/videos/applications/voicebox/vbinstudio.webm" type="video/webm"></source> </video>

Voicebox experiences can be customized with use-case-specific models/ontologies that capture key business concepts and relationships, including custom rulesets and constraints relevant to the domain and user personas. Models/ontologies are designed in our Designer application; once published, they can be configured for use with Voicebox together with their business logic. The settings panel lets you configure a Voicebox experience for the database and models that match your users' needs.
<img src="../assets/images/voicebox/settings.png" alt="settings" width="400">

Rules can be used to infer additional data based on patterns in your knowledge graph. As an example of a rule set up in Designer, a supply chain analyst can specify a rule that infers an anomaly if the conditions below are met by data accessed at query time. Rules can be changed and new rules added, which lets users put multiple lenses over the same data.
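As a rough illustration of query-time inference, the Python sketch below derives an "anomaly" flag from shipment data without storing it. The conditions from the Designer example are not reproduced here: the `Shipment` fields, the delay threshold, and the supplier risk check are all made up for this example, and real Stardog rules are defined in Designer rather than in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    """Hypothetical record; field names are invented for this sketch."""
    shipment_id: str
    expected_days: int
    actual_days: int
    supplier_risk: str  # "low", "medium", or "high"


def is_anomaly(s: Shipment, max_delay_days: int = 3) -> bool:
    """Hypothetical rule: late beyond the tolerance AND from a high-risk supplier."""
    late = s.actual_days - s.expected_days > max_delay_days
    return late and s.supplier_risk == "high"


shipments = [
    Shipment("S1", 5, 6, "high"),   # 1 day late: within the default tolerance
    Shipment("S2", 5, 12, "high"),  # 7 days late, high-risk supplier
    Shipment("S3", 5, 12, "low"),   # late, but low-risk supplier
]

# The anomaly flag is computed when the data is read, never stored.
anomalies = [s.shipment_id for s in shipments if is_anomaly(s)]

# A second "lens" over the same data: a stricter variant of the same rule.
strict = [s.shipment_id for s in shipments if is_anomaly(s, max_delay_days=0)]
```

Because the flag is derived on read, changing the threshold changes the result without touching the underlying data, which is the sense in which different rules act as different lenses.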

Voicebox's accuracy on customer-specific domain knowledge can be improved by supplying prompts as few-shot examples. Few-shot examples give Voicebox question-answer pairs, which help it adapt to new or less common query patterns relevant to the domain and use case. Subject matter experts can develop few-shot examples using a no-code visual interface in Stardog Explorer. Implementation details are available in the Voicebox Developer Guide. Examples can be refined based on user interactions.
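The general few-shot technique can be sketched in a few lines of Python. This is not how Voicebox stores or selects examples: the example pairs, the assumption that each "answer" is a SPARQL query, the word-overlap ranking, and the prompt layout are all illustrative choices.

```python
import re

# Hypothetical question/SPARQL pairs; the vocabulary is invented for the example.
FEW_SHOT = [
    {"question": "Which funds have an expense ratio below 0.5?",
     "sparql": "SELECT ?f WHERE { ?f :expenseRatio ?r . FILTER(?r < 0.5) }"},
    {"question": "Who manages the Growth A fund?",
     "sparql": "SELECT ?m WHERE { :GrowthA :managedBy ?m }"},
    {"question": "How many clients does each advisor have?",
     "sparql": "SELECT ?a (COUNT(?c) AS ?n) WHERE { ?c :advisedBy ?a } GROUP BY ?a"},
]


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_examples(question: str, k: int = 2) -> list[dict]:
    """Rank stored examples by word overlap with the new question; keep the top k."""
    q = tokens(question)
    ranked = sorted(
        FEW_SHOT,
        key=lambda ex: len(q & tokens(ex["question"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, k: int = 2) -> str:
    """Place the selected examples before the new question in a single prompt."""
    parts = [
        f"Q: {ex['question']}\nSPARQL: {ex['sparql']}"
        for ex in pick_examples(question, k)
    ]
    parts.append(f"Q: {question}\nSPARQL:")
    return "\n\n".join(parts)
```

Calling `build_prompt("Which funds have an expense ratio above 1?")` puts the expense-ratio example first, because it shares the most words with the new question, and leaves the final `SPARQL:` slot open for the model to complete.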
Spotlight questions are pre-defined queries that highlight key insights or frequently accessed information within the knowledge graph. These questions provide direct answers to common queries, and administrators can define questions relevant to specific business needs. Configuration and management details are available in the Voicebox Developer Guide. Questions can be updated to reflect data or business changes.

We don’t use actual customer data to train or fine-tune our models.
<img src="../assets/images/voicebox/feedback-modal.png" alt="feedback" width="400">

These agents are currently only available as part of a private preview.
Voicebox employs a suite of specialized agents, each designed to perform specific data tasks, enabling users to interact with and analyze data in a targeted manner. These agents operate autonomously, selecting the most appropriate tool for the given query.

Voicebox agents are equipped with tools that enable automatic labeling and categorization of user queries, the generation and execution of code for specific tasks, and the creation of summaries for tabular data.
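How the agents choose a tool is not described here, so the following Python sketch only illustrates the general pattern of labeling a question and routing it to a tool. The keyword lists, the tool names, and the `select_tool` and `summarize_table` functions are all hypothetical.

```python
import re

# Hypothetical keyword labels mapped to hypothetical tool names.
KEYWORDS = {
    "code_execution": {"calculate", "average", "sum", "forecast"},
    "table_summary": {"summarize", "summary", "overview"},
}
DEFAULT_TOOL = "knowledge_graph_query"


def select_tool(question: str) -> str:
    """Label the question by whole-word keywords and return the matching tool name."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for tool, keywords in KEYWORDS.items():
        if words & keywords:
            return tool
    return DEFAULT_TOOL


def summarize_table(rows: list[dict]) -> str:
    """Very small stand-in for a tabular summary: row count and column names."""
    if not rows:
        return "0 rows"
    columns = ", ".join(rows[0].keys())
    return f"{len(rows)} rows with columns: {columns}"
```

Matching on whole words rather than substrings keeps "summarize" from being mistaken for "sum". A production router would more likely rely on an LLM or a trained classifier than on fixed keywords.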
Voicebox can process questions based on unstructured documents, such as financial, technical, compliance, and regulatory documents. To enable this, an ETL pipeline indexes these documents as embeddings in the built-in vector store of the Stardog Platform. The pipeline uses Spark to stream from document stores, currently Microsoft OneDrive, Google Drive, and local storage, and reads .PDF and .DOCX files. It chunks the text and applies additional logic to chunk table content embedded in these documents. The diagram below illustrates this process.
Detailed instructions for executing the RAG pipeline are available under the Using Unstructured Data with Voicebox section of the documentation.
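The chunking step of such a pipeline can be pictured with a small Python sketch. This is a generic fixed-size, overlapping word chunker, not the pipeline's actual logic: the `chunk_text` function, the chunk size, and the overlap are assumptions, and the Spark streaming, table-aware chunking, and embedding steps are not shown.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears intact in at least one chunk.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each resulting chunk would then be converted to an embedding and written to the vector store. Overlap trades some index size for better recall on passages that straddle a chunk boundary.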
