The data storage layer is the foundation of any data management system. Understanding how data is stored and updated is a prerequisite to effective administration of the system, especially for scheduling necessary maintenance operations.
As of version 7, Stardog's storage layer is based on RocksDB, which is a key-value database maintained by Facebook. RocksDB uses LSM trees for storing data. The LSM tree data structure is optimized for write performance on modern hardware due to its append-only update strategy: instead of changing data items in-place (like the classical B+ tree would do), each update appends the item to the end of a sorted sequence in the main memory and, when that is filled up, it is flushed to a new file on disk. The tree is a collection of levels each of which contains one or several such immutable sorted files, the so-called SSTables (or SST).
This data update strategy achieves very good write throughput because appending data is much more efficient than random, dispersed lookups or updates-in-place. However, it has important implications for performance of read queries since they may require reading multiple files and merging contents on the fly. As the number of files grows so does the read overhead. To rectify that issue RocksDB periodically executes the compaction operation in the background to merge files and reduce their number.
It is important to understand that every update in a Stardog database, not just adding a triple but also deleting triples, is an append operation on the physical storage level (that is, a new key inside RocksDB). This has the following implications:
Since write transactions can leave duplicate triples and tombstones, read queries have to filter them out to ensure
that client applications get a logically consistent view of the data. That adds overhead on top of having to
process multiple files. To address this problem the server can compress the multiple records into one via the following CLI command:
stardog-admin db optimize.
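The append-only behaviour described above can be illustrated with a toy model. The sketch below is not RocksDB or Stardog code: the class `ToyLsm`, its methods, and the tombstone marker are hypothetical names chosen for illustration, and the in-memory buffer is simplified to a sorted map. It shows how updates and deletes both become new records, how reads must look through several sorted runs, and how a compaction step reduces them to one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

/** Toy LSM store: a sorted in-memory buffer plus immutable sorted runs (stand-ins for SST files). */
final class ToyLsm {
    // Hypothetical marker value for a delete; real stores encode tombstones differently.
    private static final String TOMBSTONE = "\u0000TOMBSTONE";

    private final int memtableLimit;
    private TreeMap<String, String> memtable = new TreeMap<>();
    // Flushed runs, oldest first and newest last.
    private final List<TreeMap<String, String>> runs = new ArrayList<>();

    ToyLsm(int memtableLimit) { this.memtableLimit = memtableLimit; }

    void put(String key, String value) { append(key, value); }

    // A delete is also an append: it writes a tombstone instead of removing anything.
    void delete(String key) { append(key, TOMBSTONE); }

    private void append(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) {
            runs.add(memtable);            // flush: the full buffer becomes an immutable run
            memtable = new TreeMap<>();
        }
    }

    // Reads check the memtable first, then the runs from newest to oldest.
    Optional<String> get(String key) {
        String v = memtable.get(key);
        for (int i = runs.size() - 1; v == null && i >= 0; i--) {
            v = runs.get(i).get(key);
        }
        return (v == null || v.equals(TOMBSTONE)) ? Optional.empty() : Optional.of(v);
    }

    // Compaction merges all runs into one, keeping the newest version and dropping tombstones.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> run : runs) {
            merged.putAll(run);            // newer runs overwrite entries from older runs
        }
        merged.values().removeIf(TOMBSTONE::equals);
        runs.clear();
        if (!merged.isEmpty()) runs.add(merged);
    }

    int runCount() { return runs.size(); }

    int storedEntries() {
        int n = memtable.size();
        for (TreeMap<String, String> run : runs) n += run.size();
        return n;
    }

    public static void main(String[] args) {
        ToyLsm db = new ToyLsm(2);
        db.put("a", "1"); db.put("b", "2");   // buffer full, flushed as run 1
        db.put("a", "3"); db.delete("b");     // buffer full, flushed as run 2
        System.out.println(db.runCount() + " runs, " + db.storedEntries() + " entries");
        db.compact();
        System.out.println(db.runCount() + " runs, " + db.storedEntries() + " entries");
    }
}
```

Before compaction the store holds four physical records for one logically visible key; afterwards it holds one. That gap between physical and logical size is what the maintenance operation described next is meant to close.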
Database optimization, or executing stardog-admin db optimize, is a key maintenance operation to optimize the organization
of data in storage so it can be read in the most efficient way. The operation does not require exclusive
access to the database and does not block other operations, i.e. write transactions, read queries, backups, etc.
It is however a good idea to run optimize at times of low system activity since it's disk IO intensive (it reads every file storing some database's data on disk). Also, concurrently running queries or transactions
may have access to data snapshots, which are updated in later transactions. In that case optimize will not be able
to compact or remove those snapshots.
Check the databases.{dbName}.queries.running and databases.{dbName}.txns.openTransactions metrics to see
whether queries or transactions are running against the database.
db optimize do
The operation performs the following tasks:
index.statistics.update.automatic option) but optimize
will also recompute it from scratch.
The db optimize command executes all the tasks above but takes options to exclude some of them. For example, one may execute just
the compaction/vacuuming steps to optimize the physical data layout on disk:
$ stardog-admin db optimize -o optimize.statistics=false optimize.compact=true optimize.vacuum.data=true -- dbName
or just statistics. Usually it is the compaction and vacuuming which take most of the time, so if only statistics refresh is needed, one can use:
$ stardog-admin db optimize -o optimize.statistics=true optimize.compact=false optimize.vacuum.data=false -- dbName
See the man page for the full list of options.
db optimize
Since db optimize is important for read performance but could run for a long time (possibly over an hour for a database
with several hundred million triples or more), it is important to schedule it properly. Running it too often places an unnecessary
burden on the disk subsystem while not running it often enough will likely result in slow queries. We provide here
guidelines for deciding when to execute it.
The most informed way to make that decision is by monitoring the server's metrics. Specifically Stardog reports the approximate size of the database (i.e. the number of triples) and the number of keys in the tables inside storage, as follows:
databases.{db}.size: 1,000,000
databases.{db}.ternary.numKeys: 20,000,000
The *.ternary.numKeys metric shows the number of keys in the indexes representing sorted collections of quads.
In the standard configuration Stardog maintains 8 such indexes (SPOC, POSC, OSPC, etc.). Therefore in the
optimal state the database should report a value for *.ternary.numKeys that is 8 times higher than for *.size. With each
transaction which deletes some data the difference will increase (because, as explained above, each deleted
triple is appended as a tombstone). Based on current experience, we suggest running db optimize no later than
when *.ternary.numKeys grows beyond 16x *.size.
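The rule of thumb above can be turned into a small check. The following is a minimal sketch, assuming the metrics are available as `name: value` lines in the layout shown in this section (the real metrics output may be formatted differently). The class `OptimizeAdvisor`, its methods, and the database name `myDb` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class OptimizeAdvisor {
    // Ratios taken from the text: 8 indexes in the standard configuration, optimize beyond 16x.
    static final double OPTIMAL_RATIO = 8.0;
    static final double OPTIMIZE_RATIO = 16.0;

    /** Parses "name: 1,000,000" lines into a map; lines without a numeric value are skipped. */
    static Map<String, Long> parse(String report) {
        Map<String, Long> metrics = new HashMap<>();
        for (String line : report.split("\\R")) {
            int sep = line.lastIndexOf(':');
            if (sep < 0) continue;
            String value = line.substring(sep + 1).replace(",", "").trim();
            try {
                metrics.put(line.substring(0, sep).trim(), Long.parseLong(value));
            } catch (NumberFormatException e) {
                // not a numeric metric, ignore
            }
        }
        return metrics;
    }

    /** Number of index keys per logical triple. */
    static double keyRatio(long size, long numKeys) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        return (double) numKeys / size;
    }

    /** True once the ratio has grown beyond the 16x guideline. */
    static boolean shouldOptimize(long size, long numKeys) {
        return keyRatio(size, numKeys) > OPTIMIZE_RATIO;
    }

    public static void main(String[] args) {
        String report = "databases.myDb.size: 1,000,000\n"
                      + "databases.myDb.ternary.numKeys: 20,000,000\n";
        Map<String, Long> m = parse(report);
        long size = m.get("databases.myDb.size");
        long keys = m.get("databases.myDb.ternary.numKeys");
        System.out.println("ratio=" + keyRatio(size, keys) + "x, optimize=" + shouldOptimize(size, keys));
    }
}
```

With the sample values shown earlier (1,000,000 triples and 20,000,000 keys) the ratio is 20x, which is past the suggested limit.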
However, db optimize might be needed before that. For example, when the data is only added but never deleted,
there will be no tombstones so the ratio won't deviate from 8x.
Even so, the data might still need to be compacted to eliminate duplicate versions of the same triple if the same triples were added repeatedly over time.
One can execute stardog data size --exact {dbName} to
obtain the accurate size of the database and if that is substantially smaller than the *.size metric,
execute db optimize. If there were no deletes, vacuuming can be disabled to speed up the process.
stardog data size --exact {dbName} will scan the entire database. While faster than db optimize it may also
take a considerable amount of time.
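The comparison between the exact size and the *.size metric can be expressed as a simple check. This is a sketch only: the text says "substantially smaller" without giving a number, so the tolerance parameter `maxExcess` (and the value 0.2 used in the example) is an assumption, as are the class and method names.

```java
final class DuplicateCheck {
    /** Fraction of the approximate size that is not backed by distinct triples. */
    static double excessFraction(long approxSize, long exactSize) {
        if (approxSize <= 0) return 0.0;
        return Math.max(0L, approxSize - exactSize) / (double) approxSize;
    }

    /** maxExcess is a hypothetical tolerance; choose it to suit your workload. */
    static boolean shouldOptimize(long approxSize, long exactSize, double maxExcess) {
        return excessFraction(approxSize, exactSize) > maxExcess;
    }

    public static void main(String[] args) {
        // The metric reports 1,000,000 triples but an exact count finds 700,000.
        System.out.println(shouldOptimize(1_000_000L, 700_000L, 0.2));
    }
}
```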
In addition to monitoring the metrics, there are some clear-cut cases which warrant optimization, typically involving
deletion of a large amount of data. The classical example is a wipe-and-load operation which drops a large named graph
and re-creates it with new data, for example when data is periodically refreshed or a staging graph's data moves
to production after cleansing. A wipe-and-load means that all deleted triples are first written as tombstones
and then the new triples are appended on top of that (in addition to existing triples still being present).
That is likely to have an impact on read performance unless db optimize runs immediately after the commit.
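A back-of-the-envelope estimate shows why a wipe-and-load moves the key ratio so quickly. The sketch below assumes the simplified model from this section: every stored record (old triple, tombstone, or new triple) occupies one key in each of the 8 indexes, and no background compaction has run yet. Real numbers will differ, and the class and method names are hypothetical.

```java
final class WipeAndLoadEstimate {
    static final int INDEXES = 8; // standard configuration, per the text

    /** Estimated numKeys/size ratio right after a wipe-and-load, before any compaction. */
    static double ratioAfter(long otherTriples, long droppedTriples, long loadedTriples) {
        long logicalSize = otherTriples + loadedTriples;
        if (logicalSize <= 0) throw new IllegalArgumentException("database would be empty");
        // old triples still on disk + one tombstone per dropped triple + newly appended triples
        long recordsPerIndex = otherTriples + 2 * droppedTriples + loadedTriples;
        return (double) (INDEXES * recordsPerIndex) / logicalSize;
    }

    public static void main(String[] args) {
        // A database consisting only of a 10M-triple graph that is dropped and reloaded.
        System.out.println(ratioAfter(0L, 10_000_000L, 10_000_000L));
        // The same graph inside a database with 90M other triples.
        System.out.println(ratioAfter(90_000_000L, 10_000_000L, 10_000_000L));
    }
}
```

Under these assumptions the first case lands at 24x, well past the 16x guideline, while the second stays near 9.6x because the refreshed graph is a small share of the database.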
Stardog employs an adaptive transactional bulk loading mechanism that automatically switches between different data loading strategies based on the total size of the transaction at commit time. There are two primary code-paths in Stardog for handling writes: The default "pipelined" write-path, which buffers writes in memory and flushes them to disk in the background, and the "bulk loading" path, which is optimized for large data loads. The intent of the adaptive loading is to optimize performance for large data operations while maintaining efficient processing for smaller updates.
The adaptive bulk loading behavior is controlled by two database configuration properties. To understand how they work, keep in mind the difference between the estimated and the actual transaction size. The estimated size is based on the size of the input data, while the actual size is based on the number of quads added or removed.
Visualise these thresholds on a horizontal axis with 3 ranges:
Below index.bulk.load.sst.txlimit (estimated): tx pipeline is used without adaptations.
index.bulk.load.sst.txlimit - index.bulk.load.tx.threshold (actual): the adaptive method is used; the final decision is made either at prepare time or when the size exceeds the upper bound.
index.bulk.load.tx.threshold (estimated) - inf: the bulk load method is used without adaptation.
index.bulk.load.tx.threshold
This property defines the threshold of the actual number of quads where Stardog will use bulk loading. If fewer quads are part of the transaction, standard transactional updates are used. Standard writes are done single-threaded, and data is written to RocksDB memtables.
<dl> <dt>Mutability</dt> <dd>Mutable</dd> <dt>Default</dt> <dd><code>4000000</code></dd> <dt>Description</dt> <dd>Threshold of <b>parsed</b> quads at which Stardog switches from standard transactional updates to bulk loading. When an input file <b>actually</b> contains more quads than this threshold, the bulk loading mechanism is engaged. The maximum effective value is 8388608 (approximately 8.4 million quads).</dd> </dl>
index.bulk.load.sst.txlimit
This property controls when bulk loading is considered based on the estimated input size. Should the estimated input size of the first batch of data added in a transaction exceed this threshold, the bulk loading mechanism will be considered for the entire transaction. Should it be less than this threshold, standard transactional updates will be used.
<dl> <dt>Mutability</dt> <dd>Mutable</dd> <dt>Default</dt> <dd><code>495000</code></dd> <dt>Description</dt> <dd>The threshold of <b>estimated input size</b> at which the bulk loading mechanism will be considered for transactional updates. This provides a heuristic for determining when to use bulk loading, particularly useful when the exact quad count is not yet known.</dd> </dl>
Bulk loading is automatically triggered when using commands like:
stardog data add or stardog data remove with data sizes larger than the configured thresholds.
With the --tx <id> option, the total data size of the transaction is considered at commit time, or whenever a flush to disk is required.
stardog data add --tx <id> will use adaptive transactional bulk loading, as long as only the add command is used consecutively and the count of triples added exceeds the threshold.
stardog data remove --tx <id> will use adaptive transactional bulk loading, as long as only the remove command is used consecutively and the count of triples removed exceeds the threshold.
stardog query --tx <id> within the transaction will force a flush of the already written data.
You can adjust these thresholds to better match your workload patterns:
$ stardog-admin metadata set -o index.bulk.load.tx.threshold=2000000 index.bulk.load.sst.txlimit=50000 -- myDatabase
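The three ranges described earlier can be sketched as a two-step decision. This is a simplified model for illustration, not Stardog's implementation: the class `AdaptiveLoadSketch`, the `Strategy` names, and the exact boundary handling (inclusive or exclusive) are assumptions, and the units of the estimated size are treated as directly comparable to both thresholds.

```java
final class AdaptiveLoadSketch {
    enum Strategy { PIPELINED, ADAPTIVE, BULK_LOAD }

    // Default values quoted in the property descriptions.
    static final long DEFAULT_SST_TXLIMIT = 495_000L;
    static final long DEFAULT_TX_THRESHOLD = 4_000_000L;

    /** Initial choice, made from the estimated size of the first batch of data. */
    static Strategy initial(long estimatedSize, long sstTxLimit, long txThreshold) {
        if (estimatedSize < sstTxLimit) return Strategy.PIPELINED;
        if (estimatedSize >= txThreshold) return Strategy.BULK_LOAD;
        return Strategy.ADAPTIVE;
    }

    /** Final choice for the adaptive range, made from the actual number of quads. */
    static Strategy resolve(Strategy initial, long actualQuads, long txThreshold) {
        if (initial != Strategy.ADAPTIVE) return initial;
        return actualQuads >= txThreshold ? Strategy.BULK_LOAD : Strategy.PIPELINED;
    }

    public static void main(String[] args) {
        Strategy s = initial(1_000_000L, DEFAULT_SST_TXLIMIT, DEFAULT_TX_THRESHOLD);
        System.out.println(s + " -> " + resolve(s, 4_500_000L, DEFAULT_TX_THRESHOLD));
    }
}
```

In this model, lowering index.bulk.load.sst.txlimit widens the adaptive range, while lowering index.bulk.load.tx.threshold makes the bulk load path apply to smaller transactions.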
Understanding the performance implications of transactional bulk loading is crucial for optimizing your Stardog database operations.
Transactional bulk loading significantly improves write throughput for large data operations by:
However, transactional bulk loading creates SST files directly in the LSM tree structure, which can have downstream effects on read performance and compaction. Subsequent read operations may suffer temporarily.
After bulk loading operations:
Bulk loading has significant implications for db optimize operations:
Increased optimization time: Databases that have undergone multiple transactional bulk loading operations typically require more time to optimize.
More frequent optimization needed: The SST files created by bulk loading at lower LSM tree levels require compaction to maintain optimal read performance.
Tombstone accumulation: When bulk loading is used for delete operations (data remove --tx), tombstones accumulate more rapidly, necessitating more frequent vacuuming.
Mixed read/write workloads:
index.bulk.load.tx.threshold if SPARQL query performance suffers after transactional bulk loading
db optimize after repeated bulk loading operations
databases.{db}.ternary.numKeys metric
For write-heavy workloads:
db optimize when the ratio of *.ternary.numKeys to *.size exceeds 12-16x
Frequent incremental updates:
Monitor the impact of bulk loading on your specific workload by tracking query performance metrics before and after large data operations, and adjust the thresholds accordingly.
When creating a database with bulk loaded data, Stardog provides the dictionary.bulk.load option to optimize
bulk loading performance of the dictionary. This option uses a temporary hash-based dictionary during database creation that can
significantly improve loading speed, especially for large datasets.
Please note that this option is disabled by default. Enabling it requires careful consideration of the memory
available on the machine where Stardog is running.
The option is enabled when memory.mode=bulk_load in your Stardog configuration, but it can also be used with the default memory mode.
Take a look at Managing Databases with Large Datasets for more information on memory modes.
The dictionary.bulk.load option allocates a page cache of the specified size (e.g., 1G for 1 gigabyte)
to build the mapping dictionary during database creation. This temporary hash-based dictionary is written to
RocksDB SST files at the end of the db create operation and is disposed of after completion.
The option can be set during database creation for example via the Java API:
try (AdminConnection aAdminConnection = AdminConnectionConfiguration.toServer("http://localhost:5820")
                                                                    .credentials("admin", "admin")
                                                                    .connect()) {
    aAdminConnection.newDatabase("test")
                    .set(IndexOptions.BULK_LOAD_DICTIONARY_MEM, 1073741824L) // 1GB in bytes
                    .create();
}
Performance improvements can be substantial: customers have reported 2x speedups.
For example, loading 65 million statements may take approximately 9 minutes without this setting, compared to 4.5 minutes with dictionary.bulk.load set to 1GB.
The memory allocated by dictionary.bulk.load is not counted towards Stardog's memory budget and is
outside the limits of JVM memory settings such as -XX:MaxDirectMemorySize or -Xmx. The maximum memory allocation is
restricted by the physical machine and operating system limits, not Stardog's configuration.
When using this option, keep the following in mind:
256MB.
default or bulk_load). You do not need to run in bulk_load memory mode to benefit from this setting.
db create operation using this option allocates its own separate cache. The caches are not shared between operations, which means multiple simultaneous database creations will multiply the memory usage.
db create operation completes.
Set conservative values initially (e.g., 1GB) and test the performance improvements. While larger values may provide faster loading, the risk of memory exhaustion increases, particularly if databases are created concurrently.
For most use cases where database creation happens sequentially or infrequently, even modest values like 1GB can provide significant performance improvements without substantial risk.
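Because each concurrent db create allocates its own cache outside the JVM limits, it can help to check the worst case before choosing a value. The following is a rough sketch: the class `DictionaryCacheBudget`, its parameters, and the idea of subtracting a fixed operating system reserve are assumptions made for illustration, not part of Stardog.

```java
final class DictionaryCacheBudget {
    static final long GIB = 1L << 30;

    /** Total native memory used by the temporary dictionary caches of concurrent creates. */
    static long totalCacheBytes(int concurrentCreates, long cacheBytesPerCreate) {
        return Math.multiplyExact((long) concurrentCreates, cacheBytesPerCreate);
    }

    /** True if the caches fit in what is left after heap, direct memory and an OS reserve. */
    static boolean fits(long physicalBytes, long heapBytes, long directBytes,
                        long osReserveBytes, int concurrentCreates, long cacheBytesPerCreate) {
        long available = physicalBytes - heapBytes - directBytes - osReserveBytes;
        return totalCacheBytes(concurrentCreates, cacheBytesPerCreate) <= available;
    }

    public static void main(String[] args) {
        // Hypothetical 32 GiB machine: -Xmx8g, -XX:MaxDirectMemorySize=16g, 2 GiB kept for the OS.
        System.out.println(fits(32 * GIB, 8 * GIB, 16 * GIB, 2 * GIB, 4, GIB)); // 4 GiB of caches
        System.out.println(fits(32 * GIB, 8 * GIB, 16 * GIB, 2 * GIB, 8, GIB)); // 8 GiB of caches
    }
}
```

In this example 6 GiB remain outside the JVM limits, so four concurrent creations with a 1 GiB cache each fit, while eight do not.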