
Server Monitoring

This page discusses how to monitor the Stardog server.


Overview

Stardog provides server monitoring via the Metrics library. In addition to providing some basic JVM information, Stardog also exports information about the Stardog DBMS configuration, as well as stats for all databases within the system (e.g., the total number of open connections, database size, and average query time).

Accessing Monitoring Information

Monitoring information is available via the Java API, the HTTP API, the CLI, or (if configured) the JMX interface.

Performing a GET on the /admin/status endpoint will return a JSON object containing all the information available about the server and its databases.

$ curl -u admin:admin "http://localhost:5820/admin/status/"

The stardog-admin server status CLI command will print a subset of this information to the console.

The /{yourDatabaseName}/status endpoint returns the monitoring information for that database.

$ curl -u admin:admin "http://localhost:5820/{yourDatabaseName}/status/"
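
The same two endpoints can be called from code instead of curl. The following Python sketch builds an authenticated request using only the standard library. The helper names build_status_request and fetch_status are hypothetical, and the host, port, and admin:admin credentials are simply the ones used in the curl examples above. Nothing is assumed about the shape of the returned JSON.

```python
import base64
import json
import urllib.request


def build_status_request(base_url, user, password, database=None):
    """Build a GET request for /admin/status or /{database}/status."""
    path = f"/{database}/status" if database else "/admin/status"
    # HTTP Basic authentication, equivalent to curl's -u user:password.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def fetch_status(request, timeout=10):
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_status_request("http://localhost:5820", "admin", "admin")
    print(req.full_url)
```

Calling fetch_status(req) against a running server returns the parsed JSON object. Passing database="yourDatabaseName" to build_status_request targets the per-database endpoint instead.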

Prometheus

Monitoring information is also available for Prometheus via the /admin/status/prometheus endpoint, allowing Prometheus servers to scrape Stardog directly. This endpoint requires authentication.

In some environments, it can be advantageous to scrape metrics from an unauthenticated endpoint. The /admin/status/prometheus/internal endpoint allows a service from a private address space to scrape the metrics without Stardog authentication. By default, it is restricted to connections from 127.0.0.1/32. The allowed CIDR can be changed via stardog.properties with the prometheus.allowCIDR option.

$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus/"

READ permission on dbms-admin:metrics is required to access both the /admin/status and /admin/status/prometheus endpoints in order to consume server metrics information. This prevents sensitive information in server metrics from being exposed to unauthorized users. If the user does not have this permission, /admin/status will only return a few metrics, such as the version of the server.
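
To make the effect of a CIDR restriction such as the default 127.0.0.1/32 concrete, here is a small Python sketch of CIDR membership using the standard ipaddress module. It illustrates the general semantics of a CIDR allow-list only; it is not Stardog's implementation. The function name is_allowed and the 10.0.0.0/8 range are hypothetical examples.

```python
import ipaddress


def is_allowed(client_ip, allowed_cidr="127.0.0.1/32"):
    """Return True if client_ip falls inside the allowed CIDR block."""
    network = ipaddress.ip_network(allowed_cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network


if __name__ == "__main__":
    # A /32 block contains exactly one address.
    print(is_allowed("127.0.0.1"))
    print(is_allowed("127.0.0.2"))
    # A wider, hypothetical private range.
    print(is_allowed("10.1.2.3", "10.0.0.0/8"))
```

With a /32 mask only the loopback address itself matches, so a scraper running on another host would need a wider range configured through prometheus.allowCIDR.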

Prometheus Metrics Filters

The Prometheus API offers ways to limit the number of metrics returned. This is done by supplying a regex, either in stardog.properties or directly via a query parameter. The API endpoint supports the query parameters include and exclude, which can be configured in the scrape_config section of the Prometheus configuration file. stardog.properties has corresponding config options named metrics.prometheus.include and metrics.prometheus.exclude. When both include and exclude are supplied, the exclusion wins. Query parameters take precedence over the configuration options.
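
The precedence rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Stardog's implementation: it assumes the regex is searched against the metric name with re.search, and the function names effective_filter and filter_metrics as well as the sample metric names are hypothetical.

```python
import re


def effective_filter(query_value, config_value):
    """Query parameters take precedence over stardog.properties values."""
    return query_value if query_value is not None else config_value


def filter_metrics(names, include=None, exclude=None):
    """Keep names matching include (if given) and drop names matching exclude."""
    kept = []
    for name in names:
        if include is not None and not re.search(include, name):
            continue
        # Exclusion wins: a name matched by both patterns is dropped.
        if exclude is not None and re.search(exclude, name):
            continue
        kept.append(name)
    return kept


if __name__ == "__main__":
    sample = [
        "dbms_memory_heap_used",
        "com_stardog_http_currentRequests",
        "databases_myDb_size",
        "kga_totalTriples",
    ]
    print(filter_metrics(sample, include=r"^(dbms|com)_.*"))
    print(filter_metrics(sample, include=r"^(dbms|databases)_.*", exclude=r"^databases_.*"))
```

In the second call, databases_myDb_size matches both patterns and is dropped, which mirrors the rule that exclusion wins.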

Filter examples

Get all metrics starting with dbms_ and com_ with the regex ^(dbms|com)_.*:

$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?include=%5E%28dbms%7Ccom%29_.%2A"

Or add the following line to stardog.properties:

metrics.prometheus.include=^(dbms|com)_.*

Exclude database-specific metrics and kga metrics with the regex ^(databases|kga)_.*:

$ curl -u admin:admin "http://localhost:5820/admin/status/prometheus?exclude=%5E%28databases%7Ckga%29_.%2A"

Or add the following line to stardog.properties:

metrics.prometheus.exclude=^(databases|kga)_.*
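
Regexes passed as query parameters have to be percent-encoded, which is easy to get wrong by hand. The following Python sketch produces the encoded form with urllib.parse.quote. The helper names encode_filter and prometheus_url are hypothetical, and the base URL is the one used in the curl examples.

```python
from urllib.parse import quote


def encode_filter(regex):
    """Percent-encode a regex so it is safe to use as a query parameter value."""
    return quote(regex, safe="")


def prometheus_url(base_url, include=None, exclude=None):
    """Build the Prometheus endpoint URL with optional include/exclude filters."""
    params = []
    if include:
        params.append("include=" + encode_filter(include))
    if exclude:
        params.append("exclude=" + encode_filter(exclude))
    url = base_url.rstrip("/") + "/admin/status/prometheus"
    return url + ("?" + "&".join(params) if params else "")


if __name__ == "__main__":
    # prints http://localhost:5820/admin/status/prometheus?exclude=%5E%28databases%7Ckga%29_.%2A
    print(prometheus_url("http://localhost:5820", exclude="^(databases|kga)_.*"))
```

The values in stardog.properties are written unencoded. Only the query-parameter form needs this step.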

JMX Monitoring

By default, JMX monitoring is not enabled. You can enable it by setting metrics.reporter=jmx in the stardog.properties file. Then, you can use a tool like VisualVM or JConsole to attach to the process running the JVM, or connect directly to the JMX server.

If you want to connect to the JMX server remotely, you need to set metrics.jmx.remote.access=true in stardog.properties. Stardog will bind an RMI server for remote access on port 5833. If you want to change the port Stardog binds the remote server to, you can set the property metrics.jmx.port in stardog.properties.

Disabling Monitoring

If you wish to disable monitoring completely, set metrics.enabled to false in stardog.properties.

Knowledge Graph Metrics

These metrics focus on measuring a few key aspects of the knowledge graph.

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| kga.YourDb.cn | long | count | The number of "Connected Nodes" in the Knowledge Graph, which is the number of nodes with outgoing edges |
| kga.YourDb.ce.YourClass | long | count | The number of entities of a particular class |
| kga.YourDb.take | long | count | The number of edges in the Knowledge Graph |
| kga.YourDb.reach.cardinality | long | count | The number of edges needed to answer all queries in the last 1 hr |
| kga.YourDb.reach.accuracy | string | enum | The estimated accuracy of the current reach cardinality |
| kga.YourDb.reach.rate | double | count | The number of edges per second used to answer queries in the last 1 hr |
| kga.YourDb.reach.histogram | histogram | - | The histogram of reach measurements for the last 1 hr |

Instance Wide Metrics

These metrics are written out for the entire Stardog instance and capture triple counts for each virtual graph and the total triple count. For the purposes of this document, virtual graph IRIs such as virtual://name consist of UriScheme (virtual) and VirtualGraphName (name).

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| kga.totalTriples | double | count | Total number of triples in all virtual graphs |
| kga.UriScheme.VirtualGraphName.triples | double | count | Number of triples in the given virtual graph |

Block Cache Metrics

These are metrics for the three global block caches: Data, Dictionary, and Txn. Each block cache is global and shared by all databases simultaneously. The three caches serve distinct purposes, and each has its own set of metrics.

  1. The Data Cache stores data from indices.
  2. The Dictionary Cache stores entries from dictionary mappings.
  3. The Txn Cache stores transaction entries. This speeds up access to transaction metadata.

For convenience, the metrics system rolls up statistics for all three block caches into a "total" set of metrics. Thus, there are four prefix forms:

  1. dbms.memory.blockcache.data
  2. dbms.memory.blockcache.dictionary
  3. dbms.memory.blockcache.txn
  4. dbms.memory.blockcache.total

Since each metric has the same definition for every cache, we list the metrics only once, using the form dbms.memory.blockcache.CACHE.<metric>, where CACHE can be data, dictionary, txn, or total.

Each Block cache has three internal components:

  1. The "data" component is where actual bytes from files are stored.
  2. The "index" component is where file indices are stored.
  3. The "filter" component is where bloom filters are stored.

If storage.cacheIndexBlocks, storage.cacheDictionaryIndexBlocks, or storage.cacheTxnIndexBlocks is false, the "index" and "filter" sections of the corresponding block cache are not populated and report all zeros instead.

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| dbms.memory.blockcache.CACHE.ratio | double | percentage | The percentage of cache requests that were served by the cache directly since the process started |
| dbms.memory.blockcache.CACHE.hits | long | count | The number of cache requests that were served by the cache directly since the process started |
| dbms.memory.blockcache.CACHE.misses | long | count | The number of cache requests that could not be served by the cache since the process started |
| dbms.memory.blockcache.CACHE.add.count | long | count | The number of entries that were added to the cache since the process started |
| dbms.memory.blockcache.CACHE.add.failure.count | long | count | The number of times adding to the cache failed, for any reason, since the process started |
| dbms.memory.blockcache.CACHE.index.ratio | double | percentage | The percentage of index requests to the cache that were served by the cache directly since the process started |
| dbms.memory.blockcache.CACHE.index.hits | long | count | The number of index requests to the cache that were successfully served by the cache directly since the process started |
| dbms.memory.blockcache.CACHE.index.misses | long | count | The number of index requests to the cache that did not find any data since the process started |
| dbms.memory.blockcache.CACHE.filter.ratio | double | percentage | The percentage of filter requests to the cache that were served by the cache directly since the process started |
| dbms.memory.blockcache.CACHE.filter.hits | long | count | The number of filter requests to the cache that were successfully served by the cache directly since the process started |
| dbms.memory.blockcache.CACHE.filter.misses | long | count | The number of filter requests to the cache that did not find any data since the process started |
| dbms.memory.blockcache.CACHE.data.ratio | double | percentage | The percentage of data requests to the cache that were served by the cache directly since the process started |
| dbms.memory.blockcache.CACHE.data.hits | long | count | The number of data requests to the cache that were successfully served by the cache directly since the process started |
| dbms.memory.blockcache.CACHE.data.misses | long | count | The number of data requests to the cache that did not find any data since the process started |
| dbms.memory.blockcache.CACHE.read | long | bytes | The amount of data read from the cache since the process started |
| dbms.memory.blockcache.CACHE.written | long | bytes | The amount of data written to the cache since the process started |
| dbms.memory.blockcache.CACHE.cachesIndexBlocks | Boolean | N/A | If true, the cache will store index and filter blocks |
| dbms.memory.blockcache.CACHE.strictCapacity | Boolean | N/A | If true, the cache will throw an error if there is no more room in the cache. When false, the cache will be allowed to "soft" grow past the capacity limit temporarily in the event of high contention |
| dbms.memory.blockcache.CACHE.usage | long | bytes | The amount of memory currently being used for this block cache |
| dbms.memory.blockcache.CACHE.pinnedUsage | long | bytes | The amount of memory in the block cache currently in use (i.e. by readers) |
| dbms.memory.blockcache.CACHE.capacity | long | bytes | The maximum amount of memory that can be used for this block cache |
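
When building dashboards or alerts it can be handy to recompute a hit ratio from the hits and misses counters, or to relate usage to capacity. The Python sketch below shows the arithmetic. It assumes that every request is either a hit or a miss, and it returns fractions between 0 and 1; whether Stardog reports the ratio metrics as a fraction or on a 0 to 100 scale is not stated here. The function names are hypothetical.

```python
def hit_ratio(hits, misses):
    """Fraction of cache requests served from the cache (0.0 when idle)."""
    total = hits + misses
    return hits / total if total else 0.0


def capacity_used(usage_bytes, capacity_bytes):
    """Fraction of the configured block cache capacity currently in use."""
    return usage_bytes / capacity_bytes if capacity_bytes else 0.0


if __name__ == "__main__":
    # Hypothetical counter values read from dbms.memory.blockcache.data.*
    print(hit_ratio(hits=900, misses=100))
    print(capacity_used(usage_bytes=512, capacity_bytes=1024))
```

Because hits and misses accumulate since process start, a ratio computed this way reflects the whole process lifetime. Taking the difference between two scrapes gives a ratio for the interval between them.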

Database Metrics

These metrics are written out once per database and are written in the form of databases.*. In JMX, these metrics are collected in a separate folder. For reference purposes, metrics in this table will use YourDb as the database name.

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| databases.YourDb.state | String | N/A | A String describing the current state of the database. Can be one of Online, GoingOffline, Offline, ComingOnline, Disabled |
| databases.YourDb.size | long | count | An estimate of the number of quads contained in the database. This number may be inaccurate in mastiff, due to transactional considerations, and should be treated only as an estimate |
| databases.YourDb.openConnections | long | count | The current number of open connections to this database |
| databases.YourDb.txns.openTransactions | long | count | The current number of open transactions on this database |
| databases.YourDb.txns.latency.count | long | count | The number of transactions that were recorded |
| databases.YourDb.txns.latency.duration_units | String | N/A | The units that duration is measured in (usually seconds) |
| databases.YourDb.txns.latency.max | double | time | The highest latency transaction measured since the database was created or the process started |
| databases.YourDb.txns.latency.mean | double | time | The overall average latency of a transaction since the database was created or the process started |
| databases.YourDb.txns.latency.stddev | double | time | The standard deviation of transaction latency since the database was created or the process started |
| databases.YourDb.txns.latency.min | double | time | The lowest latency transaction measured since the database was created or the process started |
| databases.YourDb.txns.latency.p50 | double | time | The 50th percentile transaction latency (50% of all transactions have lower latency than this) |
| databases.YourDb.txns.latency.p75 | double | time | The 75th percentile transaction latency (75% of all transactions have lower latency than this) |
| databases.YourDb.txns.latency.p95 | double | time | The 95th percentile transaction latency (95% of all transactions have lower latency than this) |
| databases.YourDb.txns.latency.p98 | double | time | The 98th percentile transaction latency (98% of all transactions have lower latency than this) |
| databases.YourDb.txns.latency.p99 | double | time | The 99th percentile transaction latency (99% of all transactions have lower latency than this) |
| databases.YourDb.txns.latency.p999 | double | time | The 99.9th percentile transaction latency (99.9% of all transactions have lower latency than this) |
| databases.YourDb.txns.latency.mean_rate | double | rate | The overall average throughput of transactions since the database was created or the process started |
| databases.YourDb.txns.latency.m15_rate | double | rate | The 15-minute exponentially-weighted moving average throughput of transactions per unit time |
| databases.YourDb.txns.latency.m1_rate | double | rate | The 1-minute exponentially-weighted moving average throughput of transactions per unit time |
| databases.YourDb.txns.latency.m5_rate | double | rate | The 5-minute exponentially-weighted moving average throughput of transactions per unit time |
| databases.YourDb.txns.latency.rate_units | String | N/A | The configured units to use when measuring transaction throughput (usually in calls/unit time, where calls = 'transactions') |
| databases.YourDb.txns.size.count | long | count | The number of transactions that were measured |
| databases.YourDb.txns.size.max | long | count | The largest transaction size measured |
| databases.YourDb.txns.size.mean | double | count | The average transaction size, since the database was created or the process started |
| databases.YourDb.txns.size.stddev | double | count | The standard deviation in transaction size, since the database was created or the process started |
| databases.YourDb.txns.size.min | double | count | The smallest transaction size, since the database was created or the process started |
| databases.YourDb.txns.size.p50 | double | count | The 50th percentile transaction size (50% of all transactions are smaller than this number) |
| databases.YourDb.txns.size.p75 | double | count | The 75th percentile transaction size (75% of all transactions are smaller than this number) |
| databases.YourDb.txns.size.p95 | double | count | The 95th percentile transaction size (95% of all transactions are smaller than this number) |
| databases.YourDb.txns.size.p98 | double | count | The 98th percentile transaction size (98% of all transactions are smaller than this number) |
| databases.YourDb.txns.size.p99 | double | count | The 99th percentile transaction size (99% of all transactions are smaller than this number) |
| databases.YourDb.txns.size.p999 | double | count | The 99.9th percentile transaction size (99.9% of all transactions are smaller than this number) |
| databases.YourDb.queries.latency.count | long | count | The number of queries that were measured since the database was created or the process started |
| databases.YourDb.queries.latency.duration_units | String | N/A | The units that query latency is measured in (usually seconds) |
| databases.YourDb.queries.latency.max | double | time | The highest latency query measured since the database was created or the process started |
| databases.YourDb.queries.latency.min | double | time | The lowest latency query measured since the database was created or the process started |
| databases.YourDb.queries.latency.mean | double | time | The overall average latency of a query since the database was created or the process started |
| databases.YourDb.queries.latency.stddev | double | time | The standard deviation of query latency since the database was created or the process started |
| databases.YourDb.queries.latency.p50 | double | time | The 50th percentile query latency (50% of all queries have lower latency than this) |
| databases.YourDb.queries.latency.p75 | double | time | The 75th percentile query latency (75% of all queries have lower latency than this) |
| databases.YourDb.queries.latency.p95 | double | time | The 95th percentile query latency (95% of all queries have lower latency than this) |
| databases.YourDb.queries.latency.p98 | double | time | The 98th percentile query latency (98% of all queries have lower latency than this) |
| databases.YourDb.queries.latency.p99 | double | time | The 99th percentile query latency (99% of all queries have lower latency than this) |
| databases.YourDb.queries.latency.p999 | double | time | The 99.9th percentile query latency (99.9% of all queries have lower latency than this) |
| databases.YourDb.queries.latency.mean_rate | double | rate | The overall average throughput of queries since the database was created or the process started |
| databases.YourDb.queries.latency.m15_rate | double | rate | The 15-minute exponentially-weighted moving average throughput of queries per unit time |
| databases.YourDb.queries.latency.m1_rate | double | rate | The 1-minute exponentially-weighted moving average throughput of queries per unit time |
| databases.YourDb.queries.latency.m5_rate | double | rate | The 5-minute exponentially-weighted moving average throughput of queries per unit time |
| databases.YourDb.queries.latency.rate_units | String | N/A | The configured units to use when measuring query throughput (usually in calls/unit time, where calls = 'queries') |
| databases.YourDb.queries.running | long | count | The number of currently running queries |
| databases.YourDb.planCache.ratio | double | count | The hit ratio of the plan cache, as a percentage |
| databases.YourDb.planCache.size | double | count | The size of the plan cache, in entries |
| databases.YourDb.backgroundErrors | long | count | The number of errors that occur during compaction or flushing, asynchronously to user calls |
| databases.YourDb.files.total | long | count | The total number of files held in the database, over all indices |
| databases.YourDb.numKeys | long | count | The estimated number of quads in the database. Note that this number is not transactional, so deleted quads may still be counted. Also, it's an estimate, so it may not be very accurate to begin with |
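
The m1_rate, m5_rate, and m15_rate metrics are exponentially-weighted moving averages. As a rough illustration of what that means, the Python sketch below implements one common formulation, in which the rate is updated on a fixed tick and older ticks decay exponentially. The 5-second tick, the smoothing formula, and the function names are assumptions chosen for illustration; they are not taken from Stardog and may differ from what the server actually does.

```python
import math


def ewma_alpha(window_minutes, tick_seconds=5.0):
    """Smoothing factor for a moving average over the given window."""
    return 1.0 - math.exp(-tick_seconds / 60.0 / window_minutes)


def update_rate(previous_rate, events_in_tick, alpha, tick_seconds=5.0):
    """Fold the events seen in the latest tick into the running rate (events/sec)."""
    instant_rate = events_in_tick / tick_seconds
    if previous_rate is None:
        # The first observation initializes the average.
        return instant_rate
    return previous_rate + alpha * (instant_rate - previous_rate)


if __name__ == "__main__":
    alpha = ewma_alpha(window_minutes=1)
    rate = None
    # Hypothetical load: 10 queries in each of three ticks, then silence.
    for events in [10, 10, 10, 0, 0]:
        rate = update_rate(rate, events, alpha)
        print(round(rate, 4))
```

A shorter window gives a larger smoothing factor, so a 1-minute rate reacts to a burst or a lull faster than a 15-minute rate, while mean_rate averages over the whole lifetime.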

Per Index Metrics

These metrics are written out once per index within a database (e.g., SPOC, C, CPO). They are of the form databases.*. For the purposes of this document, we will use YourDb as the database name, and INAME as the index name.

There are 8 different kinds of indices in Stardog:

| Index Name | Description |
| --- | --- |
| ternary | The main index storing encoded data |
| dictionary.dict | The dictionary encoding table for the database |
| dictionary.value | The dictionary decoding table for the database |
| stats | The statistics index |
| equality | The equality index |
| binary | Binary count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default) |
| unary | Unary count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default) |
| context | Context count indices. These are only present when a database is created using Abort-on-conflict transaction semantics (disabled by default) |

Table of Index Metrics:

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| databases.YourDb.INAME.files.total | long | count | The total number of files currently held by this index on disk |
| databases.YourDb.INAME.flushes.pending | long | count | The number of flushes currently pending on this index (no more than the max. number of configured memtables) |
| databases.YourDb.INAME.flushes.running | long | count | The number of flushes currently running for this index (no more than the max. number of configured memtables) |
| databases.YourDb.INAME.liveDataSize | long | count | The estimated size of the "live" data for this index. "Live" data is data which will actively be processed by the read and write systems or by compaction (disregarding out of date files) |
| databases.YourDb.INAME.numKeys | long | count | The estimated number of keys in this index. For ternary indices, this is a (rough) estimate of the number of quads in the database; for the dictionary, it's an estimate of how many statements are in the dictionary. Note that this is not a transactional estimate: deletions are ignored, so this value will likely overcount when entries have been deleted |
| databases.YourDb.INAME.numLevels | Int | count | The configured number of levels for this index. This is set by configuration, and won't change during the lifecycle of the process |
| databases.YourDb.INAME.backgroundErrors | long | count | The number of errors that were detected during background processing of this index since the process began |
| databases.YourDb.INAME.tableReaderMemory.bytes | long | count | The amount of memory currently pinned in the OS to support active readers |
| databases.YourDb.INAME.memory.total | long | count | The estimated total memory used by this index, for all purposes, including memtable, reader memory, and block cache contributions |
| databases.YourDb.INAME.memtable.immutable.count | long | count | The number of memtables which are currently in the "immutable phase" (i.e. waiting to flush to disk). Can never be more than the configured maximum number of memtables |
| databases.YourDb.INAME.memtable.total.size.bytes | long | bytes | The current size of all memtables (active, inactive, and immutable), in bytes |
| databases.YourDb.INAME.memtable.unpinned.size.bytes | long | bytes | The current size of all unpinned memtables for this index. Unpinned memtables are memtables which are not currently pinned in memory for readers |
| databases.YourDb.INAME.memtable.pinned.size.bytes | long | bytes | The current size of all memtables which are pinned for readers for this index |
| databases.YourDb.INAME.memtable.immutable.size.bytes | long | bytes | The current size of all immutable memtables (memtables waiting to flush) |
| databases.YourDb.INAME.memtable.immutable.entries | long | count | The current number of entries in all immutable memtables |
| databases.YourDb.INAME.memtable.active.entries | long | count | The current number of entries in the active memtable (the active memtable is the memtable currently accepting writes) |
| databases.YourDb.INAME.memtable.active.size.bytes | long | bytes | The current size of the active memtable |
| databases.YourDb.INAME.memtable.memtableStalls | long | count | The total number of memtable stalls which have occurred since the process started or the database was created. Memtable stalls occur when a flush is forced to wait for the number of L0 files to be reduced |
| databases.YourDb.INAME.memtable.memtableSlowdowns | long | count | The total number of memtable slowdowns which have occurred since the process started or the database was created. Memtable slowdowns occur when a flush is delayed in order to allow the L0 file count to be reduced |
| databases.YourDb.INAME.stalls | long | count | The total number of stalls which have occurred on this index since the process started or the database was created. Stalls occur when data cannot be accepted into a given level because it is full, and all writes must stop until that level has reduced its file count |
| databases.YourDb.INAME.slowdowns | long | count | The total number of slowdowns which have occurred on this index since the process started or the database was created. Slowdowns occur when data must be delayed in order to allow compaction to reduce the file count to a given level |
| databases.YourDb.INAME.stalls.pendingCompaction | long | count | The number of stalls which occurred while a compaction was pending since the process started or the database was created |
| databases.YourDb.INAME.slowdowns.pendingCompaction | long | count | The number of slowdowns which occurred while a compaction was pending since the process started or the database was created |
| databases.YourDb.INAME.slowdowns.l0 | long | count | The total number of slowdowns which occurred because the number of files in the L0 level exceeded the soft limit, and writes must be delayed because of it |
| databases.YourDb.INAME.stalls.l0 | long | count | The total number of stalls which occurred because the number of files in the L0 level exceeded the hard limit, and all writes must pause because of it |
| databases.YourDb.INAME.slowdowns.l0.withCompaction | long | count | The total number of slowdowns which occurred in the L0 level while a compaction was currently running |
| databases.YourDb.INAME.stalls.l0.withCompaction | long | count | The total number of stalls which occurred in the L0 level while a compaction was currently running |
| databases.YourDb.INAME.numFilesCompacting | long | count | The current number of files compacting for this index |
| databases.YourDb.INAME.compactions.pending | long | count | The current number of compactions which are waiting to run for this index |
| databases.YourDb.INAME.compactions.completed | long | count | The number of compactions which have completed for this index since the process began or the database was created |
| databases.YourDb.INAME.compactions.read.bytes | long | bytes | The number of bytes read during compaction since the process started or the database was created |
| databases.YourDb.INAME.compactions.written.bytes | long | bytes | The number of bytes written during compaction since the process started or the database was created |
| databases.YourDb.INAME.compaction.read.throughput.bytesPerSec | double | bytes/sec | The overall read throughput of compaction (off disk) for this index since the process started or the database was created |
| databases.YourDb.INAME.compaction.write.throughput.bytesPerSec | double | bytes/sec | The overall write throughput of compaction (to disk) for this index since the process started or the database was created |
| databases.YourDb.INAME.compaction.time.sec | double | seconds | The total time spent compacting files for the index since the process started or the database was created |
| databases.YourDb.INAME.compaction.time.avg.sec | double | seconds | The overall average time spent performing a compaction for this index since the process started or the database was created |
| databases.YourDb.INAME.compaction.keysProcessed | long | count | The number of keys which were processed during compaction |
| databases.YourDb.INAME.compaction.keysDropped | long | count | The number of keys which were removed as part of the compaction process |
| databases.YourDb.INAME.compaction.memory.total | long | count | The total amount of memory currently being used to perform compactions for this index |
| databases.YourDb.INAME.compactions.running | long | count | The total number of compactions currently running for this index |
| databases.YourDb.INAME.writeAmplification | double | ratio | The ratio of bytes written to storage versus bytes written to the database. This is a guide to how many copies of the same data are presently on disk; for example, a write amplification of 3 means that you are writing roughly three times as much data to disk as you are writing entries to the index |
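
The write amplification and throughput figures are simple ratios, which the following Python sketch spells out. It only illustrates the definitions given in the table. The function names and the sample numbers are hypothetical, and how Stardog derives these values internally is not described here.

```python
def write_amplification(bytes_written_to_storage, bytes_written_to_database):
    """Ratio of bytes written to storage versus bytes written to the database."""
    if bytes_written_to_database == 0:
        return 0.0
    return bytes_written_to_storage / bytes_written_to_database


def average_throughput(total_bytes, total_seconds):
    """Overall bytes per second, e.g. compaction bytes over compaction time."""
    return total_bytes / total_seconds if total_seconds else 0.0


if __name__ == "__main__":
    # Writing 100 bytes of entries that cost 300 bytes of disk writes.
    print(write_amplification(300, 100))  # prints 3.0
    # 1000 bytes handled during 4 seconds of compaction.
    print(average_throughput(1000, 4.0))  # prints 250.0
```

A steadily rising write amplification on an index, together with growing stall or slowdown counters, suggests that compaction is struggling to keep up with the write load.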

HTTP Server Metrics

These metrics are used to monitor the HTTP subsystem. They are general to the process itself (since there is only one HTTP layer per process).

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| admin.threads.active | Integer | count | The current number of active threads in the admin pool (equivalent to the number of admin-level operations occurring) |
| admin.threads.queued | Integer | count | The current number of admin-level operations which are queued up waiting for a thread to operate on them |
| admin.threads.size | Integer | count | The maximum number of threads that admin-level operations can make use of |
| user.threads.active | Integer | count | The current number of active threads in the user pool (equivalent to the number of user-level operations currently occurring) |
| user.threads.queued | Integer | count | The current number of user-level operations which are enqueued waiting for a thread. A high number here may indicate an overloaded server |
| user.threads.size | Integer | count | The maximum number of threads that user-level operations can make use of |
| com.stardog.http.server-PORT.avgRequesttime.count | long | count | The number of HTTP requests that have been made since the process started, where PORT is the HTTP port of the process |
| com.stardog.http.server-PORT.avgRequesttime.max | double | milliseconds | The longest HTTP request that has been made since the process started |
| com.stardog.http.server-PORT.avgRequesttime.mean | double | milliseconds | The average time taken to process an HTTP request since the process started |
| com.stardog.http.server-PORT.avgRequesttime.stddev | double | milliseconds | The standard deviation in time taken to process an HTTP request since the process started |
| com.stardog.http.server-PORT.avgRequesttime.min | double | milliseconds | The minimum time taken to process an HTTP request since the process started |
| com.stardog.http.server-PORT.avgRequesttime.p50 | double | milliseconds | The 50th percentile time taken to process an HTTP request since the process started (50% of all HTTP requests are shorter than this number) |
| com.stardog.http.server-PORT.avgRequesttime.p75 | double | milliseconds | The 75th percentile time taken to process an HTTP request since the process started (75% of all HTTP requests are shorter than this number) |
| com.stardog.http.server-PORT.avgRequesttime.p95 | double | milliseconds | The 95th percentile time taken to process an HTTP request since the process started (95% of all HTTP requests are shorter than this number) |
| com.stardog.http.server-PORT.avgRequesttime.p98 | double | milliseconds | The 98th percentile time taken to process an HTTP request since the process started (98% of all HTTP requests are shorter than this number) |
| com.stardog.http.server-PORT.avgRequesttime.p99 | double | milliseconds | The 99th percentile time taken to process an HTTP request since the process started (99% of all HTTP requests are shorter than this number) |
| com.stardog.http.server-PORT.avgRequesttime.p999 | double | milliseconds | The 99.9th percentile time taken to process an HTTP request since the process started (99.9% of all HTTP requests are shorter than this number) |
| com.stardog.http.server-PORT.currentRequests | long | count | The current number of open HTTP requests |

Memory Usage Metrics

Memory Management Metrics

The Memory Management subsystem is responsible for efficiently managing Stardog's internal memory usage, especially during query answering. Memory is broken down into a set of reusable memory "blocks".

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| dbms.memory.heap.query.blocks.used | long | bytes | The amount of Java heap which is currently being used by query blocks in the memory management system |
| dbms.memory.heap.query.blocks.max | long | bytes | The maximum amount of Java heap which is devoted to use by query blocks |
| dbms.memory.native.query.blocks.used | long | bytes | The amount of native (off-heap) memory which is currently being used by query blocks in the memory management system |
| dbms.memory.native.query.blocks.max | long | bytes | The maximum amount of native (off-heap) memory which is devoted to use by query blocks |
| databases.YourDb.queries.memory.spilled | long | bytes | The monotonically increasing number of bytes spilled over to disk during evaluation of queries against the given database |
| databases.YourDb.queries.memory.acquired | long | bytes | The monotonically increasing number of bytes acquired for processing intermediate results for queries against the given database |

Java Memory Metrics

These are metrics about (or related to) the JVM's memory usage. They are usually accessible through other JVM tools (like JMX) but are provided as explicit metrics for end-user convenience.

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| dbms.memory.heap.used | long | bytes | The amount of memory currently being used by the Java heap |
| dbms.memory.heap.max | long | bytes | The maximum amount of memory allowed for the Java heap. Equivalent to the -Xmx setting |
| dbms.memory.mapped.used | long | bytes | The amount of memory currently in use for memory-mapped buffers in the Java subsystem. Note that this does not include any memory-mapped usage from native sources (such as RocksDB) |
| dbms.memory.direct.buffer.used | long | bytes | The amount of off-heap memory currently being used by Java buffers which are managed by the JVM. Note that this does not include memory buffers which are created inside of native code |
| dbms.memory.native.max | long | bytes | The maximum amount of native memory that the process is allowed to use outside of the JVM. This includes any buffers that are natively created but populated inside the JVM, as well as any memory which is natively allocated (like RocksDB) |

Thread dumps for the server can be retrieved with the metric jvm.threads, but only if the threads parameter is set to true in the HTTP request. Using the --threads option in the stardog-admin server metrics CLI command will achieve this. This capability is useful as an alternative to jstack, as it does not require login access to the server.

When metrics.jvm.enabled is set to true in stardog.properties, Stardog additionally reports a set of JVM metrics. They have the following prefixes:

  • jvm.gc.* for GC related metrics
  • jvm.memory.* for JVM heap related metrics
  • jvm.memory.buffers.* for JVM metrics related to use of memory buffers
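
Since jvm.memory.buffers.* is nested under jvm.memory.*, a tool that groups these metrics by prefix needs to try the longest prefix first. A minimal sketch, assuming the status JSON has been parsed into a dictionary keyed by metric name (the function name and the sample metric names in the usage note are hypothetical):

```python
# Ordered longest-first so that "jvm.memory.buffers." wins over "jvm.memory.".
JVM_PREFIXES = ("jvm.memory.buffers.", "jvm.memory.", "jvm.gc.")


def group_jvm_metrics(status):
    """Split a parsed status document into one dictionary per JVM metric prefix."""
    groups = {prefix: {} for prefix in JVM_PREFIXES}
    for name, entry in status.items():
        for prefix in JVM_PREFIXES:
            if name.startswith(prefix):
                groups[prefix][name] = entry
                break  # stop at the first (most specific) matching prefix
    return groups
```

For example, a metric named `jvm.memory.buffers.example` lands only in the `jvm.memory.buffers.` group, and metrics outside the three prefixes are ignored.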

Process Memory Metrics

These are metrics about the process itself, ignoring the JVM. These are almost always accessible through other means (such as ps on Linux systems) but are provided as metrics within Stardog both for end-user convenience and for automatic management (such as warning when memory usage exceeds a threshold).

| Metric Name | Type | Unit | Description |
|---|---|---|---|
| dbms.memory.system.rss | long | bytes | The current OS-reported RSS (Resident Set Size) for this process. For more information on RSS, see this article |
| dbms.memory.system.rss.peak | long | bytes | The OS-reported maximum RSS achieved by this process since it started |
| dbms.memory.system.virtual | long | bytes | The current OS-reported virtual memory size for this process. Note that a large virtual size does not automatically equate to a large actual memory usage. For more information see this StackOverflow description |
| dbms.memory.system.regioncount | long | count | The current OS-reported number of regions in use by this process. This number only applies to operating systems which have a region-based memory system, like OS X (but not Linux or Windows). For operating systems which do not use regional memory, this number will be set to 1 |
| dbms.memory.system.pinnedSize | long | bytes | The current amount of memory which is "pinned" by the operating system, and cannot be swapped out by the process. Note that only some operating systems support this; operating systems which do not support the metric will always report -1 for this value |
| dbms.memory.system.pageSize | long | bytes | The size of a single memory page in the OS |
| dbms.memory.system.usageRatio | long | percentage | The ratio of currently used memory to the total amount available to the process |
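
Two details in this table are easy to miss when consuming the metrics from a script: pinnedSize uses -1 as a "not supported" marker rather than a real size, and a threshold check is typically done against RSS. The sketch below shows one way to handle both. It assumes a parsed status dictionary in which each metric is either a bare number or an object with a `value` key (an assumption about the response shape); the function names and the `rss_limit_bytes` parameter are hypothetical.

```python
def metric_value(status, name):
    """Read a metric, accepting a bare number or an object like {"value": n}."""
    entry = status[name]
    return entry["value"] if isinstance(entry, dict) else entry


def summarize_process_memory(status, rss_limit_bytes):
    """Summarize process memory, flagging RSS above a caller-chosen limit."""
    rss = metric_value(status, "dbms.memory.system.rss")
    pinned = metric_value(status, "dbms.memory.system.pinnedSize")
    return {
        "rss_bytes": rss,
        "rss_over_limit": rss > rss_limit_bytes,
        # -1 means the operating system does not report pinned memory,
        # so expose it as None instead of a misleading negative size.
        "pinned_bytes": None if pinned == -1 else pinned,
    }
```

The limit is supplied by the caller here because an appropriate threshold depends on the host and on how much memory has been assigned to the server.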

Process Metrics

Process metrics are metrics that are unique to the Stardog process currently running and its environment. They contain information about the process itself without referencing any specific database.

| Metric Name | Type | Unit | Description |
|---|---|---|---|
| dbms.version | String | N/A | The release version of the server |
| dbms.type | String | N/A | The type of license in effect for the server. Can be one of: Community, Developer, Enterprise |
| dbms.id | String | N/A | The id of the kernel. This is a unique identifier for the specific Stardog process. In non-clustered environments, this is just a random ID which is not persisted across restarts. In clustered environments, the kernel id is constructed from configuration and IP addresses to allow for unique identity within a cluster |
| dbms.home | String | N/A | The full path to the home directory of this running process (i.e. $STARDOG_HOME) |
| system.uptime | long | milliseconds | The amount of time since the process started |
| system.os | String | N/A | An identifier for the operating system that Stardog is running on |
| system.arch | String | N/A | An identifier of the specific architecture that Stardog is running on |
| system.cpu.usage | double | percentage | The percentage of available system CPUs that are being used for the Stardog process. Calculated as the total CPU cycles used by the process (as reported by the Operating System) divided by the number of processors available |
| dbms.credentials.cache.size | long | count | The approximate number of entries in the security cache |
| dbms.credentials.cache.hits | long | count | The number of cache hits in the security cache |
| dbms.credentials.cache.misses | long | count | The number of cache misses in the security cache |
| dbms.credentials.cache.loadSuccesses | long | count | The number of times a cache miss resulted in successfully loading a value from the underlying cache storage system since this process started |
| dbms.credentials.cache.loadFailures | long | count | The number of times a load into the security cache failed, for any reason, since the process started |
| dbms.credentials.cache.evictions | long | count | The number of entries which have been evicted from the security cache since the process started |
| system.db.count | long | count | The number of databases stored in Stardog |
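
Several of these values are more useful once combined or converted. As a rough sketch, the security cache hit ratio can be derived from the hits and misses counters, and the uptime in milliseconds can be turned into a duration. This assumes a parsed status dictionary in which each metric is either a bare number or an object with a `value` key (an assumption about the response shape); the function names are hypothetical.

```python
from datetime import timedelta


def metric_value(status, name):
    """Read a metric, accepting a bare number or an object like {"value": n}."""
    entry = status[name]
    return entry["value"] if isinstance(entry, dict) else entry


def credentials_cache_hit_ratio(status):
    """Fraction of security cache lookups that were hits, or None if there were none."""
    hits = metric_value(status, "dbms.credentials.cache.hits")
    misses = metric_value(status, "dbms.credentials.cache.misses")
    total = hits + misses
    return hits / total if total else None


def uptime(status):
    """Convert system.uptime (milliseconds) into a timedelta."""
    return timedelta(milliseconds=metric_value(status, "system.uptime"))
```

A hit ratio that stays low while evictions keep climbing may suggest the cache is too small for the workload, though that interpretation depends on the deployment.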