This page discusses how to obfuscate datasets and queries in Stardog so that they can be shared without exposing sensitive data.
Data obfuscation can help by easily producing a dataset and query to:
The data obfuscate command is used to obfuscate data. It is very similar to the data export command:
stardog data obfuscate -f TRIG -g ALL myDatabase /tmp/obfuscated.trig
This command writes an obfuscated copy of the data in the database (myDatabase) to a file (/tmp/obfuscated.trig).

Once the data is obfuscated, queries written against the original data will no longer work against the obfuscated data. The query obfuscate command is used to obfuscate queries.
# redirects obfuscated query to obfuscated.sparql
stardog query obfuscate myDatabase query.sparql > obfuscated.sparql
The obfuscated query (obfuscated.sparql) can then be executed against the database with the obfuscated data loaded into it.

The following example shows how to obfuscate a small dataset and query.
Create a file sample.trig:
<http://example.com/graph> {
<http://example.com/test/1> <http://example.com/attribute/name> "t1" .
<http://example.com/test/1> <http://example.com/attribute/id> "0001" .
<http://example.com/test/2> <http://example.com/attribute/name> "t1" .
<http://example.com/test/2> <http://example.com/attribute/id> "0002" .
}
Create a database (myDatabase) and load sample.trig into it:
stardog-admin db create -n myDatabase sample.trig
Create a file (query.sparql) that contains the following SPARQL query:
SELECT ?name
FROM <http://example.com/graph>
WHERE {
<http://example.com/test/1> <http://example.com/attribute/name> ?name
}
Execute the query:
stardog query execute myDatabase query.sparql
+-------+
| name |
+-------+
| "t1" |
+-------+
Obfuscate the data and write it to obfuscated.trig:
stardog data obfuscate -f TRIG -g ALL myDatabase obfuscated.trig
<div class="code-example" markdown="1">
obfuscated.trig
@prefix obf: <tag:stardog:api:obf:> .
obf:19b37d9ffd391cd0e29ad7a0c92722e1190ab546213370807320d5e351d10b79 {
obf:d27330fb53d3f2a6d5068ce46d248392ec09f93692a8e36a7bd33a3c128dafd4
obf:64b72b77e8949f32a09d38590b15e7a757a1db3bc0a186405cc4e83141b54e2a "628b49d96dcde97a430dd4f597705899e09a968f793491e4b704cae33a40dc02" ;
obf:396dc63bd5eb69cb7a1283567c9eadd91d1319555faac5e6644314b0cf0c150d "888b19a43b151683c87895f6211d9f8640f97bdc8ef32f03dbe057c8f5e56d32" .
obf:ac6d2bb543a71a5ec4ac0a591b8c0aafa9ad54adf0d1d686f00fabc6799ddcf9
obf:64b72b77e8949f32a09d38590b15e7a757a1db3bc0a186405cc4e83141b54e2a "628b49d96dcde97a430dd4f597705899e09a968f793491e4b704cae33a40dc02" ;
obf:396dc63bd5eb69cb7a1283567c9eadd91d1319555faac5e6644314b0cf0c150d "4fac6dbe26e823ed6edf999c63fab3507119cf3cbfb56036511aa62e258c35b4" .
}
</div>
Obfuscate the query and write it to obfuscated.sparql:
stardog query obfuscate myDatabase query.sparql > obfuscated.sparql
<div class="code-example" markdown="1">
obfuscated.sparql
SELECT ?x0
FROM <tag:stardog:api:obf:19b37d9ffd391cd0e29ad7a0c92722e1190ab546213370807320d5e351d10b79>
WHERE {
<tag:stardog:api:obf:d27330fb53d3f2a6d5068ce46d248392ec09f93692a8e36a7bd33a3c128dafd4>
<tag:stardog:api:obf:64b72b77e8949f32a09d38590b15e7a757a1db3bc0a186405cc4e83141b54e2a> ?x0 .
}
</div>
Create a new database (ObfuscatedDatabase) from the obfuscated data exported to obfuscated.trig:
stardog-admin db create -n ObfuscatedDatabase obfuscated.trig
Execute the obfuscated query (obfuscated.sparql) against ObfuscatedDatabase:
stardog query execute ObfuscatedDatabase obfuscated.sparql
+--------------------------------------------------------------------+
| x0 |
+--------------------------------------------------------------------+
| "628b49d96dcde97a430dd4f597705899e09a968f793491e4b704cae33a40dc02" |
+--------------------------------------------------------------------+
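In the example above, both resources have the name "t1" in sample.trig, and both end up with the same obfuscated literal in obfuscated.trig. That is the behavior one would expect from a deterministic message digest. The following is a conceptual sketch only, assuming GNU coreutils `sha256sum` is available; `hash_term` is a hypothetical helper, not part of Stardog, and the digests it prints are not guaranteed to match Stardog's output, since this page does not describe the exact bytes Stardog feeds to the digest.

```shell
#!/bin/sh
# Conceptual sketch: hash a term with SHA-256 and print the hex digest.
# hash_term is a hypothetical helper, NOT Stardog's implementation.
hash_term() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

a=$(hash_term "t1")
b=$(hash_term "t1")
c=$(hash_term "0001")

# The same input always yields the same digest ...
[ "$a" = "$b" ] && echo "same input, same digest"
# ... while different inputs yield different digests.
[ "$a" != "$c" ] && echo "different input, different digest"
# A SHA-256 hex digest is 64 characters long, like the values in obfuscated.trig.
echo "${#a}"
```

Because the mapping is deterministic, terms that were equal before obfuscation stay equal afterwards, which is why an obfuscated query can still match the obfuscated data.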
By default, all URIs, bnodes, and string literals in the database are obfuscated using the SHA256 message digest algorithm. Non-string typed literals (numbers, dates, etc.) and URIs from built-in namespaces (e.g. RDF, RDFS, OWL) are left unchanged. Obfuscation can be customized by providing a configuration file:
stardog data obfuscate --config obfuscation.ttl myDatabase obfDatabase.ttl
See an example obfuscation configuration file in the stardog-examples GitHub repository.
If a custom configuration file is used to obfuscate the data, then the same configuration should be used for obfuscating the queries as well.
stardog query obfuscate --config obfuscation.ttl myDatabase myQuery.sparql > obfQuery.sparql
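Since the data and the queries need the same configuration, it can be convenient to keep both commands in one place. The sketch below wraps the two commands shown above in a hypothetical shell function; the function name, its argument order, and the `DRY_RUN` variable are illustrative assumptions, not part of the Stardog CLI. With `DRY_RUN=1` it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: obfuscate data and a query with the same config file.
# Arguments: config file, database, query file, data output, query output.
# The function name and DRY_RUN are illustrative, not part of Stardog.
obfuscate_with_config() {
  config=$1 db=$2 query=$3 data_out=$4 query_out=$5
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: print the commands instead of executing them.
    echo "stardog data obfuscate --config $config $db $data_out"
    echo "stardog query obfuscate --config $config $db $query > $query_out"
    return 0
  fi
  # Only obfuscate the query if obfuscating the data succeeded.
  stardog data obfuscate --config "$config" "$db" "$data_out" &&
    stardog query obfuscate --config "$config" "$db" "$query" > "$query_out"
}

DRY_RUN=1
obfuscate_with_config obfuscation.ttl myDatabase myQuery.sparql obfDatabase.ttl obfQuery.sparql
```

Passing the configuration file once and reusing it for both commands avoids the mismatch in which the data and the query are obfuscated under different settings.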
To change the message digest algorithm used to obfuscate the data (from a default of SHA256), include the following in your obfuscation configuration file:
# Obfuscation namespace is used only for parsing the config file
@prefix obf: <tag:stardog:api:obf:> .
[] a obf:Obfuscation ;
# Message digest algorithm that will be used to obfuscate terms
# Should be a message digest algorithm supported by Java
obf:digest "MD5" ;
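Switching the digest changes the length of the hashed values as well as their content. As a rough illustration, and assuming the obfuscated value is the hex-encoded digest (as the SHA256 output earlier on this page suggests), the sketch below compares the two algorithms with the GNU coreutils tools `md5sum` and `sha256sum`. `digest_hex` is a hypothetical helper and does not reproduce Stardog's implementation.

```shell
#!/bin/sh
# Conceptual sketch: compare hex digest lengths for MD5 and SHA-256.
# digest_hex is a hypothetical helper, NOT Stardog's implementation.
# $1: digest tool (md5sum or sha256sum), $2: term to hash
digest_hex() {
  printf '%s' "$2" | "$1" | cut -d ' ' -f 1
}

md5=$(digest_hex md5sum "t1")
sha=$(digest_hex sha256sum "t1")

echo "MD5 hex length: ${#md5}"      # 128-bit digest, 32 hex characters
echo "SHA-256 hex length: ${#sha}"  # 256-bit digest, 64 hex characters
```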
The configuration file specifies which URIs and strings will be obfuscated by defining inclusion and exclusion filters. Each filter applies to a position, which is one of [any, subject, predicate, object].
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
# Obfuscation namespace is used only for parsing the config file
@prefix obf: <tag:stardog:api:obf:> .
[] a obf:Obfuscation ;
obf:include [
obf:position obf:any ;
obf:pattern "math" #default is .*, to include everything
] ;
obf:exclude [
obf:position obf:any ;
obf:namespace "rdf"
] ;