SPARQL Update and Locks
- 28 October, 2021
- By Dave Cassel
I just learned something the hard way, so I thought I’d share.
The tl;dr is that sem.sparqlUpdate runs in a separate transaction by default, which means you need to be careful about document locks. (If your response is "well, duh", then you may not need the rest of this post. If you've ever had a sem.sparqlUpdate request time out when it should return quickly, read on.)
Quick refresher: all requests in MarkLogic run as either a query or an update. When a request runs as a query, it runs at a particular timestamp. Thanks to the magic of MVCC, this means that the request does not need to acquire read locks on the documents it gets data from.
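If you want to see which mode a request is running in, xdmp.requestTimestamp() is a handy check: it returns the fixed timestamp for a query and null for an update. A minimal sketch (run each version as its own request in Query Console):
// Version 1: no declareUpdate(), so this runs as a query at a single timestamp.
xdmp.requestTimestamp();   // returns a non-null timestamp

// Version 2: declareUpdate() turns the request into an update, which takes
// locks instead of running at a fixed timestamp.
// declareUpdate();
// xdmp.requestTimestamp(); // returns null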
An update, on the other hand, will grab read and write locks. I’ll borrow from MarkLogic’s documentation here:
"Read-locks block for write-locks and write-locks block for both read- and write-locks. An update has to obtain a read-lock before reading a document and a write-lock before changing (adding, deleting, modifying) a document. Lock acquisition is ordered, first-come first-served, and locks are released automatically at the end of the update request." (MarkLogic Concepts Guide, Data Management chapter)
So when an update reads a document, it gets a read lock, which prevents any other update from changing the document while the first request is still running. A read lock may be promoted to a write lock if a request first reads and then updates a document:
declareUpdate();
let doc = cts.doc('/test/dave.json');  // read lock on uri
let docObj = doc.toObject();
docObj.updatedBy = "me";
xdmp.nodeReplace(doc, docObj);         // write lock on uri
So far, so good. Here's what I ran into: I have a set of triples where the subject is a URL and the object is a timestamp. I want to delete all of the triples with this predicate that have anything other than the latest timestamp. I broke that into two pieces: 1) find the latest timestamp; 2) delete any triples that have a different timestamp. I'm doing this in a single JavaScript request.
declareUpdate();

let maxDTS = fn.head(sem.sparql(
  `
    select (MAX(?dts) as ?maxdts)
    where {
      GRAPH <http://4VServices.com/blog> {
        ?url <http://4VServices.com/blog/seenOn> ?dts
      }
    }
  `
)).maxdts;

sem.sparqlUpdate(
  `
    WITH <http://4VServices.com/blog>
    DELETE { ?url <http://4VServices.com/blog/seenOn> ?dts . }
    WHERE {
      ?url <http://4VServices.com/blog/seenOn> ?dts .
      FILTER (?dts != ?recentDTS)
    }
  `,
  { "recentDTS": maxDTS }
)
First, if someone sees a way to write better SPARQL here and do it all in one request, let me know.
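For what it's worth, one untested idea is a subselect that computes the latest timestamp inside the update itself, so the whole thing becomes a single sem.sparqlUpdate call. I haven't verified this against MarkLogic's SPARQL engine, so treat it as a sketch rather than a working answer:
// Untested sketch: compute the max timestamp with a subselect inside the
// update, instead of a separate sem.sparql call.
sem.sparqlUpdate(
  `
    WITH <http://4VServices.com/blog>
    DELETE { ?url <http://4VServices.com/blog/seenOn> ?dts . }
    WHERE {
      ?url <http://4VServices.com/blog/seenOn> ?dts .
      {
        SELECT (MAX(?d) AS ?maxdts)
        WHERE { ?u <http://4VServices.com/blog/seenOn> ?d }
      }
      FILTER (?dts != ?maxdts)
    }
  `
)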
When I ran this code, I was rewarded with a spinner that sat until the request timed out. No good. Why would that be? The initial query will get read locks on the documents that hold the triples, but the update will surely promote those to write locks and do the deletes. Right?
Nope.
I found the key piece of information in the sem.sparqlUpdate documentation:
"isolation=ISOLATION_LEVEL" ISOLATION_LEVEL can be different-transaction or same-statement. Default is different-transaction…
The sem.sparqlUpdate call runs (by default) in a different transaction. Since I was running my parent request as an update, it got read locks for the documents that hold the triples. The update attempt then tried to get write locks for the same documents, but couldn't, because the parent request hadn't finished yet, so it hadn't released those read locks.
Solution
The solution in my case was to remove the declareUpdate() from the parent request. The parent then ran as a query, which doesn't take any locks, so the sem.sparqlUpdate was able to get the write locks it needed. I could also have had the update run with the isolation=same-statement option, which allows the read locks to be promoted to write locks.
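In code, the two options look roughly like this. This is a sketch based on the example above: selectSparql and deleteSparql are names I'm using here for the SELECT and DELETE strings already shown, and the option string comes from the sem.sparqlUpdate documentation quoted earlier.
// selectSparql / deleteSparql: the SELECT and DELETE query strings from the
// example above (the variable names are mine).

// Option 1 (run as its own request): no declareUpdate(), so the parent runs as
// a lockless query; sem.sparqlUpdate's own transaction can get its write locks.
let maxDTS = fn.head(sem.sparql(selectSparql)).maxdts;
sem.sparqlUpdate(deleteSparql, { "recentDTS": maxDTS });

// Option 2 (run as its own request): keep declareUpdate(), but run the SPARQL
// update in the same statement, so the parent's read locks can be promoted.
declareUpdate();
let maxDTS2 = fn.head(sem.sparql(selectSparql)).maxdts;
sem.sparqlUpdate(deleteSparql, { "recentDTS": maxDTS2 }, ["isolation=same-statement"]);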
4V Services works with development teams to boost their knowledge and capabilities. Contact us today to talk about how we can help you succeed!