SPARQL Update and Locks


  • 28 October, 2021
  • By Dave Cassel

I just learned something the hard way, so I thought I’d share.

The tl;dr is that sem.sparqlUpdate runs in a separate transaction by default, which means you need to be careful about document locks. (If your response is “well, duh”, then you may not need the rest of this post. If you’ve ever had a sem.sparqlUpdate request time out when it should return quickly, read on.)

Quick refresher: every request in MarkLogic runs as either a query or an update. A query runs at a particular timestamp. Thanks to the magic of MVCC (multi-version concurrency control), this means the request doesn't need to acquire read locks on the documents it reads from.
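To make the MVCC point concrete, here's a toy sketch in plain JavaScript (not MarkLogic code; the VersionedStore class and its methods are hypothetical). Every commit appends a new timestamped version instead of overwriting the old one, so a reader pinned to a timestamp keeps seeing the same snapshot and never has to block a writer.

```javascript
// Toy multi-version store: each committed write appends a new version
// stamped with a commit timestamp instead of overwriting in place.
class VersionedStore {
  constructor() {
    this.versions = new Map(); // uri -> [{ ts, value }], oldest first
    this.clock = 0;            // timestamp of the last commit
  }

  // Commit a new version of a document and return its timestamp.
  commit(uri, value) {
    this.clock += 1;
    if (!this.versions.has(uri)) this.versions.set(uri, []);
    this.versions.get(uri).push({ ts: this.clock, value });
    return this.clock;
  }

  // Read the newest version committed at or before `ts`.
  // No lock is needed: later commits only append newer versions.
  readAt(uri, ts) {
    const history = this.versions.get(uri) || [];
    let result;
    for (const v of history) {
      if (v.ts <= ts) result = v.value;
    }
    return result;
  }
}

const store = new VersionedStore();
store.commit('/test/dave.json', { updatedBy: 'nobody' });
const queryTs = store.clock;                          // a query starts here
store.commit('/test/dave.json', { updatedBy: 'me' }); // a later update commits

console.log(store.readAt('/test/dave.json', queryTs).updatedBy);     // nobody
console.log(store.readAt('/test/dave.json', store.clock).updatedBy); // me
```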

An update, on the other hand, will grab read and write locks. I’ll borrow from MarkLogic’s documentation here:

Read-locks block for write-locks and write-locks block for both read- and write-locks. An update has to obtain a read-lock before reading a document and a write-lock before changing (adding, deleting, modifying) a document. Lock acquisition is ordered, first-come first-served, and locks are released automatically at the end of the update request.
(MarkLogic Concepts Guide, Data Management chapter)

So when an update reads a document, it gets a read lock, which prevents any other update from changing the document until the first request finishes. A read lock may be promoted to a write lock if a request first reads and then updates a document:

declareUpdate();
const doc = cts.doc('/test/dave.json'); // reading the document takes a read lock on the URI
const docObj = doc.toObject();
docObj.updatedBy = "me";
xdmp.nodeReplace(doc, docObj); // changing it promotes the read lock to a write lock
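The quoted rules, including the promotion in the snippet above, can be modeled with a small lock table. This is a hypothetical plain-JavaScript sketch of the rules as I read them, not how MarkLogic's lock manager is implemented; in particular, acquire() just returns false where a real server would queue the request first-come first-served.

```javascript
// Toy lock table: read locks block write locks, write locks block both,
// and everything a transaction holds is released when it ends.
class LockTable {
  constructor() {
    this.locks = new Map(); // uri -> { mode: 'read' | 'write', holders: Set of txn ids }
  }

  // Returns true if the lock is granted, false if the caller would have to wait.
  acquire(txn, uri, mode) {
    const entry = this.locks.get(uri);
    if (!entry) {
      this.locks.set(uri, { mode, holders: new Set([txn]) });
      return true;
    }
    const others = [...entry.holders].filter((t) => t !== txn);
    if (mode === 'read') {
      // Reads share, unless another transaction holds a write lock.
      if (entry.mode === 'write' && others.length > 0) return false;
      entry.holders.add(txn);
      return true;
    }
    // Write request: only if no other transaction holds any lock on the URI.
    // If `txn` is the sole holder of a read lock, this promotes it.
    if (others.length > 0) return false;
    entry.mode = 'write';
    entry.holders.add(txn);
    return true;
  }

  // End of the update request: release every lock the transaction holds.
  releaseAll(txn) {
    for (const [uri, entry] of this.locks) {
      entry.holders.delete(txn);
      if (entry.holders.size === 0) this.locks.delete(uri);
    }
  }
}

const table = new LockTable();
console.log(table.acquire('A', '/test/dave.json', 'read'));  // true
console.log(table.acquire('B', '/test/dave.json', 'read'));  // true (reads share)
console.log(table.acquire('B', '/test/dave.json', 'write')); // false (A still holds a read lock)
table.releaseAll('A');
console.log(table.acquire('B', '/test/dave.json', 'write')); // true (B's read lock is promoted)
```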

So far, so good. Here's what I ran into: I have a set of triples where the subject is a URL, the predicate is seenOn, and the object is a timestamp. I want to delete every triple with this predicate whose timestamp is anything other than the latest one. I broke that into two pieces: 1) find the latest timestamp; 2) delete any triples that have a different timestamp. I'm doing both in a single JavaScript request.

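declareUpdate(); // the parent request runs as an update (this turns out to matter)

// Step 1: find the most recent seenOn timestamp
const maxDTS = fn.head(sem.sparql(`
  select (MAX(?dts) as ?maxdts)
  where {
    GRAPH <http://4VServices.com/blog> {
      ?url <http://4VServices.com/blog/seenOn> ?dts
    }
  }
`)).maxdts;

// Step 2: delete every seenOn triple whose timestamp differs from the latest
sem.sparqlUpdate(`
  WITH <http://4VServices.com/blog>
  DELETE { ?url <http://4VServices.com/blog/seenOn> ?dts . }
  WHERE {
    ?url <http://4VServices.com/blog/seenOn> ?dts .
    FILTER (?dts != ?recentDTS)
  }
`, { "recentDTS": maxDTS });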
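For anyone less fluent in SPARQL, here's roughly what those two steps compute, sketched in plain JavaScript over a hypothetical in-memory list of triples (the example.com URLs and timestamps are made up for illustration):

```javascript
// Hypothetical in-memory triples: subject URL plus its seenOn timestamp.
const triples = [
  { url: 'https://example.com/a', dts: '2021-10-01T00:00:00Z' },
  { url: 'https://example.com/a', dts: '2021-10-28T00:00:00Z' },
  { url: 'https://example.com/b', dts: '2021-10-28T00:00:00Z' },
  { url: 'https://example.com/c', dts: '2021-09-15T00:00:00Z' },
];

// Step 1: the single latest timestamp across *all* URLs (like MAX(?dts)).
// ISO 8601 UTC strings in one fixed format compare correctly as strings.
const maxDTS = triples.reduce((max, t) => (t.dts > max ? t.dts : max), triples[0].dts);

// Step 2: everything not stamped with that timestamp is what the DELETE
// removes (FILTER (?dts != ?recentDTS)); the rest survives.
const toDelete = triples.filter((t) => t.dts !== maxDTS);
const kept = triples.filter((t) => t.dts === maxDTS);

console.log(maxDTS);                 // 2021-10-28T00:00:00Z
console.log(toDelete.length);        // 2
console.log(kept.map((t) => t.url)); // [ 'https://example.com/a', 'https://example.com/b' ]
```

Because MAX runs over every URL, a URL whose most recent sighting is older than the global maximum (c here) loses all of its seenOn triples, which matches the goal of keeping only the latest timestamp.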

Side note: if you see a way to write better SPARQL for this and do it all in one request, let me know.

When I ran this code, I was rewarded with a spinner that sat until the request timed out. No good. Why would that be? The initial query will get read locks on the documents that hold the triples, but the update will surely promote those to write locks and do the deletes. Right?

Nope.

I found the key piece of information in the sem.sparqlUpdate documentation:

isolation=ISOLATION_LEVEL
ISOLATION_LEVEL can be different-transaction or same-statement. Default is different-transaction. …
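In other words, the sem.sparqlUpdate call runs (by default) in a different transaction from the request that calls it. Since I was running my parent request as an update, it took read locks on the documents that hold the triples. The sem.sparqlUpdate transaction then tried to get write locks on the same documents, but couldn't, because the parent request hadn't finished yet and so still held those read locks. Meanwhile, the parent was waiting for sem.sparqlUpdate to return, so neither side could make progress until the request timed out.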
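One way to picture why the request hung instead of failing is a wait-for graph, where an edge from a to b means "a can't continue until b does". This is a toy model with hypothetical node names, not a description of how MarkLogic tracks or detects deadlocks; the point is simply that a cycle means nobody in it can make progress.

```javascript
// Returns true if the directed wait-for graph contains a cycle (DFS with
// "visiting" marks: reaching a node that is still being visited is a cycle).
function hasCycle(edges) {
  const graph = new Map();
  for (const [from, to] of edges) {
    if (!graph.has(from)) graph.set(from, []);
    graph.get(from).push(to);
  }
  const state = new Map(); // node -> 'visiting' | 'done'
  const visit = (node) => {
    if (state.get(node) === 'visiting') return true; // back edge: cycle
    if (state.get(node) === 'done') return false;
    state.set(node, 'visiting');
    for (const next of graph.get(node) || []) {
      if (visit(next)) return true;
    }
    state.set(node, 'done');
    return false;
  };
  return [...graph.keys()].some((node) => visit(node));
}

// The situation from the post, with the default different-transaction isolation.
const edges = [
  // The parent update can't finish until sem.sparqlUpdate returns...
  ['parent', 'sparqlUpdate'],
  // ...but the update's write locks wait for the parent's read locks,
  // which are only released when the parent finishes.
  ['sparqlUpdate', 'parent'],
];
console.log(hasCycle(edges)); // true
```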

Solution

The solution in my case was to remove declareUpdate() from the parent request. The parent then ran as a query, which doesn't take any locks, so the sem.sparqlUpdate call was able to get the write locks it needed. I could also have run the update with the isolation=same-statement option, which runs it as part of the parent's transaction, so the parent's read locks can be promoted to write locks.
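Here's a toy comparison of the three configurations, using hypothetical canWrite() and run() helpers in plain JavaScript and assuming the lock rules quoted earlier. The only things that change are who holds a lock on the triple documents and which transaction asks for the write lock.

```javascript
// A write lock is grantable only if every current holder is the requester
// itself (no holders at all, or its own read lock being promoted).
const canWrite = (holders, txn) => holders.every((h) => h === txn);

// Model the post's request under a given configuration.
function run({ parentIsUpdate, isolation }) {
  const parent = 'parent';
  // Queries read at a timestamp and take no locks; updates take read locks.
  const holders = parentIsUpdate ? [parent] : [];
  // different-transaction: sem.sparqlUpdate runs in its own transaction.
  // same-statement: it runs inside the parent's transaction.
  const updater = isolation === 'same-statement' ? parent : 'child';
  // If the write lock isn't grantable, the child waits on the parent while
  // the parent waits on the child, so the request runs until it times out.
  return canWrite(holders, updater) ? 'deletes succeed' : 'waits until timeout';
}

console.log(run({ parentIsUpdate: true, isolation: 'different-transaction' }));  // waits until timeout
console.log(run({ parentIsUpdate: false, isolation: 'different-transaction' })); // deletes succeed
console.log(run({ parentIsUpdate: true, isolation: 'same-statement' }));         // deletes succeed
```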
