MarkLogic Universal and Range Indexes


  • 28 February, 2020
  • By Dave Cassel

In NiFi, FlowFiles are pieces of data that a processor needs to work on. In this case, NiFi is calling a MarkLogic query for each FlowFile. The metric shown in the graph is FlowFiles processed by NiFi over 5-minute intervals. There are peaks around 16,000, but throughput was mostly lower, which wasn’t what I needed. After a one-line change, throughput went to about 80,000 per 5 minutes. The pace was still accelerating when it ran out of data to process, so I’m not sure how high it would have gone. So what did I change?

MarkLogic provides a number of indexes to improve query performance. Among them are the Universal Index (with many options) and Range Indexes.

Two types of queries look like they do very similar things, but they rely on these different indexes: cts:element-value-query() and cts:element-range-query() (with the “=” operator).
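As a rough illustration, the two calls look nearly identical side by side. This is only a sketch: it assumes an element named id holding a value like "a~b~c", and, for the second query, a string element range index on id with the default collation. Without a matching index, the range query raises an error instead of returning results.

```xquery
xquery version "1.0-ml";

(: Resolved from the Universal Index: the value is tokenized before lookup. :)
let $value-query :=
  cts:element-value-query(xs:QName("id"), "a~b~c")

(: Resolved from the "id" range index: the value is compared as a whole.
 : Assumes a string element range index on "id" is configured. :)
let $range-query :=
  cts:element-range-query(xs:QName("id"), "=", "a~b~c")

return (
  cts:uris((), ("limit=1", "score-zero"), $value-query),
  cts:uris((), ("limit=1", "score-zero"), $range-query)
)
```

Both cts:uris calls ask for at most one matching URI and skip relevance scoring; only the index that resolves the query differs.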

Here’s the original version of the function:

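    (: Given a URI, check whether there is a corresponding Item.
     : If there is, delete it. :)
    declare function lib:delete-replaced-item($uri as xs:string)
    {
      let $item-uri :=
        cts:uris((),
          ("limit=1", "score-zero"),
          cts:element-value-query(xs:QName("id"), lib:id-from-uri($uri)))
      return
        if (fn:exists($item-uri)) then
          (: wrapper call and its option assumed; lost from this listing :)
          xdmp:invoke-function(
            function() { xdmp:document-delete($item-uri) },
            map:entry("update", "true"))
        else ()
    };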

Note the cts:element-value-query. This query uses the Universal Index, which captures every term, along with the XML or JSON structure. This makes for rapid lookups of which documents have a particular term.

The cts:element-value-query looks for documents that have the provided input as the entire contents of the target element. In certain cases, this works great. However, in the example above, the return value from lib:id-from-uri($uri) looks something like “a~b~c”. The problem is the “~” characters. As MarkLogic tokenizes the content, it sees “a~b~c” as the sequence of tokens (“a”, “b”, “c”). We can see this using the xdmp:plan function on cts:element-value-query(xs:QName("id"), "a~b~c"). The results include a final-plan element:

    <qry:term-query weight="1">
    ...
    </qry:final-plan>

This query needs to look for not just one value, but three, in the correct order. Let’s compare that with the plan using cts:element-range-query(xs:QName("id"), "=", "a~b~c"):

    <qry:range-query weight="0" min-occurs="1" max-occurs="4294967295">
      <qry:lower-bound xsi:type="xs:string">a~b~c</qry:lower-bound>
      <qry:upper-bound xsi:type="xs:string">a~b~c</qry:upper-bound>
    </qry:range-query>

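Here, MarkLogic is working with a lower and an upper bound, which are the same value. To find results, MarkLogic uses the “id” range index: it seeks through the sorted list of values to the matching entry and returns the documents recorded for it. The value is compared as a whole string, so the “~” characters are not tokenized away. Replacing cts:element-value-query with cts:element-range-query in the function was the one-line change.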
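For concreteness, here is a sketch of how the revised function might look. It is a library-module fragment, not a standalone script, and several parts are assumptions rather than the original code: that the lookup is a cts:uris call, that a string range index on id is configured, and that the delete runs through xdmp:invoke-function in update mode. lib:id-from-uri is the helper mentioned earlier and is taken to be defined elsewhere in the same module.

```xquery
(: Fragment of a library module; the lib prefix and lib:id-from-uri
 : are assumed to be declared elsewhere in that module. :)
declare function lib:delete-replaced-item($uri as xs:string)
{
  let $item-uri :=
    cts:uris((),
      ("limit=1", "score-zero"),
      (: was: cts:element-value-query(xs:QName("id"), lib:id-from-uri($uri)) :)
      cts:element-range-query(xs:QName("id"), "=", lib:id-from-uri($uri)))
  return
    if (fn:exists($item-uri)) then
      (: Assumed wrapper: run the delete as a separate update transaction. :)
      xdmp:invoke-function(
        function() { xdmp:document-delete($item-uri) },
        map:entry("update", "true"))
    else ()
};
```

Only the query constructor changes; the lexicon options and the delete logic stay the same.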

MarkLogic provides many different types of indexes. Knowing the right one to use for your query can make a huge difference in the performance.
