Nesting/overlapping ranges and match groups

With the wrapAllRanges option, the markRanges() method can mark nesting/overlapping ranges. With this option, every range whose indexes lie between 0 and the context length is wrapped.
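The difference between the two modes can be sketched with a simplified model in plain JavaScript. It is not the library's implementation: the function selectRanges is hypothetical. It assumes that a range is valid only if it lies entirely between 0 and the context length. It also assumes that without wrapAllRanges, ranges are sorted by start and any range starting inside an already accepted one is skipped.

```javascript
// Simplified, hypothetical model of range selection; not the library's code.
// Assumption: a range is valid only if it lies entirely within [0, contextLength].
function selectRanges(ranges, contextLength, wrapAllRanges) {
  const valid = ranges
    .filter(r => r.start >= 0 && r.start + r.length <= contextLength)
    .sort((a, b) => a.start - b.start); // filter() returned a copy, input is untouched

  if (wrapAllRanges) {
    // With wrapAllRanges, nesting/overlapping ranges are all kept.
    return valid;
  }

  // Assumed forward-only pass: a range starting before the end of the
  // previously accepted range is skipped.
  const accepted = [];
  let lastEnd = 0;
  for (const r of valid) {
    if (r.start >= lastEnd) {
      accepted.push(r);
      lastEnd = r.start + r.length;
    }
  }
  return accepted;
}

const ranges = [
  { start: 0, length: 50 },
  { start: 10, length: 20, nested: true },
  { start: 90, length: 20 }, // extends past a 100-character context
];
console.log(selectRanges(ranges, 100, true).length);  // 2
console.log(selectRanges(ranges, 100, false).length); // 1
```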

The markRegExp() method, used with a RegExp that has the d flag and with the separateGroups and wrapAllRanges options, can mark nesting groups and capturing groups inside positive lookaround assertions. This removes practically all restrictions.
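The d flag is what makes group marking possible here. It adds an indices array to each match with the [start, end] offsets of every capturing group, including a group inside a lookaround assertion that lies outside the matched text. A plain JavaScript illustration, independent of the library (the sample string is made up):

```javascript
// The d (hasIndices) flag exposes the position of every capturing group.
const match = 'price: $42'.match(/(?<=(\$))\d+/d);

console.log(match[0]);         // '42'
console.log(match.indices[0]); // [8, 10]: the whole match
console.log(match.indices[1]); // [7, 8]: group 1 sits in the lookbehind, before the match

// Without the d flag there is no indices array, only the group text.
console.log('price: $42'.match(/(?<=(\$))\d+/).indices); // undefined
```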

The following lookaround examples demonstrate cases where the wrapAllRanges option should be used; without it, they won't be highlighted correctly:

  • RegExp with lookaround assertions can create overlapping matches,
    e.g. regex /(?<=(gr1)\s+\w+\b).+?(gr2)/dg with string 'gr1 match1 gr1 gr2 match2 gr2'.
    The gr1 of the second match is not wrapped because the gr2 of the first match has already been wrapped.

  • Another case: regex /(?=\d*(1))(?=\d*(2))(?=\d*(3))/dg finds a match in each number of the string '123, 132, 213, 231, 312, 321'.
    This is not an overlapping case, but the groups can appear in any order. If group 1 is wrapped first, groups 2 and 3 are ignored in '231, 321' ...

  • Overlapping groups case: regex /\w+(?=.*?(gr1 \w+))(?=.*?(\w+ gr2))/dg with string 'word gr1 overlap gr2'. The groups 'gr1 overlap' and 'overlap gr2' share the word 'overlap', so the gr1 group is wrapped and the gr2 group is ignored.
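The three cases can be checked in plain JavaScript. The sketch collects every group's offsets with the d flag. It then runs a hypothetical forward-only pass that skips any range starting before the end of the previously wrapped one. This is a simplified model of the behavior without wrapAllRanges, assumed for illustration rather than taken from the library. It does reproduce the skipped groups described above: gr1 of the second match, groups 2 and 3 in '231' and '321', and gr2 in the last case.

```javascript
// Collect the [start, end) offsets of every capturing group, match by match
// and in group-index order. match.indices requires the d flag.
function groupRanges(regex, text) {
  const ranges = [];
  for (const m of text.matchAll(regex)) {
    for (let g = 1; g < m.length; g++) {
      if (m.indices[g] === undefined) continue; // group did not participate
      const [start, end] = m.indices[g];
      ranges.push({ matchStart: m.index, group: g, start, end });
    }
  }
  return ranges;
}

// Hypothetical forward-only pass (assumed model, not the library's code):
// a range that starts before the end of the previously wrapped range is skipped.
function skippedWithoutWrapAll(ranges) {
  const skipped = [];
  let lastEnd = 0;
  for (const r of ranges) {
    if (r.start < lastEnd) {
      skipped.push(r);
    } else {
      lastEnd = r.end;
    }
  }
  return skipped;
}

// Example 1: the gr1 of the second match lies before the gr2 of the first one.
console.log(skippedWithoutWrapAll(groupRanges(
  /(?<=(gr1)\s+\w+\b).+?(gr2)/dg, 'gr1 match1 gr1 gr2 match2 gr2')));

// Example 2: zero-length matches whose groups appear in any order.
console.log(skippedWithoutWrapAll(groupRanges(
  /(?=\d*(1))(?=\d*(2))(?=\d*(3))/dg, '123, 132, 213, 231, 312, 321')));

// Example 3: 'gr1 overlap' and 'overlap gr2' overlap within one match.
console.log(skippedWithoutWrapAll(groupRanges(
  /\w+(?=.*?(gr1 \w+))(?=.*?(\w+ gr2))/dg, 'word gr1 overlap gr2')));
```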

Note: the wrapAllRanges option can degrade performance if the context contains a very large number of text nodes and mark elements. This is because each wrapping inserts two more objects into an array, which requires a lot of copying, memory allocation, and so on.

An 8 MB file containing 177,000 text nodes:

option                  2,500 marked groups    29,000 marked groups
wrapAllRanges: true     0.7 s                  2.9 s
wrapAllRanges: false    0.65 s                 0.7 s

A 1 MB file containing 20,800 text nodes:

option                  2,500 marked groups    29,000 marked groups
wrapAllRanges: true     120 ms                 710 ms
wrapAllRanges: false    70 ms                  310 ms
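The note above attributes the slowdown to inserting two objects into an array on every wrap. The toy model below is not the library's code: the function name and the array of placeholder entries are hypothetical. It counts how many existing entries a mid-array splice() has to move, using node and wrap counts scaled down to one tenth of the 1 MB case.

```javascript
// Toy model of the cost described above (not the library's code): each wrap
// inserts two new entries into an array with splice(), which moves every
// entry after the insertion point.
function countShiftedEntries(nodeCount, wrapCount) {
  const entries = new Array(nodeCount).fill(null);
  let shifted = 0;
  for (let w = 0; w < wrapCount; w++) {
    // Spread the wraps evenly over the current array.
    const pos = Math.floor((w * entries.length) / wrapCount);
    shifted += entries.length - (pos + 1); // entries moved by this insertion
    entries.splice(pos + 1, 0, null, null);
  }
  return shifted;
}

// Scaled-down counts: 2,080 nodes with 250 and 2,900 wraps.
for (const wraps of [250, 2900]) {
  console.log(`${wraps} wraps: ${countShiftedEntries(2080, wraps)} entries shifted`);
}
```

The count rises faster than the number of wraps, because every insertion also lengthens the array that later insertions must shift.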

Note: with the d flag, the wrapAllRanges option wraps all capturing groups regardless of their nesting level, so you need to filter out unwanted groups.
Without this option, once a group has been wrapped, all groups nested inside it are ignored.
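One way to filter out unwanted groups is by nesting depth. The sketch below is hypothetical (groupDepths is not a library function). It derives each group's depth from the d-flag indices by counting the groups whose range contains it, then keeps only the top-level ones. Their numbers could then be compared with info.groupIndex in the each callback. The date regex and input are made up for illustration.

```javascript
// Hypothetical helper: nesting depth of each capturing group, derived from
// d-flag indices. Depth 0 means the group is not contained in another group.
// Positional containment matches syntactic nesting for ordinary groups, but
// can differ for groups inside lookaround assertions.
function groupDepths(match) {
  const depths = [];
  for (let g = 1; g < match.length; g++) {
    const range = match.indices[g];
    if (range === undefined) continue; // group did not participate
    let depth = 0;
    for (let o = 1; o < match.length; o++) {
      const other = match.indices[o];
      if (o === g || other === undefined) continue;
      const contains = other[0] <= range[0] && range[1] <= other[1];
      const identical = other[0] === range[0] && other[1] === range[1];
      // For identical ranges, treat the lower-numbered (outer) group as the parent.
      if (contains && (!identical || o < g)) depth++;
    }
    depths[g] = depth;
  }
  return depths;
}

const dateMatch = /(\d{4})-((\d{2})-(\d{2}))/d.exec('2024-05-17');
const depths = groupDepths(dateMatch);

// Keep only top-level groups.
const topLevel = [];
for (let g = 1; g < dateMatch.length; g++) {
  if (depths[g] === 0) topLevel.push(g);
}
console.log(depths.slice(1), topLevel); // depths [0, 0, 1, 1], top-level groups [1, 2]
```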

To mark nesting/overlapping ranges.

const ranges = [{ start: 0, length: 50 }, { start: 10, length: 20, nested: true } /* , ... */];

instance.markRanges(ranges, {
  'wrapAllRanges' : true,
  'each' : (markElement, range) => {
    // to distinguish ranges, add a custom property to them (here: nested)
    if (range.nested) {
      markElement.className = 'nested';
    }
  }
});
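Instead of setting nested by hand, the property could be computed. Below is a hypothetical helper (not part of the library) that flags every range lying entirely inside another range. Overlapping ranges that are not contained stay unflagged.

```javascript
// Hypothetical helper: mark every range that lies entirely inside another
// range with nested: true, so the each callback can style it differently.
function flagNestedRanges(ranges) {
  return ranges.map((r, i) => {
    const end = r.start + r.length;
    const nested = ranges.some((o, j) =>
      j !== i &&
      o.start <= r.start && end <= o.start + o.length &&
      // identical ranges: only the later one counts as nested
      (o.length > r.length || j < i));
    return { ...r, nested };
  });
}

const flagged = flagNestedRanges([
  { start: 0, length: 50 },
  { start: 10, length: 20 },
  { start: 40, length: 30 }, // overlaps the first range but is not inside it
]);
console.log(flagged.map(r => r.nested)); // [false, true, false]
```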

To mark nesting groups with the acrossElements option and the d flag.

instance.markRegExp(/\w+\s((nested group)\s+\w+)/dg, {
    'acrossElements' : true,
    'separateGroups' : true,
    'wrapAllRanges' : true,
    'each' : (markElement, info) => {
        if (info.groupIndex === 2) {
            markElement.className = 'nested';
        }
    }
});
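To see which text each group index refers to, the regex from this example can be run in plain JavaScript. The g flag is dropped because one match is enough, and the sample text is made up. Group 2, the one given the nested class, lies inside group 1:

```javascript
const text = 'a nested group inside';
const nestedMatch = /\w+\s((nested group)\s+\w+)/d.exec(text);

// Print each group's text and [start, end) offsets.
for (let g = 1; g < nestedMatch.length; g++) {
  const [start, end] = nestedMatch.indices[g];
  console.log(g, JSON.stringify(nestedMatch[g]), start, end);
}
// 1 "nested group inside" 2 21
// 2 "nested group" 2 14
```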

To mark nesting groups with the acrossElements option and a RegExp without the d flag.

Without the d flag, the whole match is treated as group 0 and all child groups (in this case group1 and group2) as nested ones. This is the only way to wrap nested groups without the d flag:
let regex = /\w+\s(group1).+?(group2).*/gi;

instance.markRegExp(regex, {
    'acrossElements' : true,
    'separateGroups' : true,
    'wrapAllRanges' : true,
    'each' : (markElement, info) => {
        if (info.groupIndex === 0) {
            markElement.className = 'main-group';
        }
        if (info.groupIndex > 0) {
            markElement.className = 'nested-group';
        }
    }
});
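Without the d flag, a match reports only the text of each group, not its position. A group's offset would therefore have to be inferred, for example by searching for its text inside the match. The naive indexOf() search below is hypothetical, not a description of how the library works. It shows how such a guess goes wrong when the same text occurs earlier in the match:

```javascript
const sample = 'ab ab';

// Without d: no indices, only the group text.
const plain = /\w+\s(ab)/.exec(sample);
console.log(plain.indices); // undefined

// Naive guess: first occurrence of the group text inside the match.
const guessed = plain.index + plain[0].indexOf(plain[1]);

// With d: the exact position of group 1.
const exact = /\w+\s(ab)/d.exec(sample).indices[1][0];

console.log(guessed, exact); // 0 3
```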

Simple example with next/previous buttons.

It uses numbers in continuous ascending order as unique match identifiers. The other next/previous buttons example, which relies on 'start elements', doesn't work correctly with nesting/overlapping matches.

let currentIndex = 0,
    matchCount,
    marks,
    // highlight 3 words in sentences in any order, e.g. 'word word2 word word3 word word1.'
    regex = /(?=[^.]*?(word1))(?=[^.]*?(word2))(?=[^.]*?(word3))/dgi;
    
instance.markRegExp(regex, {
    'acrossElements' : true,
    'separateGroups' : true,
    'wrapAllRanges' : true,
    'each' : (markElement, info) => {
        // info.count as a match identifier
        markElement.setAttribute('data-markjs', info.count);
    },
    'done' : (totalMarks, totalMatches) => {
        marks = $('mark');
        matchCount = totalMatches;
    }
});

prevButton.click(function() {
    // match identifiers (info.count) are assumed to run from 1 to matchCount
    currentIndex = Math.max(currentIndex - 1, 1);
    highlightMatchGroups();
});

nextButton.click(function() {
    currentIndex = Math.min(currentIndex + 1, matchCount);
    highlightMatchGroups();
});

function highlightMatchGroups() {
    marks.removeClass('current');
    const elems = marks.filter((i, elem) => $(elem).data('markjs') === currentIndex).addClass('current');
    elems.find('*[data-markjs]').addClass('current'); // also highlight all marks nested inside them
}
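The navigation logic can be tested without a DOM. This sketch assumes, as the clamping to matchCount suggests, that match identifiers run from 1 to matchCount. The step and currentMarkNames functions and the { id, name, children } mark objects are hypothetical stand-ins for the jQuery code.

```javascript
// Pure-logic sketch of the navigation above, with no DOM or jQuery.
// Assumption: match identifiers (info.count) run from 1 to matchCount.
function step(currentIndex, delta, matchCount) {
  return Math.min(Math.max(currentIndex + delta, 1), matchCount);
}

// Hypothetical mark objects { id, name, children }. A mark is current if its
// id equals the index or if it is nested inside a current mark.
function currentMarkNames(marks, index) {
  const names = [];
  const visit = (mark, insideCurrent) => {
    const isCurrent = insideCurrent || mark.id === index;
    if (isCurrent) names.push(mark.name);
    for (const child of mark.children) visit(child, isCurrent);
  };
  for (const mark of marks) visit(mark, false);
  return names;
}

// Two overlapping matches: mark 'b' of match 2 is nested inside mark 'a' of match 1.
const marksTree = [
  { id: 1, name: 'a', children: [{ id: 2, name: 'b', children: [] }] },
  { id: 2, name: 'c', children: [] },
];

let index = 0;
index = step(index, +1, 2);                      // 1
console.log(currentMarkNames(marksTree, index)); // ['a', 'b']
index = step(index, +1, 2);                      // 2
console.log(currentMarkNames(marksTree, index)); // ['b', 'c']
```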