How to debug "Overlapping character range"?

savq · July 6, 2023, 7:06am

Hello,

I’ve been refactoring parts of lezer-julia, which defines a lot of Unicode operators. The current version uses a separate string for each operator, and I want to replace these with character classes/sets (at least for single-character operators).

However, if I try to define multiplicative operators this way:

// Minimal example
@tokens {
  times-operator {
    // ASCII operators:
    $[*/%&\\] |
    // unicode operators:
    $[⌿÷··⋅∘×∩∧⊗⊘⊙⊚⊛⊠⊡⊓∗∙∤⅋≀⊼⋄⋆⋇⋉⋊⋋⋌⋏⋒⟑⦸⦼⦾⦿⧶⧷⨇⨰⨱⨲⨳⨴⨵⨶⨷⨸⨻⨼⨽⩀⩃⩄⩋⩍⩎⩑⩓⩕⩘⩚⩜⩞⩟⩠⫛⊍▷⨝⟕⟖⟗⨟]
  }
}

I get the following error:

Overlapping character range (src/julia.grammar 7:4)

Defining a RegExp with the same character class in JavaScript works fine, so I don’t think this is a problem with JavaScript strings.
None of the other Julia operators have any issues, and I was able to write them as character classes. Only the times operators above raise an error.

Basically, I haven’t been able to figure out what the error actually is. What is overlapping what? Why is that a problem?

How should I go about fixing this issue? Should I write a program to test each character individually?

marijn · July 6, 2023, 8:00am

The · character (U+00B7, decimal 183) occurs twice (directly next to each other) in your set.
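
For anyone hitting the same error with a long set of literal characters, a small script can point at the repeated character. The following is only a sketch in Python, not part of Lezer: the function name `find_repeated_chars` and the shortened `sample` string are made up for illustration, and it only looks at literal characters (it does not understand ranges or escapes inside the set).

```python
import unicodedata
from collections import Counter


def find_repeated_chars(char_set: str) -> list[tuple[str, str, int]]:
    """Return (code point, name, count) for every character that occurs more than once."""
    counts = Counter(char_set)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in counts.items()
        if n > 1
    ]


# Hypothetical, shortened stand-in for the set body. The middle dot is written
# as an escape here so that the duplicate is visible in the source.
sample = "⌿÷\u00b7\u00b7⋅∘×"

for code_point, name, count in find_repeated_chars(sample):
    print(code_point, name, f"occurs {count} times")
```

Pasting the real set body into `sample` should list every character that appears more than once, along with its Unicode name.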

savq · July 6, 2023, 5:14pm

Ok, that was indeed the problem. For future reference, the dots are supposed to be different, but at some point (heh) they got normalized to the same character. The proper characters should be:

'·': Unicode U+00B7 (category Po: Punctuation, other) MIDDLE DOT
'·': Unicode U+0387 (category Po: Punctuation, other) GREEK ANO TELEIA
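
The normalization can be reproduced outside the grammar. The sketch below (Python, standard library only) assumes the culprit was NFC normalization somewhere in an editor or tool chain, which is a guess; it shows that NFC maps U+0387 to U+00B7. The helper `to_grammar_escape` is a hypothetical name, and it emits the braced `\u{...}` escape form so the two dots can be written in the grammar without relying on the literal characters surviving.

```python
import unicodedata

middle_dot = "\u00b7"  # MIDDLE DOT
ano_teleia = "\u0387"  # GREEK ANO TELEIA


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX (category ..) NAME'."""
    return f"U+{ord(ch):04X} (category {unicodedata.category(ch)}) {unicodedata.name(ch)}"


def to_grammar_escape(ch: str) -> str:
    """Hypothetical helper: write a character as a braced \\u{...} escape."""
    return f"\\u{{{ord(ch):X}}}"


# The two characters are distinct code points before normalization...
assert middle_dot != ano_teleia

# ...but NFC replaces GREEK ANO TELEIA with MIDDLE DOT.
normalized = unicodedata.normalize("NFC", ano_teleia)
print(describe(ano_teleia), "->", describe(normalized))

# Escapes are immune to normalization of the source file.
print(to_grammar_escape(middle_dot), to_grammar_escape(ano_teleia))
```

Writing the pair as escapes in the set, rather than as literal dots, keeps them distinct even if the file is normalized again later.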

Thanks for the help!

ivov · April 8, 2024, 12:31pm

We have this grammar that triggers the overlapping character range error:

@top Program { entity* }

entity { Plaintext | Resolvable }

@tokens {
  Plaintext { ![{] Plaintext? | "{" (@eof | ![{] Plaintext?) }

  OpenMarker[closedBy="CloseMarker"] { "{{" }

  CloseMarker[openedBy="OpenMarker"] { "}}" }

  Resolvable {
    OpenMarker resolvableChar* CloseMarker
  }

  resolvableChar { unicodeChar | "}" ![}] | "\\}}" }

  unicodeChar { $[\u0000-\u007C] | $[\u007E-\u1FFF] | $[\u20A0-\u20CF] | $[\u1F300-\u1F64F] }
}

@detectDelim

Specifically, the cause seems to be the last Unicode block. Even that single block by itself triggers the error:

unicodeChar { $[\u1F300-\u1F64F] }

Stack trace:

src/index.ts → dist/index.cjs, ./dist...
[!] (plugin rollup-plugin-lezer) Error: Could not load /Users/ivov/Development/codemirror-lang-n8n/src/syntax.grammar (imported by src/index.ts): Overlapping character range (/Users/ivov/Development/codemirror-lang-n8n/src/syntax.grammar 18:16)
Error: Could not load /Users/ivov/Development/codemirror-lang-n8n/src/syntax.grammar (imported by src/index.ts): Overlapping character range (/Users/ivov/Development/codemirror-lang-n8n/src/syntax.grammar 18:16)
    at Input.raise (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:909:15)
    at addRange (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:1247:15)
    at parseExprInner (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:1177:17)
    at parseExprSuffix (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:1252:16)
    at parseExprSequence (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:1282:20)
    at parseExprChoice (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:1290:37)
    at parseBracedExpr (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:1139:16)
    at parseRule (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:1099:16)
    at parseTokens (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:1343:29)
    at parseGrammar (/Users/ivov/Development/codemirror-lang-n8n/node_modules/@lezer/generator/dist/index.cjs:1014:26)

@marijn Any tips on how to debug this? Thank you as always.

marijn · April 8, 2024, 12:49pm

Just like in JavaScript, a plain \u escape takes exactly four hexadecimal digits. So what you’re writing here is \u1F30, 0-\u1F64, F, which is indeed overlapping: the range 0-\u1F64 already contains both \u1F30 and F. Wrap the hex numbers in braces: $[\u{1F300}-\u{1F64F}].
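
To see the effect concretely, here is a simplified model of reading a set body, written in Python. It is not the generator's actual code: `read_ranges` and `find_overlap` are hypothetical names, and the only rules it assumes are the ones stated above (plain `\u` takes four hex digits, `\u{...}` takes any number, `-` between two items forms a range).

```python
import re

# One set item: \u{HEX}, \uXXXX (exactly four hex digits), or a single literal character.
ITEM = re.compile(r"\\u\{([0-9a-fA-F]+)\}|\\u([0-9a-fA-F]{4})|(.)", re.DOTALL)


def read_ranges(body: str) -> list[tuple[int, int]]:
    """Split a set body into inclusive (start, end) code point ranges."""
    tokens: list[int | None] = []
    for braced, four, literal in ITEM.findall(body):
        if literal == "-":
            tokens.append(None)  # range separator
        elif literal:
            tokens.append(ord(literal))
        else:
            tokens.append(int(braced or four, 16))

    ranges = []
    i = 0
    while i < len(tokens):
        # A dash with nothing to its left is treated as a literal '-'.
        start = ord("-") if tokens[i] is None else tokens[i]
        if i + 2 < len(tokens) and tokens[i + 1] is None and tokens[i + 2] is not None:
            ranges.append((start, tokens[i + 2]))
            i += 3
        else:
            ranges.append((start, start))
            i += 1
    return ranges


def find_overlap(ranges):
    """Return the first pair of overlapping ranges, or None if there is none."""
    ordered = sorted(ranges)
    for (a_start, a_end), (b_start, b_end) in zip(ordered, ordered[1:]):
        if b_start <= a_end:
            return (a_start, a_end), (b_start, b_end)
    return None


for body in (r"\u1F300-\u1F64F", r"\u{1F300}-\u{1F64F}"):
    ranges = [(hex(a), hex(b)) for a, b in read_ranges(body)]
    print(body, "->", ranges, "overlap:", find_overlap(read_ranges(body)) is not None)
```

Under these assumptions the unbraced form splits into three items that overlap, while the braced form is a single range covering U+1F300 through U+1F64F.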

ivov · April 8, 2024, 1:16pm

Of course! Very thankful for your help, as always.