How can I help make Lezer grammar errors less terse and more helpful?

Hello!

I took a long break from CodeMirror and Lezer because they felt like a somewhat unforgiving environment to learn in.

Anywho, I hope things are better now, and I want to help improve the tooling!

I have this error:

Error: Could not load <repo>/packages/codemirror/glimmer/src/syntax.grammar (imported by src/index.js): Overlapping tokens Text and identifier used in same context (example: "$")
After:  (<repo>/packages/codemirror/glimmer/src/syntax.grammar 1:1)}

There is a lot I don't understand about this message:

  • How did this happen?
  • Where did it happen? (line numbers, etc.)
  • How do I resolve it?
  • Are there common fixes that should be … suggested? (using precedence, etc.)

How do I make this error message say something helpful and actionable?
I haven't looked into Lezer's code yet, as I first need to understand the problem… which I don't!

I see that it tried to give a location, but it reported line 1, which is not great :sweat_smile:

Another one:

Inconsistent skip sets after … Expression space+ identifier/"as" space+ "|" identifier

This needs to explain why. What in the skip blocks is inconsistent? Which sub-rules in the blocks are involved?

What is happening is that you have ambiguous token types (Text and identifier both match "$"). That is only okay if those tokens are never allowed in the same position, but here they are, right at the start of the parse. The "After:" part of the message normally lists the tokens that get you into the problematic state; it is empty here because the conflict occurs before any token has been read. I guess you could adjust the error message to show something clearer (say, "At start of parse") when there is no list of tokens to show.
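To make "same context" and the suggested message fallback concrete, here is a rough sketch in plain JavaScript. The two regular expressions are hypothetical stand-ins for the Text and identifier tokens (the actual syntax.grammar isn't shown in the thread), and the brute-force search over sample strings is only an illustration; it is not how Lezer's generator finds overlaps. `describeOverlap` is likewise a hypothetical name, not Lezer's code.

```javascript
// Hypothetical stand-ins for the grammar's tokens; the real definitions
// from syntax.grammar are not shown in the thread.
const tokens = {
  Text: /^[^{}]+$/,                 // assumed: any run of non-brace characters
  identifier: /^[a-zA-Z_$][\w$]*$/, // assumed: JS-style identifier
};

// Brute-force illustration: return the first sample that both tokens
// match in full (undefined if none do).
function findOverlap(a, b, samples) {
  return samples.find((s) => a.test(s) && b.test(s));
}

// Hypothetical message builder: when no tokens lead up to the conflict,
// say "At start of parse" instead of printing an empty "After:" list.
function describeOverlap(nameA, nameB, example, path) {
  const where = path.length ? `After: ${path.join(" ")}` : "At start of parse";
  return `Overlapping tokens ${nameA} and ${nameB} used in same context ` +
    `(example: ${JSON.stringify(example)})\n${where}`;
}

const example = findOverlap(tokens.Text, tokens.identifier, ["{", "$", "foo"]);
console.log(describeOverlap("Text", "identifier", example, []));
// Overlapping tokens Text and identifier used in same context (example: "$")
// At start of parse
```

If the overlap is intended, one common fix in the grammar itself is to declare which token wins with `@precedence { identifier, Text }` inside the `@tokens` block; otherwise, narrow one of the token definitions so the two no longer match the same strings.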

What in the skip blocks is inconsistent?

Any given parse state can have only one set of skip rules, so if two different skip sets would apply in the same state, they are inconsistent.
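A rough way to picture this rule: treat each parse state as holding exactly one skip set, and report an error when the grammar would give the same state a second, different one. This is a toy model in plain JavaScript; the `ParseState` class and the state numbers are hypothetical, and the real generator derives its states from the LR construction rather than having them assigned by hand.

```javascript
// Toy model (assumption): each parse state remembers the one skip set it uses.
class ParseState {
  constructor(id) {
    this.id = id;
    this.skip = null; // canonical key of the assigned skip set, once known
  }

  assignSkip(skipSet) {
    const key = [...skipSet].sort().join(",");
    if (this.skip === null) {
      this.skip = key;
    } else if (this.skip !== key) {
      throw new Error(`Inconsistent skip sets in state ${this.id}: {${this.skip}} vs {${key}}`);
    }
  }
}

const globalSkip = new Set(["space", "Comment"]);
const none = new Set();

// A state reached only from inside Comment always gets the empty set.
const insideComment = new ParseState(1);
insideComment.assignSkip(none);
insideComment.assignSkip(none); // same set again: no conflict

// A state reachable both from a rule under the global @skip and from a
// rule under @skip {} is the situation the error message reports.
const mixed = new ParseState(2);
mixed.assignSkip(globalSkip);
try {
  mixed.assignSkip(none);
} catch (e) {
  console.log(e.message); // Inconsistent skip sets in state 2: {Comment,space} vs {}
}
```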

Can you expand on this?

I was looking through the tests in lezer/generator and found a test with two @skip blocks:

  it("can cache skipped content", () => {
    let comments = buildParser(`
@top T { "x"+ }
@skip { space | Comment }
@skip {} {
  Comment { commentStart (Comment | commentContent)* commentEnd }
}
@tokens {
  space { " "+ }
  commentStart { "(" }
  commentEnd { ")" }
  commentContent { ![()]+ }
}`)
    let doc = "x  (one (two) (three " + "(y)".repeat(500) + ")) x"
    let ast = comments.configure({bufferLength: 10, strict: true}).parse(doc)
    let ast2 = comments.configure({bufferLength: 10}).parse(doc.slice(1), fragments(ast, [0, 1, 0, 0]))
    ist(shared(ast, ast2), 80, ">")
  })

Now, the two skip blocks are written differently.
One says (I think) "always skip space and Comment while parsing", and the other says "skip nothing while parsing a Comment".

What defines a "parse state"? Is that internal?

Yes, but they aren't used in the same position. "Parse state" is meant in the sense that LR parsing gives to it. In the example, the empty skip set is used within the Comment rule, which is unambiguously delimited by tokens (commentStart/commentEnd) on both sides. Only the parse positions within the rule (after commentStart, inside the commentContent repetition) use the empty skip set, so it doesn't conflict with the global skip set used in the rest of the grammar.
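To see why the two sets don't collide in that test grammar, here is a hand-written recursive-descent stand-in for it. It is not an LR parser and not Lezer's code, just an illustration under that simplification. Outside a comment it skips spaces and whole comments; once commentStart has been consumed, nothing is skipped, so spaces inside a comment become part of commentContent.

```javascript
// Stand-in for: @top T { "x"+ } with @skip { space | Comment }, and
// Comment { commentStart (Comment | commentContent)* commentEnd } under @skip {}.
function parseT(input) {
  let pos = 0;
  const comments = [];

  // Global skip set: space and Comment.
  function skip() {
    for (;;) {
      if (input[pos] === " ") pos++;
      else if (input[pos] === "(") comments.push(parseComment());
      else return;
    }
  }

  // Inside Comment the skip set is empty: every character belongs to
  // commentStart, a nested Comment, commentContent, or commentEnd.
  function parseComment() {
    const start = pos++; // commentStart "("
    while (pos < input.length && input[pos] !== ")") {
      if (input[pos] === "(") {
        parseComment(); // nested Comment
      } else {
        // commentContent: ![()]+ (spaces included, since nothing is skipped)
        while (pos < input.length && !"()".includes(input[pos])) pos++;
      }
    }
    if (input[pos] !== ")") throw new Error(`Unterminated comment at ${start}`);
    pos++; // commentEnd ")"
    return input.slice(start, pos);
  }

  let xs = 0;
  skip();
  while (input[pos] === "x") {
    pos++;
    xs++;
    skip();
  }
  if (xs === 0) throw new Error("Expected at least one x");
  if (pos !== input.length) throw new Error(`Unexpected input at ${pos}`);
  return { xs, comments };
}

console.log(parseT("x  (one (two) (three (y))) x"));
// { xs: 2, comments: [ '(one (two) (three (y)))' ] }
```

The two skip sets never apply at the same position: the global set is consulted only between "x" tokens, and the empty set only between commentStart and the matching commentEnd.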