With the following grammar, a Set should be parsed when the input contains #{ ...expressions... }.
@top[name=Program] { expression* }
expression { Symbol | Set | Map }
Map { "{" expression* "}" }
@skip {} {
  Set { "#" Map }
}
@tokens {
  whitespace { (std.whitespace | ",")+ }
  Symbol { std.asciiLetter+ }
}
When passed the source # (and strict mode off), the # is parsed as a Program(Set) (some kind of partial parse, from error recovery).
Is it possible to detect that this Set is actually not a Set as defined, without inspecting its children? I’m implementing some editing commands and in states like this I would want to treat the # as something else, like Symbol or a specific InvalidTerminalToken. (For my commands to work a Set at least needs to contain its brackets.)
Are you sure there’s no error node in the Set node?
It may be there, and I’m just not sure how/where to look. I do see that the firstChild of the Set has the name ⚠ and its type.props is {"0": true, propData: [{name: "error", from: null}, ""]}. The Set itself has empty type.props.
I’m guessing there is some way of reading NodeProps that I am missing. From the docs (Lezer Reference Manual) I was expecting to find an error key somewhere. (I admittedly have not fully grokked how NodeProps work.)
The props are static, per node type, so you’ll never have a Set with the error prop set. Rather, there’s an error type that is inserted whenever error recovery is done.
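To make that concrete, here is a minimal sketch of checking a Set for an error node among its direct children. It assumes the node objects expose `type.isError`, `firstChild`, and `nextSibling` the way Lezer's SyntaxNode does; the `NodeLike` interfaces and both function names are hypothetical, introduced only so the snippet stands on its own.

```typescript
// Hypothetical structural types covering only the parts of a syntax node
// that this sketch reads (assumed to match Lezer's SyntaxNode / NodeType).
interface NodeTypeLike {
  name: string;
  isError: boolean;
}
interface NodeLike {
  type: NodeTypeLike;
  firstChild: NodeLike | null;
  nextSibling: NodeLike | null;
}

// True when an error node sits directly inside `node`.
function hasErrorChild(node: NodeLike): boolean {
  for (let child = node.firstChild; child; child = child.nextSibling) {
    if (child.type.isError) return true;
  }
  return false;
}

// Hypothetical helper for editing commands: accept a Set only when
// no error node was inserted directly inside it.
function isWellFormedSet(node: NodeLike): boolean {
  return node.type.name === "Set" && !hasErrorChild(node);
}
```

This only looks at direct children. If the error node ends up deeper (for example inside the Map), the check would have to recurse, so it is worth inspecting the actual tree for the inputs you care about.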
No, the only thing the parser stores is that it went off the grammar at that point somehow. Storing more would get expensive, and LR parsers aren’t very good at producing meaningful syntax errors in any case.
I’m in no way an expert on the matter, so I wonder how to build on that.
As far as best practices are concerned, how would one proceed to create a diagnostic for the editor? For example, from the start/end pair of the error node one could extract the “broken” piece of code and:
send it to the editor saying “That’s an error”
with a more in-depth analysis, one could guess the developer’s intent, e.g. if there is an opening quote " and a missing closing one, a diagnostic can be sent to the editor saying “Unterminated string literal”.
one could simply list the expected tokens (e.g. if we are in a function call and we find a keyword instead of a parameter, we can say “Expected variable or literal, instead found xyz”).
This would be useful but means implementing lots of possible combinations. Is this the way to go? Is there some support that Lezer/CodeMirror gives for creating such diagnostics, or is it up to the developer?
I don’t think Lezer is the proper parser for that kind of feedback—trying to reconstruct what the mistake was from a Lezer tree seems like it would be about as much work as writing your own parser. If a good linter exists for your language, that may be able to help.
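That said, the first option from the list above (flagging the broken range without trying to explain it) is cheap to do from the tree. The following is only a sketch: it assumes nodes expose `type.isError`, `from`, `to`, `firstChild`, and `nextSibling` as Lezer's SyntaxNode does, and it produces plain objects with the `from`/`to`/`severity`/`message` fields that a CodeMirror lint diagnostic uses. `RangeNode`, `DiagnosticLike`, and `collectSyntaxErrors` are hypothetical names.

```typescript
// Hypothetical structural types; assumed to match the parts of Lezer's
// SyntaxNode and of a lint diagnostic that this sketch touches.
interface TypeLike {
  isError: boolean;
}
interface RangeNode {
  type: TypeLike;
  from: number;
  to: number;
  firstChild: RangeNode | null;
  nextSibling: RangeNode | null;
}
interface DiagnosticLike {
  from: number;
  to: number;
  severity: "error";
  message: string;
}

// Walk the whole tree depth-first and report every error node as a
// generic "Syntax error" diagnostic covering that node's range.
function collectSyntaxErrors(root: RangeNode): DiagnosticLike[] {
  const found: DiagnosticLike[] = [];
  const visit = (node: RangeNode): void => {
    if (node.type.isError) {
      found.push({ from: node.from, to: node.to, severity: "error", message: "Syntax error" });
    }
    for (let child = node.firstChild; child; child = child.nextSibling) visit(child);
  };
  visit(root);
  return found;
}
```

Error nodes can be empty (`from === to`), so a real linter may want to widen such ranges to something visible. Anything more specific than "Syntax error" would need language-specific analysis on top of this.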