Handling error states

With the following grammar, a Set should be parsed when we find #{ ...expressions... }.

@top[name=Program] { expression* }

expression { Symbol | Set | Map }
Map { "{" expression* "}" }

@skip {} {
  Set { "#" Map }
}

@tokens {
  whitespace { (std.whitespace | ",")+ }
  Symbol { std.asciiLetter+ }
}

When passed the source # (and strict mode off), the # is parsed as a Program(Set) (some kind of partial parse, from error recovery).

Is it possible to detect that this Set is actually not a Set as defined, without inspecting its children? I’m implementing some editing commands and in states like this I would want to treat the # as something else, like Symbol or a specific InvalidTerminalToken. (For my commands to work a Set at least needs to contain its brackets.)

Are you sure there’s no error node in the Set node? Because error recovery should always leave error markers.

Are you sure there’s no error node in the Set node?

It may be there, and I’m just not sure how/where to look. I do see that the firstChild of the Set has the name ⚠ and its type.props is {"0": true, propData: [{name: "error", from: null}, ""]}. The Set itself has empty type.props.

I’m guessing there is some way of reading NodeProps that I am missing - from the docs I was expecting to find an error key somewhere: Lezer Reference Manual (I admittedly have not fully grokked how NodeProps work).

I think the prop method is the way to read them. Don’t look at the props object directly: it’s not documented, which means it’s private.

OK. So in this case we find the error not on the Set but on its missing child - <Set>.firstChild.type.prop(NodeProp.error).

The props are static, per node type, so you’ll never have a Set with the error prop set. Rather, there’s an error type that is inserted whenever error recovery is done.

I see, makes sense. So we do a bit of checking to see if a Set node contains an error type.
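
A minimal sketch of that check, for reference. The `NodeLike` and `TypeLike` interfaces and both helper names are hypothetical; real Lezer syntax nodes are assumed to satisfy this shape, with `type.isError` assumed to be equivalent to `type.prop(NodeProp.error)`. It only looks at direct children, which is enough for the `#` case described above.

```typescript
// Minimal structural types (hypothetical); real Lezer nodes are assumed to fit them.
interface TypeLike { name: string; isError: boolean }
interface NodeLike {
  type: TypeLike;
  firstChild: NodeLike | null;
  nextSibling: NodeLike | null;
}

// Hypothetical helper: true if any direct child of `node` is an error node.
function hasErrorChild(node: NodeLike): boolean {
  for (let child = node.firstChild; child; child = child.nextSibling) {
    if (child.type.isError) return true;
  }
  return false;
}

// Hypothetical helper: accept a Set only if error recovery left no marker directly inside it.
function isWellFormedSet(node: NodeLike): boolean {
  return node.type.name === "Set" && !hasErrorChild(node);
}
```

An editing command could call `isWellFormedSet` first and fall back to treating the `#` as a plain token when it returns false. An error nested deeper (for example inside the Map) would need a recursive walk instead.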

I don’t fully understand how to read errors from a Lezer parse either. I guess one would:

    tree.iterate({
      enter(nodeType: NodeType, start: number, end: number) {
        if (nodeType.prop(NodeProp.error)) {
          console.log(`- Props: ${JSON.stringify(nodeType)} ${start} ${end}`);
        }
      }
    })

In this way I get something like:

Props: {"name":"⚠","props":{"0":true},"id":0} S5 E6

Is there a way to get more information about the error that happened?

Thanks!

No, the only thing the parser stores is that it went off the grammar at that point somehow. Storing more would get expensive, and LR parsers aren’t very good at producing meaningful syntax errors in any case.

Thanks for the (prompt) reply!!!

I’m in no way an expert on the matter, so I wonder how to build on that.

As far as best practices are concerned, how would one proceed to create a diagnostic for the editor? For example, from the start/end pair one could extract the “broken” piece of code and:

  1. send it to the editor saying “That’s an error”
  2. with a more in-depth analysis, one could guess what the developer intended, e.g. if there is an opening quote " and a missing closing one, a diagnostic can be sent to the editor saying “Unterminated string literal”.
  3. one could simply list the expected tokens (e.g. if we are in a function call and we find a keyword instead of a parameter, we can say “Expected variable or literal, instead found xyz”).

This would be useful but means implementing lots of possible combinations. Is this the way to go? Is there some support that Lezer/CodeMirror gives for creating such diagnostics, or is it up to the developer?

Thanks!

I don’t think Lezer is the proper parser for that kind of feedback—trying to reconstruct what the mistake was from a Lezer tree seems like it would be about as much work as writing your own parser. If a good linter exists for your language, that may be able to help.
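
For the simplest case only (option 1 in the question, flagging the range as an error without guessing the cause), a rough sketch could look like the following. The `RangeNode` interface and the `errorDiagnostics` name are hypothetical, real Lezer nodes are assumed to fit that shape, and the `Diagnostic` fields are modeled on what the editor's lint support is assumed to accept (`from`, `to`, `severity`, `message`).

```typescript
// Diagnostic shape (assumed to match what the editor's linting layer expects).
interface Diagnostic { from: number; to: number; severity: "error"; message: string }

// Minimal structural node type (hypothetical); real syntax nodes are assumed to fit it.
interface RangeNode {
  from: number;
  to: number;
  type: { isError: boolean };
  firstChild: RangeNode | null;
  nextSibling: RangeNode | null;
}

// Walk the tree depth-first and emit one generic diagnostic per error node.
function errorDiagnostics(root: RangeNode, source: string): Diagnostic[] {
  const out: Diagnostic[] = [];
  const visit = (node: RangeNode): void => {
    if (node.type.isError) {
      // Error nodes may be empty (from === to), so the snippet can be "".
      const text = source.slice(node.from, node.to);
      out.push({
        from: node.from,
        to: node.to,
        severity: "error",
        message: text ? `Syntax error near "${text}"` : "Syntax error",
      });
    }
    for (let c = node.firstChild; c; c = c.nextSibling) visit(c);
  };
  visit(root);
  return out;
}
```

This deliberately says nothing about what went wrong, since the tree only records where the parser left the grammar. Anything more specific would have to come from a separate tool.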

Ok! So I guess the proper way to go is relying on tooling to produce diagnostics. Thank you again for your kind support!