Regarding skip sets

Hi! I’m creating a Lezer parser for a programming language that has two kinds of expressions: indentation-sensitive expressions (ISEs) and indentation-insensitive expressions (IIEs). I implemented an external tokenizer that emits indent, dedent, keep, and blankLine terms, and a context that tracks the indentation (similar to the indentation example).

The default skip set (for the statement, declaration, and IIE rules) contains whitespace, indent, dedent, keep, and blankLine, because those rules don’t care about indentation. This part of my grammar works well.

To be clear, ISEs are only allowed as a function body or inside a ParenthesizedExpression (an expression wrapped in parentheses, which is a variant of IIE). Because ISEs care about indentation, I put the ISE rules in a skip block that skips whitespace only. However, ISEs sometimes reference IIEs, and I got errors like: Inconsistent skip sets after identifier/"let" VariableName "=" identifier/"block" indent.
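
For readers following along, the layout described above might look roughly like the fragment below. It is a hypothetical reconstruction, not the actual grammar from this post: the rule names Block, iseLine, and expression, the kw helper, the tracker name trackIndent, and the file name ./tokens.js are all assumptions, and the fragment is not a complete grammar.

```
// Fragment only; rule and file names are illustrative assumptions.
@external tokens indentation from "./tokens.js" { indent, dedent, keep, blankLine }
@context trackIndent from "./tokens.js"

// Default skip set: statements, declarations, and IIEs ignore layout tokens.
@skip { space | indent | dedent | keep | blankLine }

kw<term> { @specialize[@name={term}]<identifier, term> }

LetStatement { kw<"let"> VariableName "=" expression }

// ISE rules: only plain whitespace is skipped, so indent/dedent/keep
// remain visible to the rules inside this block.
@skip { space } {
  Block { kw<"block"> indent iseLine (keep iseLine)* dedent }

  // An ISE line that refers back to an IIE rule defined under the
  // default skip set. References like this are where the
  // "Inconsistent skip sets" error can appear.
  iseLine { expression }
}
```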

Therefore, I guess rules surrounded by a skip block cannot reference rules outside the skip block. Is my speculation correct?

I’d also like to know whether there’s a recommended solution for grammars like the one I described.

They can, but only if the referenced rules are ‘closed’ (don’t end with any optional or repeated elements). Otherwise, it is unclear what to skip at a position where such an element may occur (in which case the inner skip set applies) or not (in which case the outer skip set would apply).
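
A small illustration of that distinction, as I read the explanation above. The rule names (CallExpression, ArgList, IseGroup, IseCall) are made up for the example, and the fragment is a sketch rather than a complete grammar.

```
// Fragment only; rule names are illustrative assumptions.
@skip { space | indent | dedent | keep | blankLine }

// Closed: always ends with the ")" token, so once ")" has been read
// the rule is definitely finished.
ParenthesizedExpression { "(" expression ")" }

// Not closed: ends with a repeated element, so after `identifier`
// the rule may or may not continue.
CallExpression { identifier ArgList* }

@skip { space } {
  // Should be accepted: the referenced rule is closed.
  IseGroup { indent ParenthesizedExpression dedent }

  // Likely to be reported as "Inconsistent skip sets": after
  // `identifier`, the next token is either another ArgList
  // (CallExpression's skip set applies) or `dedent`
  // (this block's skip set applies).
  IseCall { indent CallExpression dedent }
}
```

One way to make a rule like CallExpression closed is to give it a required final token, although whether that is acceptable depends on the language being parsed.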

Ah right, that makes sense. Do you have any recommended solution for my use case?

To be specific, I have two groups of rules, each of which has different skip sets. Rules from one group might reference rules from the other group.

I use mixed-language parsing, since I can cleanly isolate the sub-grammar as a single token of the base grammar. See: Can child/sub tokens appear in the parse tree? - #3 by AlexErrant

Concretely, I’m building a search grammar similar to GitHub’s search syntax. In addition to syntax like somesearchword stars:>100, somesearchword may be a glob or a regex, each of which is (obviously) its own sub-grammar. So I made a separate glob grammar. Your ISEs and IIEs might be better off as separate grammars too.
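
A minimal sketch of the base-grammar side of that approach, assuming the embedded language can be delimited as one token. The names Query, Word, and Regex and the slash-delimited regex syntax are assumptions for illustration, not the real search grammar. The token's text would then be handed to the separate grammar's parser, for example through parseMixed from @lezer/common.

```
// Base grammar: the embedded language appears as a single opaque token.
@top Query { term* }

term { Word | Regex }

@skip { space }

@tokens {
  space { $[ \t\n]+ }
  Word { $[a-zA-Z0-9_]+ }

  // Everything between the slashes is one token here. Its internal
  // structure is left to the separate sub-grammar, which has its own
  // skip set, so the two skip sets never meet in one parse table.
  Regex { "/" (![/\\\n] | "\\" ![\n])+ "/" }
}
```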

Trying to integrate that glob grammar directly into my query grammar resulted in many “Inconsistent skip sets” and “shift/reduce conflict between” errors, which I was unable to resolve. Keeping the grammars separate might mean having to duplicate rules from one of your grammars into the other, but this is just an idea for you to explore.