Here is an example of a simple grammar:
```
@precedence { cmp @left, not }

@top Program { Expr }

Expr { "null" | UnaryNot | NotNull }
NotNull { Expr !cmp "not" "null" }
UnaryNot { !not "not" Expr }
```
Its corresponding parser can parse expressions like `null`, `not null`, `null not null`, and their derivatives. Let’s now extend the `Program` rule to allow pairs of expressions:
```
@top Program { Expr | Pair }

Pair { Expr Expr }
```
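For reference, here is the whole grammar at this point, assembled from the snippets above:

```
@precedence { cmp @left, not }

@top Program { Expr | Pair }

Expr { "null" | UnaryNot | NotNull }
NotNull { Expr !cmp "not" "null" }
UnaryNot { !not "not" Expr }

Pair { Expr Expr }
```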
In this toy grammar there is no ambiguity, because there are no names, parenthesized expressions, or function calls (which would make it impossible to tell whether `foo (bar)` is one expression or two). Nevertheless, let’s pretend we have to deal with such a problem here. To mitigate it, let’s restrict what the first expression in a pair can be, say to just `null`:
```
Pair { "null" Expr }
```
This works just fine! But what if this `null` lived inside some `ExprSubset` rule:

```
Pair { ExprSubset { "null" } Expr }
```
Uh-oh!

```
reduce/reduce conflict between
 Expr -> "null"
and
 ExprSubset -> "null"
With input:
 "null" · "not" …
```
OK, we can probably solve this by annotating `null` with the `@cut` precedence:

```
@precedence { cmp @left, not, help @cut }

Pair { ExprSubset { !help "null" } Expr }
```
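For completeness, the full grammar with this fix, assembled from the snippets above, would look something like:

```
@precedence { cmp @left, not, help @cut }

@top Program { Expr | Pair }

Expr { "null" | UnaryNot | NotNull }
NotNull { Expr !cmp "not" "null" }
UnaryNot { !not "not" Expr }

Pair { ExprSubset { !help "null" } Expr }
```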
This works! But what if we had something more complex than `null`:

```
Pair { ExprSubset { !help UnaryNot } Expr }
```
Unfortunately, this doesn’t work anymore:

```
reduce/reduce conflict between
 Expr -> UnaryNot
and
 ExprSubset -> UnaryNot
With input:
 UnaryNot · "not" …
```
What’s surprising is that this is solely `ExprSubset`’s fault. When there’s no rule nesting, everything works again:

```
Pair { ("null" | UnaryNot) Expr }
```
Does this have something to do with the state that rules carry around for backtracking? Once a rule is split into multiple rules, can some information no longer cross their boundary, or something like that?

I could certainly make the rules less modular by consolidating terms, but is there any trick that would solve this without hurting readability?