Rule nesting and reduce/reduce conflicts

mikhail · June 22, 2022, 5:06am

Here is an example of a simple grammar:

@precedence { cmp @left, not }

@top Program { Expr }

Expr { "null" | UnaryNot | NotNull }

NotNull { Expr !cmp "not" "null" }

UnaryNot { !not "not" Expr }

Its corresponding parser can parse expressions like null, not null, null not null, and their derivatives. Let’s now extend the Program rule to allow pairs of expressions:

@top Program { Expr | Pair }

Pair { Expr Expr }

In this toy grammar, there is no ambiguity because there’s no names, parenthesized expressions, or function calls (which make impossible to tell whether foo (bar) is one expression or two). Nevertheless, let’s pretend we have to deal with such a problem here. In order to mitigate it, let’s restrict what the first expression in a pair can be. Let’s make it to be null:

Pair { "null" Expr }

This works just fine! But what if this null lived inside some ExprSubset rule:

Pair { ExprSubset { "null" } Expr }

Uh-oh!

reduce/reduce conflict between
  Expr -> "null"
and
  ExprSubset -> "null"
With input:
  "null" · "not" …

OK, we can probably solve this by annotating null with the @cut precendence:

@precedence { cmp @left, not, help @cut }

Pair { ExprSubset { !help "null" } Expr }

This works! But what if we had something more complex than null:

Pair { ExprSubset { !help UnaryNot } Expr }

Unfortunately, this doesn’t work anymore:

reduce/reduce conflict between
  Expr -> UnaryNot
and
  ExprSubset -> UnaryNot
With input:
  UnaryNot · "not" …

What’s surprising is that it’s solely ExprSubset's fault. When there’s no rule nesting, everything just works again:

Pair { ("null" | UnaryNot) Expr }

Does this have something to do with what state rules carry around for backtracking? And once a rule is split into multiple rules, some information can’t cross their boundary anymore or something like that?

I certainly could make less modular rules by consolidating terms but is there any trick that could solve this without hurting readability?

marijn · June 22, 2022, 8:28am

By adding another wrapping nonterminal, you are requiring the parser to make the decision of whether to reduce an ExprSubset earlier, at a point where it can’t unambiguously do so (because it doesn’t know whether it’s going to parse an Expr or a Pair). This seems expected behavior for an LR parser.

mikhail · June 22, 2022, 10:01am

I see. I somehow expected precedence/ambiguity markers to work with non-terminals as well, but it starts to make sense why they don’t.