Can't seem to fix shift/reduce conflict with ambiguity markers

jedwards1211 · June 4, 2024, 6:18am

I’m trying to make a parser for human-friendly relative date/time expressions. I’m running into the following issue in this heavily simplified example:

```
@top DateTimeExpression {
  Date (space ~s ('at' space ~s)? Time)?
  | Time
}

Date { FullYear ~n (space ~s Month (space ~s DayOfMonth)?)? }

Time { Hours ~n (space? AmPm)? }

FullYear { Digit ~n Digit ~n Digit ~n Digit ~n }

DayOfMonth { '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '01' | '02' | '03' | '04' | '05' | '06' | '07' | '08' | '09' | '10' | '11' | '12' | '13' | '14' | '15' | '16' | '17' | '18' | '19' | '20' | '21' | '22' | '23' | '24' | '25' | '26' | '27' | '28' | '29' | '30' | '31' ~n }
Month { 'january' | 'february' | 'march' | 'april' | 'may' | 'june' | 'july' | 'august' | 'september' | 'october' | 'november' | 'december' }
Digit { '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ~n }
Hours { ('0' | '1')? ~n Digit | '2' ~n ('0' | '1' | '3' | '4') ~n }
AmPm { ('a' 'm'? | 'p' 'm'?) }

@tokens {
  space { @whitespace+ }
}
```

```
shift/reduce conflict between
  Date -> FullYear space Month · space DayOfMonth
and
  Date -> FullYear space Month
With input:
  FullYear space Month · space …
Shared origin: @top -> · Date
```

The space after the FullYear or Month could be followed by either a DayOfMonth or 'at'… I thought annotating the spaces with ~s would tell the parser to try both branches with GLR, but apparently not.

Btw, is `(Foo ~marker)?` equivalent to `Foo? ~marker` or not?

Notes: I’ve avoided tokenizing numbers so that I have some hope of interpreting 32/08/14 as yy/MM/dd and 14/08/32 as dd/MM/yy, though I guess I could just leave that for code that runs after the parser.
Also I’ve had to rely on significant whitespace for many of my rules.
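Leaving the field order to code that runs after the parser could look roughly like the helper below. It is a hypothetical sketch (`guessOrder` and its three-way result are illustrations, not part of Lezer) that relies on one fact: no day of the month is 0 or above 31, so such a value has to be the year.

```typescript
// Hypothetical post-parse helper: choose a field order for "nn/nn/nn" dates.
type Order = "yy/MM/dd" | "dd/MM/yy" | "ambiguous";

// No month has a day 0 or a day above 31.
const cannotBeDay = (n: number): boolean => n < 1 || n > 31;

function guessOrder(text: string): Order {
  const parts = text.split("/").map((p) => Number.parseInt(p, 10));
  if (parts.length !== 3 || parts.some((n) => Number.isNaN(n))) {
    throw new Error(`expected three numeric groups: ${text}`);
  }
  const [first, month, last] = parts;
  // Both candidate orders put the month in the middle.
  if (month < 1 || month > 12) throw new Error(`invalid month in ${text}`);
  const firstIsYear = cannotBeDay(first);
  const lastIsYear = cannotBeDay(last);
  if (firstIsYear && lastIsYear) throw new Error(`no valid day in ${text}`);
  if (firstIsYear) return "yy/MM/dd";
  if (lastIsYear) return "dd/MM/yy";
  return "ambiguous"; // e.g. 10/08/12: either order gives a valid date
}
```

With this, `32/08/14` resolves to yy/MM/dd and `14/08/32` to dd/MM/yy, while something like `10/08/12` still needs a locale preference or other context.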

I’m starting to wonder whether I should just hand-write a parser with unbounded lookahead for this use case, since the expressions are never going to be very long… but Lezer is awesome! I’m hoping I can use it!
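For inputs this short, a hand-written parser that simply tries every alternative is a viable fallback. The sketch below is hypothetical: it works on whitespace-separated words rather than characters, assumes 24-hour values 0 to 23 for `Hours`, and uses plain backtracking (not Lezer, and not GLR). It returns every reading that consumes the whole input, so ambiguity shows up as more than one result instead of as a grammar conflict.

```typescript
// Hypothetical word-level recognizer for the simplified grammar in the post:
//   Expr = Date ("at"? Time)? | Time
//   Date = FullYear (Month DayOfMonth?)?
//   Time = Hours AmPm?   ("5pm" or "5 pm")
// Each optional part is tried both ways via backtracking.

interface Reading { year?: string; month?: string; day?: string; time?: string }
// A partial reading plus the index of the first word it did not consume.
interface Step { reading: Reading; next: number }

const MONTHS = ["january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december"];

const matches = (re: RegExp) => (w: string | undefined): w is string =>
  w !== undefined && re.test(w);
const isYear = matches(/^\d{4}$/);
const isDay = matches(/^(0?[1-9]|[12]\d|3[01])$/);
const isHour = matches(/^([01]?\d|2[0-3])$/); // assumes hours 0-23
const isTimeWord = matches(/^([01]?\d|2[0-3])(am?|pm?)?$/);
const isAmPm = matches(/^(am?|pm?)$/);
const isMonth = (w: string | undefined): w is string =>
  w !== undefined && MONTHS.includes(w);

function parseTime(words: string[], i: number): Step[] {
  const w = words[i];
  if (!isTimeWord(w)) return [];
  const steps: Step[] = [{ reading: { time: w }, next: i + 1 }];
  const suffix = words[i + 1];
  // A bare hour may also take a separate am/pm word ("5 pm").
  if (isHour(w) && isAmPm(suffix)) {
    steps.push({ reading: { time: `${w} ${suffix}` }, next: i + 2 });
  }
  return steps;
}

function parseDate(words: string[], i: number): Step[] {
  const [y, m, d] = [words[i], words[i + 1], words[i + 2]];
  if (!isYear(y)) return [];
  const steps: Step[] = [{ reading: { year: y }, next: i + 1 }];
  if (isMonth(m)) {
    steps.push({ reading: { year: y, month: m }, next: i + 2 });
    if (isDay(d)) steps.push({ reading: { year: y, month: m, day: d }, next: i + 3 });
  }
  return steps;
}

function parseAll(input: string): Reading[] {
  const words = input.trim().toLowerCase().split(/\s+/);
  const n = words.length;
  const results: Reading[] = [];
  for (const date of parseDate(words, 0)) {
    if (date.next === n) results.push(date.reading); // Date alone
    // Optional "at": try a Time right after the Date and after "at".
    const starts = words[date.next] === "at" ? [date.next, date.next + 1] : [date.next];
    for (const start of starts) {
      for (const time of parseTime(words, start)) {
        if (time.next === n) results.push({ ...date.reading, ...time.reading });
      }
    }
  }
  for (const time of parseTime(words, 0)) {
    if (time.next === n) results.push(time.reading); // bare Time
  }
  return results;
}
```

`parseAll("2024 june 4")` yields two readings (day 4, or 4 o'clock), which mirrors the shift/reduce conflict the grammar hits after `Month`; a real implementation would still need a rule for picking one.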

marijn · June 4, 2024, 7:53am

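It looks like that conflict is unresolved because there’s no `~` marker before `(space ~s DayOfMonth)`. The `~s` inside the group only comes after `space`, but the conflict happens right after `Month`, before that `space` is shifted.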

No. `a Foo?` will expand to `a | a Foo`, and the variant without `Foo` will only have the ambiguity marker when the marker is outside of the parentheses.
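The expansion described here can be modeled mechanically. The TypeScript below is a hypothetical toy (it is not how @lezer/generator is implemented): grammar elements are plain strings, `~`-prefixed strings stand for ambiguity markers, and `before (group)? after` is expanded into its two alternatives so you can see where the marker ends up. The last case applies the same idea to the `Date` rule with a marker placed before the optional day; the marker name `~s` there is an assumption.

```typescript
// Toy model: expand `before (group)? after` into its two alternatives.
type Seq = string[];

function expandOptional(before: Seq, group: Seq, after: Seq): [Seq, Seq] {
  return [
    [...before, ...after], // group omitted
    [...before, ...group, ...after], // group present
  ];
}

// Render alternatives the way a grammar would write them: "x y | x z".
const alternatives = (alts: Seq[]): string =>
  alts.map((s) => s.join(" ")).join(" | ");

// `a (Foo ~m)?`: the marker is inside the group, so the short form loses it.
const inside = alternatives(expandOptional(["a"], ["Foo", "~m"], []));
// `a Foo? ~m`: the marker is after the group, so both forms keep it.
const outside = alternatives(expandOptional(["a"], ["Foo"], ["~m"]));
// Date with a marker before the optional day (marker name is an assumption).
const dateRule = alternatives(
  expandOptional(["FullYear", "space", "Month", "~s"], ["space", "~s", "DayOfMonth"], []),
);

console.log(inside); // a | a Foo ~m
console.log(outside); // a ~m | a Foo ~m
console.log(dateRule); // FullYear space Month ~s | FullYear space Month ~s space ~s DayOfMonth
```

In the last expansion both alternatives carry `~s` immediately after `Month`, which is the position where the generator has to choose between reducing `Date` and shifting the `space` before `DayOfMonth`.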

jedwards1211 · June 4, 2024, 4:36pm

Okay, yes, that did it, thanks!
I had mistakenly assumed there was no ambiguity between 'at' and Month, but I guess since they aren’t tokenized and both can start with the letter a, there is ambiguity?

jedwards1211 · June 4, 2024, 4:46pm

Okay, I realized I had misunderstood how the ambiguity markers work. I thought an ambiguity marker annotates whatever comes before it, but only on a close reading of the docs did I realize that they are standalone things that belong “at the point where reductions happen”. Hopefully, after working with this some more, I’ll have some ideas for adding examples to the docs that will help other people understand how they work more quickly.

jedwards1211 · June 4, 2024, 5:02pm

I really love the design of the grammar and parse tree api, it has been really flexible and powerful!