Help with shift/reduce conflict

odc19 · August 5, 2021, 10:37pm

I’m trying to parse a fragment of haskell code and running into a shift/reduce conflict that I don’t know what to do with. I have attached a minimal grammar that reproduces the conflict. It has to do with constructors and constraints in datatype definitions.

The error is:

shift/reduce conflict between
  varid+ -> · varid
and
  qtycls -> conid
With input:
  varid/"data" conid · varid …
Shared origin: @top -> · Topdecl
  via Topdecl -> varid/"data" · Simpletype
    via Simpletype -> conid · varid+
      varid+ -> · varid
  via Topdecl -> varid/"data" · Constraint "=>" Simpletype
    via Constraint -> · qtycls varid
      qtycls -> conid ·

My minimal grammar is:

// data Foo a should parse as Topdecl(Simpletype)
// data Foo a => Bar a should parse as Topdecl(Constraint, Simpletype)

@precedence {
    modid
}

@skip {
    whitechar 
}

modid {
    !modid (conid ".")* conid
}

qtycls {
    (modid ".")? conid
}

@top Topdecls {
    Topdecl (";" Topdecl)*
}

Topdecl {
    @specialize<varid, "data">  (Constraint "=>")? Simpletype 
}

Constraint {
    qtycls varid
}

Simpletype {
    conid (varid)+
}

@tokens {
    whitechar {std.whitespace}
    small { std.asciiLowercase | "_"}
    large { std.asciiUppercase }
    varid { small (small | large | std.digit | "\'")* }
    conid { large (small | large | std.digit | "\'")* }
    }

Can anyone help?

cdsmith · August 5, 2021, 11:13pm

I’m Olivian’s mentor this summer for Google Summer of Code (Haskell org) and working with him on this problem.

It’s worth saying that this is definitely an understandable conflict, since when just seeing “data Foo a …”, it’s unknown whether “Foo a” should be interpreted as a constraint or as the type being declared. (Note: “data Foo a => Bar a” parses as a declaration of the type “Bar a”, where “Foo a” is a constraint on the type; while “data Foo a” by itself is a declaration of the type “Foo a”.) So it would be reasonable if this needs some kind of ambiguity similar to the comment at Lezer System Guide - Allowing Ambiguity However, Olivian and I are both still unfamiliar with Lezer, and it’s not clear to us what such a solution would look like and where the ~ambig syntax should appear in this case.

marijn · August 6, 2021, 8:20am

Firstly, since modid and qtycls match the exact same thing, it doesn’t seem useful to define them like that. Just using modid instead of qtycls seems to solve this problem.

cdsmith · August 9, 2021, 9:44pm

Thanks Marijn.

I don’t think this answer will work, and I think it’s just a matter of the minimal grammar not being complex enough so that a random fiddling around fixes the error. In the end, we would like the grammar to distinguish between modules (that is, modid) and qualified type classes (that is, qtycls). I definitely could see them being highlighted differently, for example.

Olivian’s full grammar is much longer, and can be found at haskell_code_mirror/haskell.grammar at master · odc19/haskell_code_mirror · GitHub, but it’s a bit of a beast, so I wouldn’t expect you to read through the whole thing. The same conflict appears in the full grammar.

I strongly suspect that the right approach to resolving all four remaining shift-reduce conflicts in Olivian’s grammar will be to allow ambiguity. Two of them relate to the example Olivian gave with contexts, and the other two relate to pattern guards. These situations are almost identical to the example at Lezer System Guide where the syntactic role of an element becomes clear only after parsing an unbounded amount of the rest of the source file; namely, upon encountering the => or <- tokens respectively. I just don’t understand Lezer enough to know how these ambiguities should be annotated in the grammar.

marijn · August 10, 2021, 5:56am

This may be similar to the Rust grammar, which also has a bunch of complicated identifier formats that can play different roles (path, typePath, patternPath). Search for ~path in that file for examples. Basically, you pick a name for this conflict (path in the Rust grammar), and put ~name markers at all the places where the conflicts occur to allow the parser to run GLR there.