How do I resolve shift/reduce conflict?

NullVoxPopuli · July 23, 2022, 3:23pm

I have this error:

> lezer-generator src/glimmer.grammar -o src/parser && rollup -c

shift/reduce conflict between
  Invocation -> SubExpStart SubExpression SubExpEnd · invisibles
and
  Invocation -> SubExpStart SubExpression SubExpEnd
With input:
  StartStache SubExpStart SubExpression SubExpEnd · invisibles …
Shared origin: Value -> · Invocation

And I’ve seen these docs: Lezer System Guide
which I think are relevant here, so I tried applying @left in @precedence for the mentioned tokens/nodes here:

@precedence {
  Invocation @left,
  Value @left
}

but this doesn’t change anything.

For context, here is what Invocation and Value are:

Invocation { invisibles? SubExpStart SubExpression SubExpEnd invisibles? }
Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | Property
  | PropertyPath
}

// related to the error:
SubExpression {
  list<Value>
  NamedArgs?
}

And the full thing here for more context / seeing what all the other tokens/nodes are

@detectDelim
@top Glimmer { ( Expression | BlockComment )* }

String { string }

ShortComment { StartShortComment Text* EndStache }
LongComment { StartLongComment Text* EndLongComment }
BlockComment { LongComment | ShortComment }


Expression {
  StartStache SubExpression EndStache
  | Block
}

Block {
  StartBlock
  Expression*
  EndBlock
}
As { kw<"as"> "|" list<name> "|" }

StartBlock[closedBy=EndBlock] {
  StartOpenBlockStache Function invisibles SubExpression? As? EndStache
}
EndBlock[openedBy=StartBlock] {
  StartCloseBlockStache Function EndStache
}

Function { if | let | each | else | name }

SubExpression {
  list<Value>
  NamedArgs?
}


Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | Property
  | PropertyPath
}

Invocation { invisibles? SubExpStart SubExpression SubExpEnd invisibles? }

Pair { string "=" Value }
NamedArgs { list<Pair> }

@precedence {
  Invocation @left,
  Value @left
}

Argument { "@" identifier }
Property { this | Argument | identifier }
PropertyPath { Property ("." identifier)+ }

boolean {
  @specialize[@name=BooleanLiteral]<identifier, "true" | "false">
}

let { kw<"let"> }
each { kw<"each"> }
if { kw<"if"> }
else { kw<"else"> }

this { kw<"this"> }
null { kw<"null"> }
undefined { kw<"undefined"> }


Number { number }



@tokens {
  invisibles { @whitespace+ }
  number { @digit+ | (@digit+ ("." @digit*))}

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }


  Text { ![{] Text? | "{" (@eof | ![{] Text?) }

  string {
   "\"" !["]* "\"" |
    "\'" ![']* "\'"
  }

  StartOpenBlockStache[closedBy="EndStache"] { "{{#" }
  StartCloseBlockStache[closedBy="EndStache"] { "{{/" }
  StartStache[closedBy="EndStache"] { "{{" }
  EndStache[openedBy="StartStache | StartShortComment | StartOpenBlockStache | StartCloseBlockStache"] { "}}" }

  StartShortComment[closedBy="EndStache"] { "{{!" }
  StartLongComment[closedBy="EndLongComment"] { "{{!--" }
  EndLongComment[openedBy="StartLongComment"] { "--}}" }

  SubExpStart[closedBy="SubExpEnd"] { "(" }
  SubExpEnd[openedBy="SubExpStart"] { ")" }


  @precedence {
    identifier,
    name,
    ".", "=",
    "|", "@",
    SubExpStart, SubExpEnd,
    StartLongComment, EndLongComment,
    StartShortComment, StartOpenBlockStache, StartCloseBlockStache,
    StartStache, EndStache,
    invisibles,
    string,
    number,
    Text
  }
}


// Helper And Special Things
list<item> { item (invisibles item)* }
kw<term> { @specialize[@name={term}]<identifier, term> }


@external propSource glimmerHighlighting from './highlight'

marijn · July 23, 2022, 3:41pm

In some place in the grammar, it is possible for invisibles to follow an Invocation, so in the parse state at the end of an Invocation (but before the optional invisibles), the parser doesn’t know whether to reduce the Invocation or shift the invisibles to be part of it.

Replacing invisibles? with something like (@Invocation invisibles)? might suppress it, but it seems like a cleaner solution would involve a more structured approach to where you match whitespace.

NullVoxPopuli · July 23, 2022, 4:28pm

the parser doesn’t know whether to reduce the Invocation or shift the invisibles to be part of it.

Is there a way to do the following?:

Invocation { SubExpStart SubExpression SubExpEnd }

yet still allow the invocation to be

 ( someFunction ) 
│││└ Property  │││
││└ invisibles ││└ invisibles
│└ SubExpStart │└ SubExpEnd
└ invisibles   └ invisibles

the invisibles aren’t significant in this case, in that there could be any number of them – it’d be weird for someone to use more than one space character, but I think it’s still allowed. The invocation itself should only be identified as (someFunction) in this case

I’ve toyed with @skip, but skipping all invisibles ( defined as invisibles { @whitespace+ } ), prevents needing any invisible at all for lists and similar ( defined as list<item> { item (invisibles item)* }, rather than common separated, like in JS )

marijn · July 23, 2022, 4:46pm

Does your language really requires a single space between list items? Because that’s a rather unusual syntax (though possible, of course).

NullVoxPopuli · July 23, 2022, 5:08pm

that’s a good question. I wasn’t sure, so I tested this in the very project I’m trying to switch from monaco to codemirror.

and it looks like lists can have any amounts of invisible characters between items.

  {{     (
    listSpacingTest   1 9    'hello'      
    'possible, but why?'   )}}

    <br />
    and then the normal way folks should do this:
    <br />
  {{ (listSpacingTest 1 9 'hello') }}

marijn · July 24, 2022, 8:41am

Given that, is there anything preventing you from using @skip inside the {{ }} delimited content?

NullVoxPopuli · July 24, 2022, 12:00pm

I haven’t quite figured out how.

I’ve been over the docs a few times as well: Lezer System Guide
but I don’t think it covers this situation? (or I’m really dense)

Every time I try to use @skip in the form of:

@skip { invisibles } {
  // something using invisibles
}
// or
@skip { invisibles }
@skip { } {
  // something using invisibles
}

I get Inconsistent skip sets errors

marijn · July 24, 2022, 5:34pm

Is there a reason not to use a global @skip rule?

NullVoxPopuli · July 24, 2022, 8:06pm

my grammar doesn’t compile – how do you mean?

If anyone wants to see what I have so far, It’s here: Implementation by NullVoxPopuli · Pull Request #2 · NullVoxPopuli/glimdown · GitHub

git clone git@github.com:NullVoxPopuli/glimdown.git
cd glimdown
git checkout implementation
cd packages/lezer/glimmer
pnpm install
pnpm build

At the time of posting this comment, this is my grammar, and I don’t know what I’m doing wrong (note that this is different from the original post)

@detectDelim
@top Glimmer { ( Expression | BlockComment )* }
@skip { invisibles }

String { string }

ShortComment { StartShortComment Text* EndStache }
LongComment { StartLongComment Text* EndLongComment }
BlockComment { LongComment | ShortComment }


Expression {
  StartStache SubExpression EndStache
  | Block
}

Block {
  StartBlock
  Expression*
  EndBlock
}
As { kw<"as"> "|" list<name> "|" }

StartBlock[closedBy=EndBlock] {
  StartOpenBlockStache Function SubExpression? As? EndStache
}
EndBlock[openedBy=StartBlock] {
  StartCloseBlockStache Function EndStache
}

Function { if | let | each | else | name }

SubExpression {
  list<Value>
  NamedArgs?
}


Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | Property
  | PropertyPath
}

Invocation { SubExpStart SubExpression SubExpEnd }

Pair { string "=" Value }
NamedArgs { list<Pair> }

@precedence {
  invisibles @left,
  Invocation @left,
  Value @left
}

Argument { "@" identifier }
Property { this | Argument | identifier }
PropertyPath { Property ("." identifier)+ }

boolean {
  @specialize[@name=BooleanLiteral]<identifier, "true" | "false">
}

let { kw<"let"> }
each { kw<"each"> }
if { kw<"if"> }
else { kw<"else"> }

this { kw<"this"> }
null { kw<"null"> }
undefined { kw<"undefined"> }


Number { number }



@tokens {
  invisibles { @whitespace }
  number { @digit+ | (@digit+ ("." @digit*))}

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }


  Text { ![{] Text? | "{" (@eof | ![{] Text?) }

  string {
   "\"" !["]* "\"" |
    "\'" ![']* "\'"
  }

  StartOpenBlockStache[closedBy="EndStache"] { "{{#" }
  StartCloseBlockStache[closedBy="EndStache"] { "{{/" }
  StartStache[closedBy="EndStache"] { "{{" }
  EndStache[openedBy="StartStache | StartShortComment | StartOpenBlockStache | StartCloseBlockStache"] { "}}" }

  StartShortComment[closedBy="EndStache"] { "{{!" }
  StartLongComment[closedBy="EndLongComment"] { "{{!--" }
  EndLongComment[openedBy="StartLongComment"] { "--}}" }

  SubExpStart[closedBy="SubExpEnd"] { "(" }
  SubExpEnd[openedBy="SubExpStart"] { ")" }


  @precedence {
    identifier,
    name,
    ".", "=",
    "|", "@",
    SubExpStart, SubExpEnd,
    StartLongComment, EndLongComment,
    StartShortComment, StartOpenBlockStache, StartCloseBlockStache,
    StartStache, EndStache,
    invisibles,
    string,
    number,
    Text
  }
}


// Helper And Special Things
list<item> { item (invisibles+ item)* }
kw<term> { @specialize[@name={term}]<identifier, term> }

@external propSource glimmerHighlighting from './highlight'

which, when running pnpm build, I get

Use of token invisibles conflicts with skip rule (src/glimmer.grammar 1:1)

but if I remove all invisibles from my grammar (except for the skip rule)
(which, at the time of writing, is just in the list definition, list<item> { item (invisibles+ item)* } → list<item> { item (item)* } (which I don’t think would be a valid grammar entry?) but anyway, this gives:

shift/reduce conflict between
  String -> · string
and
  list<Value> -> Value
With input:
  StartStache Value · string …
Shared origin: SubExpression -> · list<Value> NamedArgs
  via list<Value> -> Value · Value+
    via Value+ -> · Value
      via Value -> · String
        String -> · string

another shift/reduce confilct.

but then I could delete my String { string } rule (feels kind of redundant anyway?)

and then pnpm build gives:

shift/reduce conflict between
  Value -> · String
and
  list<Value> -> Value
With input:
  StartStache Value · String …
Shared origin: SubExpression -> · list<Value> NamedArgs
  via list<Value> -> Value · Value+
    via Value+ -> · Value
      Value -> · String

and I haven’t been able to figure out a precedence definition that solves this. I think because there is no delimeter in list anymore.

Any guidance would be super appreciated <3

NullVoxPopuli · July 27, 2022, 9:08pm

are lisp-style languages not possible in Lezer? or am I missing something really obvious?

marijn · July 27, 2022, 11:09pm

Lisps should definitely be possible (see also Clojure). I don’t know how list is defined so I have no idea what that conflict is about.

NullVoxPopuli · July 27, 2022, 11:40pm

atm, list is defined as:

list<item> { item (item)* }

because invisibles (aka @whitespace+) have been removed, and are @skiped entirely now (which looks similar to what closure is doing

marijn · July 28, 2022, 8:50am

The error is telling you that when, in SubExpression, after parsing a Value and seeing a string ahead, the parser doesn’t know whether to finish its list<Value> and start on the list of pairs, or include that string as another value in its list of values. You could loosen the grammar a bit by making SubExpression match list<Value | Pair>, or try to factor the rules so that no incompatible decisions have to be made before the “=” token.

NullVoxPopuli · July 28, 2022, 1:45pm

is there anyway to have it look ahead to see if it would match the = token?

NullVoxPopuli · July 28, 2022, 2:13pm

here is an isolated example I just ran in to:

Argument { "@" identifier }
PropertyPath { (this | Argument | identifier) ("." identifier)+ }

where identifier is:

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }

I get shift/reduce error:

shift/reduce conflict between
  PropertyPath -> · identifier ("." identifier)+
and
  list<value> -> value
With input:
  StartStache value · identifier …
Shared origin: SubExpression -> · list<value> NamedArgs
  via list<value> -> value · value+
    via value+ -> · value
      via value -> · PropertyPath
        PropertyPath -> · identifier ("." identifier)+

How do I tell the parser that all identifier.identifier.identifier (recursively?) is a PropertyPath?

marijn · July 28, 2022, 4:07pm

The thing about LR grammars is that you can’t really isolate bits like that — these rules are probably being used in a context where they may be followed by a “.” token, and the conflict arises because it is unclear whether to continue parsing a PropertyPath or to use that outer rule.

I would suggest you spend some time reading up on LR parsers and how they work. It isn’t realistic for me to help you through every problem you run into here.

NullVoxPopuli · July 28, 2022, 7:59pm

(post deleted by author)

GamerGirlandCo · July 28, 2022, 9:34pm

i’m no expert, but i think what you might need is a parsing algorithm like the markdown one. as you can see, it’s not defined with a .grammar file, because it’s just… not possible due to its complexity (i.e., the need to look ahead like you described).
while browsing this forum earlier, i found a thread confirming that markup languages are either super difficult or downright impossible with lezer grammar.
though from what i can gather, you’re not making a markup language (lucky you :P), i think it’s worth a shot. the more hoops you have to jump through with .grammar files, the more you have to wonder if it’s the right way to go about things.
or… maybe all you need is an external tokenizer / specializer. it worked for me for a bit ~~until i ended up needing to create a makeshift state machine that was simply an array with elements unshift()ed into it to see which token was most recently accepted hah.~~
sorry that i don’t have any definitive answers, i’m still trying to comprehend it myself.

NullVoxPopuli · July 28, 2022, 10:06pm

uh oh. I am tho. I will have to highlight stuff like Glimmer w/ Highlight.js

Glimmer is an extension on top of HTML

Some tree-sitter examples:

GitHub - alexlafroscia/tree-sitter-glimmer: A TreeSitter grammar for Glimmer (HBS) templates ← the same grammar I’m trying to implement now.
https://github.com/tree-sitter/tree-sitter-javascript/pull/208
- embeds glimmer and javascript

yeah, I just know that tree-sitter was a fair bit easier to do all this in , and I could use their grammar format for everything. no need to write custom JS for anything.

all good! we’ll figure this system out!

highlight.js was also kinda difficult, but had great support.
tree-sitter had no support, but had plenty of examples and good error messages that helped out.

After I finish Glimmer in Lezer, I have to figure out:

JS + Glimmer
TS + Glimmer
Markdown + Glimmer

(for which I’m hoping the overlay / multi-language features work out pretty well for)

marijn · August 1, 2022, 1:59pm

That looks a lot more structured (and LR-parseable) than the kind of stuff people tend to mean by ‘markup language’ (Markdown, ASCIIdoc, and similar monstrosities).

Most tree-sitter grammars I’ve looked at had custom tokenizers written in C, but okay. Also, it uses the exact same parsing technique (LR with opt-in GLR), with the same type of tricky shift/reduce conflict errors, so I’m not sure how it was so much easier for you. In any case, if you have a working tree-sitter grammar, translating it to Lezer should be really easy.