How do I resolve shift/reduce conflict?

I have this error:

> lezer-generator src/glimmer.grammar -o src/parser && rollup -c

shift/reduce conflict between
  Invocation -> SubExpStart SubExpression SubExpEnd · invisibles
and
  Invocation -> SubExpStart SubExpression SubExpEnd
With input:
  StartStache SubExpStart SubExpression SubExpEnd · invisibles …
Shared origin: Value -> · Invocation

And I’ve seen these docs: Lezer System Guide
which I think are relevant here, so I tried applying @left in @precedence for the mentioned tokens/nodes here:

@precedence {
  Invocation @left,
  Value @left
}

but this doesn’t change anything.

For context, here is what Invocation and Value are:

Invocation { invisibles? SubExpStart SubExpression SubExpEnd invisibles? }
Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | Property
  | PropertyPath
}

// related to the error:
SubExpression {
  list<Value>
  NamedArgs?
}


And the full thing here for more context / seeing what all the other tokens/nodes are

@detectDelim
@top Glimmer { ( Expression | BlockComment )* }

String { string }

ShortComment { StartShortComment Text* EndStache }
LongComment { StartLongComment Text* EndLongComment }
BlockComment { LongComment | ShortComment }


Expression {
  StartStache SubExpression EndStache
  | Block
}

Block {
  StartBlock
  Expression*
  EndBlock
}
As { kw<"as"> "|" list<name> "|" }

StartBlock[closedBy=EndBlock] {
  StartOpenBlockStache Function invisibles SubExpression? As? EndStache
}
EndBlock[openedBy=StartBlock] {
  StartCloseBlockStache Function EndStache
}

Function { if | let | each | else | name }

SubExpression {
  list<Value>
  NamedArgs?
}


Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | Property
  | PropertyPath
}

Invocation { invisibles? SubExpStart SubExpression SubExpEnd invisibles? }

Pair { string "=" Value }
NamedArgs { list<Pair> }

@precedence {
  Invocation @left,
  Value @left
}

Argument { "@" identifier }
Property { this | Argument | identifier }
PropertyPath { Property ("." identifier)+ }

boolean {
  @specialize[@name=BooleanLiteral]<identifier, "true" | "false">
}

let { kw<"let"> }
each { kw<"each"> }
if { kw<"if"> }
else { kw<"else"> }

this { kw<"this"> }
null { kw<"null"> }
undefined { kw<"undefined"> }


Number { number }



@tokens {
  invisibles { @whitespace+ }
  number { @digit+ | (@digit+ ("." @digit*))}

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }


  Text { ![{] Text? | "{" (@eof | ![{] Text?) }

  string {
   "\"" !["]* "\"" |
    "\'" ![']* "\'"
  }

  StartOpenBlockStache[closedBy="EndStache"] { "{{#" }
  StartCloseBlockStache[closedBy="EndStache"] { "{{/" }
  StartStache[closedBy="EndStache"] { "{{" }
  EndStache[openedBy="StartStache | StartShortComment | StartOpenBlockStache | StartCloseBlockStache"] { "}}" }

  StartShortComment[closedBy="EndStache"] { "{{!" }
  StartLongComment[closedBy="EndLongComment"] { "{{!--" }
  EndLongComment[openedBy="StartLongComment"] { "--}}" }

  SubExpStart[closedBy="SubExpEnd"] { "(" }
  SubExpEnd[openedBy="SubExpStart"] { ")" }


  @precedence {
    identifier,
    name,
    ".", "=",
    "|", "@",
    SubExpStart, SubExpEnd,
    StartLongComment, EndLongComment,
    StartShortComment, StartOpenBlockStache, StartCloseBlockStache,
    StartStache, EndStache,
    invisibles,
    string,
    number,
    Text
  }
}


// Helper And Special Things
list<item> { item (invisibles item)* }
kw<term> { @specialize[@name={term}]<identifier, term> }


@external propSource glimmerHighlighting from './highlight'
1 Like

In some place in the grammar, it is possible for invisibles to follow an Invocation, so in the parse state at the end of an Invocation (but before the optional invisibles), the parser doesn’t know whether to reduce the Invocation or shift the invisibles to be part of it.

Replacing invisibles? with something like (@Invocation invisibles)? might suppress it, but it seems like a cleaner solution would involve a more structured approach to where you match whitespace.

the parser doesn’t know whether to reduce the Invocation or shift the invisibles to be part of it.

Is there a way to do the following?:

Invocation { SubExpStart SubExpression SubExpEnd }

yet still allow the invocation to be

 ( someFunction ) 
│││└ Property  │││
││└ invisibles ││└ invisibles
│└ SubExpStart │└ SubExpEnd
└ invisibles   └ invisibles

the invisibles aren’t significant in this case, in that there could be any number of them – it’d be weird for someone to use more than one space character, but I think it’s still allowed. The invocation itself should only be identified as (someFunction) in this case

I’ve toyed with @skip, but skipping all invisibles ( defined as invisibles { @whitespace+ } ), prevents needing any invisible at all for lists and similar ( defined as list<item> { item (invisibles item)* }, rather than common separated, like in JS )

Does your language really requires a single space between list items? Because that’s a rather unusual syntax (though possible, of course).

that’s a good question. I wasn’t sure, so I tested this in the very project I’m trying to switch from monaco to codemirror.

and it looks like lists can have any amounts of invisible characters between items.

  {{     (
    listSpacingTest   1 9    'hello'      
    'possible, but why?'   )}}

    <br />
    and then the normal way folks should do this:
    <br />
  {{ (listSpacingTest 1 9 'hello') }}

Given that, is there anything preventing you from using @skip inside the {{ }} delimited content?

I haven’t quite figured out how.

I’ve been over the docs a few times as well: Lezer System Guide
but I don’t think it covers this situation? (or I’m really dense)

Every time I try to use @skip in the form of:

@skip { invisibles } {
  // something using invisibles
}
// or
@skip { invisibles }
@skip { } {
  // something using invisibles
}

I get Inconsistent skip sets errors

Is there a reason not to use a global @skip rule?

my grammar doesn’t compile – how do you mean?

If anyone wants to see what I have so far, It’s here: Implementation by NullVoxPopuli · Pull Request #2 · NullVoxPopuli/glimdown · GitHub

git clone git@github.com:NullVoxPopuli/glimdown.git
cd glimdown
git checkout implementation
cd packages/lezer/glimmer
pnpm install
pnpm build

At the time of posting this comment, this is my grammar, and I don’t know what I’m doing wrong :upside_down_face: (note that this is different from the original post)

@detectDelim
@top Glimmer { ( Expression | BlockComment )* }
@skip { invisibles }

String { string }

ShortComment { StartShortComment Text* EndStache }
LongComment { StartLongComment Text* EndLongComment }
BlockComment { LongComment | ShortComment }


Expression {
  StartStache SubExpression EndStache
  | Block
}

Block {
  StartBlock
  Expression*
  EndBlock
}
As { kw<"as"> "|" list<name> "|" }

StartBlock[closedBy=EndBlock] {
  StartOpenBlockStache Function SubExpression? As? EndStache
}
EndBlock[openedBy=StartBlock] {
  StartCloseBlockStache Function EndStache
}

Function { if | let | each | else | name }

SubExpression {
  list<Value>
  NamedArgs?
}


Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | Property
  | PropertyPath
}

Invocation { SubExpStart SubExpression SubExpEnd }

Pair { string "=" Value }
NamedArgs { list<Pair> }

@precedence {
  invisibles @left,
  Invocation @left,
  Value @left
}

Argument { "@" identifier }
Property { this | Argument | identifier }
PropertyPath { Property ("." identifier)+ }

boolean {
  @specialize[@name=BooleanLiteral]<identifier, "true" | "false">
}

let { kw<"let"> }
each { kw<"each"> }
if { kw<"if"> }
else { kw<"else"> }

this { kw<"this"> }
null { kw<"null"> }
undefined { kw<"undefined"> }


Number { number }



@tokens {
  invisibles { @whitespace }
  number { @digit+ | (@digit+ ("." @digit*))}

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }


  Text { ![{] Text? | "{" (@eof | ![{] Text?) }

  string {
   "\"" !["]* "\"" |
    "\'" ![']* "\'"
  }

  StartOpenBlockStache[closedBy="EndStache"] { "{{#" }
  StartCloseBlockStache[closedBy="EndStache"] { "{{/" }
  StartStache[closedBy="EndStache"] { "{{" }
  EndStache[openedBy="StartStache | StartShortComment | StartOpenBlockStache | StartCloseBlockStache"] { "}}" }

  StartShortComment[closedBy="EndStache"] { "{{!" }
  StartLongComment[closedBy="EndLongComment"] { "{{!--" }
  EndLongComment[openedBy="StartLongComment"] { "--}}" }

  SubExpStart[closedBy="SubExpEnd"] { "(" }
  SubExpEnd[openedBy="SubExpStart"] { ")" }


  @precedence {
    identifier,
    name,
    ".", "=",
    "|", "@",
    SubExpStart, SubExpEnd,
    StartLongComment, EndLongComment,
    StartShortComment, StartOpenBlockStache, StartCloseBlockStache,
    StartStache, EndStache,
    invisibles,
    string,
    number,
    Text
  }
}


// Helper And Special Things
list<item> { item (invisibles+ item)* }
kw<term> { @specialize[@name={term}]<identifier, term> }

@external propSource glimmerHighlighting from './highlight'

which, when running pnpm build, I get

Use of token invisibles conflicts with skip rule (src/glimmer.grammar 1:1)

but if I remove all invisibles from my grammar (except for the skip rule)
(which, at the time of writing, is just in the list definition, list<item> { item (invisibles+ item)* }list<item> { item (item)* } (which I don’t think would be a valid grammar entry?) but anyway, this gives:

shift/reduce conflict between
  String -> · string
and
  list<Value> -> Value
With input:
  StartStache Value · string …
Shared origin: SubExpression -> · list<Value> NamedArgs
  via list<Value> -> Value · Value+
    via Value+ -> · Value
      via Value -> · String
        String -> · string

another shift/reduce confilct.

but then I could delete my String { string } rule (feels kind of redundant anyway?)

and then pnpm build gives:

shift/reduce conflict between
  Value -> · String
and
  list<Value> -> Value
With input:
  StartStache Value · String …
Shared origin: SubExpression -> · list<Value> NamedArgs
  via list<Value> -> Value · Value+
    via Value+ -> · Value
      Value -> · String

and I haven’t been able to figure out a precedence definition that solves this. I think because there is no delimeter in list anymore.

Any guidance would be super appreciated <3

are lisp-style languages not possible in Lezer? or am I missing something really obvious?

Lisps should definitely be possible (see also Clojure). I don’t know how list is defined so I have no idea what that conflict is about.

atm, list is defined as:

list<item> { item (item)* }

because invisibles (aka @whitespace+) have been removed, and are @skiped entirely now (which looks similar to what closure is doing

The error is telling you that when, in SubExpression, after parsing a Value and seeing a string ahead, the parser doesn’t know whether to finish its list<Value> and start on the list of pairs, or include that string as another value in its list of values. You could loosen the grammar a bit by making SubExpression match list<Value | Pair>, or try to factor the rules so that no incompatible decisions have to be made before the “=” token.

is there anyway to have it look ahead to see if it would match the = token?

here is an isolated example I just ran in to:

Argument { "@" identifier }
PropertyPath { (this | Argument | identifier) ("." identifier)+ }

where identifier is:

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }

I get shift/reduce error:

shift/reduce conflict between
  PropertyPath -> · identifier ("." identifier)+
and
  list<value> -> value
With input:
  StartStache value · identifier …
Shared origin: SubExpression -> · list<value> NamedArgs
  via list<value> -> value · value+
    via value+ -> · value
      via value -> · PropertyPath
        PropertyPath -> · identifier ("." identifier)+

How do I tell the parser that all identifier.identifier.identifier (recursively?) is a PropertyPath?

The thing about LR grammars is that you can’t really isolate bits like that — these rules are probably being used in a context where they may be followed by a “.” token, and the conflict arises because it is unclear whether to continue parsing a PropertyPath or to use that outer rule.

I would suggest you spend some time reading up on LR parsers and how they work. It isn’t realistic for me to help you through every problem you run into here.

(post deleted by author)

i’m no expert, but i think what you might need is a parsing algorithm like the markdown one. as you can see, it’s not defined with a .grammar file, because it’s just… not possible due to its complexity (i.e., the need to look ahead like you described).
while browsing this forum earlier, i found a thread confirming that markup languages are either super difficult or downright impossible with lezer grammar.
though from what i can gather, you’re not making a markup language (lucky you :P), i think it’s worth a shot. the more hoops you have to jump through with .grammar files, the more you have to wonder if it’s the right way to go about things.
or… maybe all you need is an external tokenizer / specializer. it worked for me for a bit until i ended up needing to create a makeshift state machine that was simply an array with elements unshift()ed into it to see which token was most recently accepted hah.
sorry that i don’t have any definitive answers, i’m still trying to comprehend it myself.

1 Like

uh oh. I am tho. I will have to highlight stuff like Glimmer w/ Highlight.js

Glimmer is an extension on top of HTML

Some tree-sitter examples:

yeah, I just know that tree-sitter was a fair bit easier to do all this in :upside_down_face:, and I could use their grammar format for everything. no need to write custom JS for anything.

all good! we’ll figure this system out!

highlight.js was also kinda difficult, but had great support.
tree-sitter had no support, but had plenty of examples and good error messages that helped out.

After I finish Glimmer in Lezer, I have to figure out:

  • JS + Glimmer
  • TS + Glimmer
  • Markdown + Glimmer

(for which I’m hoping the overlay / multi-language features work out pretty well for)

That looks a lot more structured (and LR-parseable) than the kind of stuff people tend to mean by ‘markup language’ (Markdown, ASCIIdoc, and similar monstrosities).

Most tree-sitter grammars I’ve looked at had custom tokenizers written in C, but okay. Also, it uses the exact same parsing technique (LR with opt-in GLR), with the same type of tricky shift/reduce conflict errors, so I’m not sure how it was so much easier for you. In any case, if you have a working tree-sitter grammar, translating it to Lezer should be really easy.

1 Like