Issue with setting aliases to statements in GraphQL-like grammar

gnclmorais · October 14, 2024, 4:58pm

Hey folks! I’m building a grammar somewhat similar to GraphQL, but I’m running into an issue.

The core problem I’m having is allowing every statement to have an optional alias. Consider the following sample query:

query {
  AA(sourceA: "aModel")

  alias0: AA(sourceA: "aModel")

  alias1: AA(sourceA: "aModel")
  alias2: BB(sourceB: "something.else")
  alias3: CC {
    anotherThing
    oneMore: Thing
  }
}

AA, BB, and CC are operators (think of SQL’s FIND, WHERE, and SELECT, for example)
A query can have multiple AA statements
AA can be followed by BB and/or CC
Any statement can have an alias, but it’s not mandatory

I tried to express the above like this:

@precedence { aa, bb, cc }

@skip { space }

@top query { Query }

Query {
  'query' '{'
    Statement+
  '}'
}

Statement {
  !aa AA
  !bb BB*
  !cc CC?
}

AA {
  Alias? opAA '('
    'sourceA' ':' doubleQuoteString
  ')'
}

BB {
  Alias? opBB '('
    'sourceB' ':' doubleQuoteString
  ')'
}

CC {
  Alias? opCC '{'
    (identifier | identifier ':' identifier)+
  '}'
}

Alias {
  identifier ':'
}

@tokens {
  space { @whitespace+ }
  letter_ { @asciiLetter | '_' }
  identifier { letter_ (letter_ | @digit)* }
  doubleQuoteString { '"' (!["\\] | "\\" _)* '"' }

  opAA { 'AA' }
  opBB { 'BB' }
  opCC { 'CC' }

  @precedence {
    opAA, opBB, opCC, identifier
  }
}

However, the aliases don’t seem to be identified properly (screenshot from https://lezer-playground.vercel.app):

Any idea what I’m doing wrong?

marijn · October 14, 2024, 6:55pm

The syntax tree you show seems a reasonable match of the given grammar. I’m not sure what, precisely, you were expecting to work differently.

gnclmorais · October 14, 2024, 8:28pm

Right, I’m sure it does, but if you compare it with the query, it does not represent its structure correctly.

As you can see from the screenshot, the highlighted node is an Alias under AA, but in the query I’m actually highlighting the Alias of a CC.

In other words, there seems to be something in my Alias definition (or in the other definitions that use Alias) that is tripping up the parser, so the results are not what I expected.

marijn · October 14, 2024, 9:05pm

Oh, I see what you mean. I was thrown off by the way that tree display hides most of the error nodes. What’s happening here is that your grammar isn’t LR(1), because when seeing a label after an AA production, it’s not clear whether that starts a new statement (which may occur after each other without any separator) or it starts a BB. I guess that’s why you added the precedence rules, but those just hide the problem, causing the parser to always try to parse an AA whenever it sees a label.

If you cannot change the syntax to be less ambiguous, or make the grammar structure looser (removing the Statement grouping and just parsing these as a sequence of AA | BB | CC would sidestep the ambiguity), you’ll have to use GLR parsing. Drop the ! markers, and put a ~alias marker at the end of Statement and at the start of Alias.
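
For reference, the GLR suggestion above might look roughly like the following. This is a sketch, not a tested grammar: the marker name alias comes from the reply, the exact placement inside each rule is one reading of "at the end of Statement and at the start of Alias", and every other rule is assumed to stay as in the original post.

```lezer
// The !aa, !bb, and !cc precedence markers are removed.
// The ~alias ambiguity marker explicitly allows the conflict at this
// point, so the parser can explore both alternatives (GLR) instead of
// always committing to one of them.
Statement {
  AA
  BB*
  CC? ~alias
}

Alias {
  ~alias identifier ':'
}
```

The looser alternative mentioned in the same reply, dropping the Statement grouping, could be sketched as below. Again, the exact shape is an assumption. It no longer groups BB and CC with the AA that precedes them, and it also accepts a BB or CC with no AA before it.

```lezer
// No Statement rule: the query body is a flat sequence of operators.
Query {
  'query' '{'
    (AA | BB | CC)+
  '}'
}
```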

gnclmorais · October 15, 2024, 8:33am

Thank you so much, I could see the ambiguous nature of it but could not figure out how to unravel it…! Going with the ~alias marker solved my issue, it is now parsing exactly how I want it. Thanks again!

And thank you for CodeMirror and Lezer, I love these tools!