Inline Nodes generate different ids

xixixao · May 7, 2021, 2:20pm

With a grammar like:


BinaryExpression {
  expression !exp ArithOp { "^" } expression |
  expression !mult ArithOp { "*" | "/" | "%" } expression |
  expression !add ArithOp { "+" | "-" } expression |
}

The nodes returned for the operator will actually have different IDs (2, 3, 4), but parser.terms will only return the last ID (4). This makes matching on type.id behave differently from matching on type.name.

Is this expected?

marijn · May 7, 2021, 5:58pm

Yes, multiple terms can share a name. The parser needs to distinguish between them during parsing. If you need different names, assign different node names to the rules.

I can see how the value in parser.terms is confusing. I would actually have expected the tool not to generate a term binding at all for a parameterized rule. I’ll take a look at what’s happening later.

marijn · May 8, 2021, 8:12am

Yes, this was a bug in lezer-generator — it should only emit term bindings for top-level unique (non-inline, non-parameterized) rules, but it was accidentally doing it anyway for inline rules. This patch should fix that.

xixixao · May 9, 2021, 3:12am

It might be a good idea to document this, for folks (like me) who otherwise use the terms for matching or other purposes. I’d advise them to match by name in this case.
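To make the difference concrete, here is a minimal sketch that models the situation with plain objects rather than the real Lezer API. The `NodeTypeLike` interface, the `terms` object, and the two helper functions are hypothetical stand-ins; the ids 2, 3, 4 and the single exported id 4 are the values reported at the top of this thread.

```typescript
// Hypothetical stand-in for a Lezer node type: only `id` and `name` are modeled.
interface NodeTypeLike {
  id: number;
  name: string;
}

// One node type per inline ArithOp rule in the grammar above.
// They share a name but have distinct ids (values taken from the thread).
const operatorTypes: NodeTypeLike[] = [
  { id: 2, name: "ArithOp" }, // "^"
  { id: 3, name: "ArithOp" }, // "*" | "/" | "%"
  { id: 4, name: "ArithOp" }, // "+" | "-"
];

// Assumed shape of the generated terms binding before the fix:
// a single entry holding only the last id.
const terms = { ArithOp: 4 };

// Matching by id only finds the node types whose id equals the exported term.
function matchById(types: NodeTypeLike[], id: number): NodeTypeLike[] {
  return types.filter((t) => t.id === id);
}

// Matching by name finds every node type that shares the name.
function matchByName(types: NodeTypeLike[], name: string): NodeTypeLike[] {
  return types.filter((t) => t.name === name);
}

const byId = matchById(operatorTypes, terms.ArithOp);
const byName = matchByName(operatorTypes, "ArithOp");

console.log(byId.length, byName.length);
```

In this model the id-based match misses the `^` and `*`-style operators, while the name-based match catches all three. With a real parser the same comparison would be made on `type.id` and `type.name`.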