Shortening the parse tree

sfc-gh-nlele · September 15, 2021, 6:16pm

Hi, we’re working on the expression definitions for a PostgreSQL esque grammar. While our expressions do get parsed, the trees start to become huge as the expression gets bigger, making the traversal harder.

For a query like select 1+2; below is the parse tree.

We have to port the expression definitions from an Antlr grammar, which has nested definitions for expressions similar to this.

Here is the actual snippet of the Lezer grammar.

Is there anything we could do to optimise this tree? Or only get selective nodes like PlusExpression, etc. that are more relevant to the actual expression, by hiding its irrelevant parents?

marijn · September 15, 2021, 10:16pm

Looks like you’re using a precedence-less context-free grammar that always goes through a huge stack of nested rules. Take a look at any of the other grammars that implement programming languages—they don’t have this problem, because they put all expression types at a single level and use precedences to disambiguate.