(key: value)* parser, errors when using different token declarations

cristipp · October 20, 2021, 3:20am

I implemented a very simple (key: value)* parser and feed it a very simple string, "a: foo b: bar c: baz". If I use the same token declaration for keys and values (Key), everything works fine. If I use different token declarations (Key vs. Value), I get 6 syntax errors (⚠ name), at positions 2, 6, 9, 13, 16, 20. I do not understand why there is a difference, since the specifications of the two tokens are identical. Any help will be greatly appreciated.

Edit: Removing the @precedence line fixes the parse. Is there a way to build this parser with explicit precedence for Key and Value?

// const text = "a: foo b: bar c: baz";

@top Root { Pair* }

@tokens {
    @precedence { Value, Key }

    whitespace { std.whitespace+ }

    Key { std.asciiLetter+ }
    Value { std.asciiLetter+ }
}

@skip { whitespace }

// Pair { Key ":" Key }
Pair { Key ":" Value }

marijn · October 20, 2021, 7:31am

The problem is the @precedence declaration — that tells the parser generator that when both match, it should always pick Value. If you remove it, it’ll properly treat the token as contextual, and parse Key or Value depending on the parser state.
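
For reference, here is a sketch of the grammar from the first post with the @precedence line dropped, which is the change the edit above reports as fixing the parse. Only the comments are new; they are added for illustration and are not part of the original grammar.

```
@top Root { Pair* }

@tokens {
    // No @precedence declaration: Key and Value overlap, but they are
    // left to be resolved contextually, based on the parser state.
    whitespace { std.whitespace+ }

    Key { std.asciiLetter+ }
    Value { std.asciiLetter+ }
}

@skip { whitespace }

// Key is only valid before ":", Value only after it, so the two
// tokens are never both acceptable in the same parser state.
Pair { Key ":" Value }
```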

cristipp · October 20, 2021, 7:54am

Perhaps I’m missing something, but aren’t Key and Value activating in mutually exclusive contexts?

. a : foo => Key
a : . foo => Value

The documentation seems to imply that @precedence only applies in ambiguous contexts; see the Lezer System Guide: "tokens are only allowed to overlap (match some prefix of each other) when they do not occur in the same place of the grammar".

marijn · October 20, 2021, 7:33pm

That should be read as ‘are only allowed to overlap without an explicit precedence declaration’. Assigning a precedence removes the tokens from the grouping mechanism, since they are assumed to not conflict.
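
If the goal is simply to get distinct Key and Value nodes without any overlapping tokens, one alternative that nobody in this thread proposes is to declare a single token and wrap it in two rules. The following is a sketch under that assumption; the token name identifier is hypothetical, chosen here only for illustration, and it is lowercase so that it does not produce a node of its own in the tree.

```
@top Root { Pair* }

@skip { whitespace }

Pair { Key ":" Value }

// Both rules wrap the same token, so the tokenizer never has to
// choose between two overlapping token types.
Key { identifier }
Value { identifier }

@tokens {
    whitespace { std.whitespace+ }

    identifier { std.asciiLetter+ }
}
```

With this layout the distinction between keys and values lives entirely in the grammar rules, so no @precedence declaration is needed.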