Unexpected result: Using an escape character to escape the escape character

hi marijn!

I’m once again hoping for your input to point me in the right direction.
I have a grammar to define key-value pairs, with some characters that need to be escaped when used in a key / value. One of those is a “position” operator, used to match a value at the beginning / end of a key.

To define the escaped character / position operator, I’m using

escapedCharacter { "\\" _ }
PositionOperator { "*" }

The problem I have is that when I want to escape the escape character, the second escape character still escapes the position operator.
e.g.:

foo = \\
-> first backslash escapes the second - all good

foo = \\*
-> first backslash escapes the second,
but the second also escapes the asterisk,
which should actually be detected as position operator

do you have an idea how I could best approach that issue? (stripped down version of the grammar, see below)

@top Filter { expression* }

expression { FilterStatement | space }

FilterStatement { FilterKey space* FilterValue }

FilterKey { Identifier }
FilterValue { (PositionOperator? Identifier PositionOperator?) | Boolean }

Boolean { @specialize<Identifier, "true" | "false"> }

@tokens {
  space { @whitespace }
  escapedCharacter { "\\" _ }
  
  PositionOperator { "*" }
  Identifier { (escapedCharacter | ![ ,])* (![ *,)] | escapedCharacter) }

  @precedence { space, PositionOperator, Identifier }
}

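I think the problem is in the negated sets of the Identifier token (![ ,] and ![ *,)]), which also match a backslash. These aren’t regular expressions, where the first branch that matches is taken; they are nondeterministic automata, which conceptually take all matching branches at the same time. So for the input \\* the tokenizer can read the first backslash as an ordinary character and the second one as the start of an escapedCharacter that swallows the asterisk. You’ll want to include the backslash in the negative groups (![\\ ,]) to make sure that branch doesn’t get taken for backslashes.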
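
As a rough sketch of that change (untested, and assuming the rest of the @tokens block stays exactly as posted above), only the two negated sets in Identifier gain a backslash:

```
@tokens {
  space { @whitespace }
  escapedCharacter { "\\" _ }

  PositionOperator { "*" }
  // A backslash can now only be consumed through escapedCharacter,
  // because both negated sets exclude it.
  Identifier { (escapedCharacter | ![\\ ,])* (![\\ *,)] | escapedCharacter) }

  @precedence { space, PositionOperator, Identifier }
}
```

With this version, the input \\* should tokenize as an Identifier covering the two backslashes, followed by a PositionOperator: the asterisk cannot end the Identifier (the final set excludes it) and there is no longer a way to read the second backslash as the start of a new escape.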

ah, that makes so much sense!
thx for the explanation, works like a charm!