Matching an odd number of characters at the beginning and the end of a string

I need to highlight “raw” strings, which can start with any odd number of double quotes and can span multiple lines.

raw"test"  // 1 is valid
raw"""test""" // 3 is valid
raw""""test"""" // 4 is invalid

raw"""  // 3 multi-line is valid
test
"""

I borrowed what I could find from the JS parser, but I'm really struggling to figure out the matching. Here is what I have:

// skip
@skip {} {
  MultilineRawStringLiteral { rawTextBlockBoundry textBlockContent* rawTextBlockBoundry }
}

// statement
RawStringLiteral { LineStarter kw<"raw"> MultilineRawStringLiteral }

// tokens
rawTextBlockBoundry { '"'('""')*!["] }  // I suspect this is the issue
textBlockContent { "\n" | !["\n] textBlockContent? | '"' textBlockQuote | space }
textBlockQuote { !["\n] textBlockContent | "\n" | '"' textBlockQuote2 }
textBlockQuote2 { !["\n] textBlockContent | "\n" }

So this gets me something like this:

[Screenshot: CleanShot 2022-08-03 at 17.02.58]

but struggles with the single line:

[Screenshot: CleanShot 2022-08-03 at 17.03.50]

I have a regex that matches better, but it uses a negative lookbehind, like so: (?<!")("(?:"")*)(?!"). I am really struggling to get that into Lezer.
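
For reference, here is a minimal TypeScript sketch of how that regex behaves outside Lezer. The helper name oddQuoteRunLengths is made up for illustration, and the sketch assumes a JavaScript engine with lookbehind support (ES2018 or later).

```typescript
// Collect the length of every quote run that the lookaround regex accepts.
// The regex only accepts runs of odd length that are not adjacent to
// another double quote on either side.
function oddQuoteRunLengths(line: string): number[] {
  // Built from a string so the pattern is compiled at run time.
  const re = new RegExp('(?<!")("(?:"")*)(?!")', "g");
  const lengths: number[] = [];
  let m: RegExpExecArray | null;
  // Every match consumes at least one quote, so this loop always advances.
  while ((m = re.exec(line)) !== null) {
    lengths.push(m[1].length);
  }
  return lengths;
}

console.log(oddQuoteRunLengths('raw"test"'));
console.log(oddQuoteRunLengths('raw"""test"""'));
console.log(oddQuoteRunLengths('raw""""test""""'));
```

Note that this only finds candidate delimiters. It does not check that the closing run has the same length as the opening run.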

Any help would be greatly appreciated!

This is something a strictly regular language cannot express, so you’re probably better off using an external tokenizer for the string tokens.
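
As a rough illustration of what such a tokenizer has to do, here is a minimal sketch of the scanning logic in plain TypeScript, working on a string rather than on Lezer's input stream. The function names are hypothetical. It also assumes rules the original post does not spell out: the closing delimiter is a run of exactly as many quotes as the opening one, quote runs of any other length count as content, and newlines are allowed for every delimiter length.

```typescript
const QUOTE = 34; // character code of '"'

/** Count consecutive double quotes starting at `pos`. */
function countQuotes(text: string, pos: number): number {
  let n = 0;
  while (pos + n < text.length && text.charCodeAt(pos + n) === QUOTE) n++;
  return n;
}

/**
 * If a raw-string body (opening quotes through closing quotes) starts at
 * `pos`, return the index just past it. Otherwise return -1.
 */
function scanRawString(text: string, pos: number): number {
  const open = countQuotes(text, pos);
  // Zero quotes or an even number of quotes is not a valid opener.
  if (open % 2 === 0) return -1;
  let i = pos + open;
  while (i < text.length) {
    const run = countQuotes(text, i);
    if (run === 0) {
      i++; // ordinary content character, including newlines
    } else if (run === open) {
      return i + run; // closing run of the same length as the opener
    } else {
      i += run; // assumption: a run of a different length is content
    }
  }
  return -1; // unterminated string
}

console.log(scanRawString('"test"', 0));
console.log(scanRawString('"""\ntest\n"""', 0));
console.log(scanRawString('""""test""""', 0));
```

In an actual Lezer setup the same loop would live inside an ExternalTokenizer from @lezer/lr, reading characters via input.next and input.advance() and calling input.acceptToken(...) where this sketch returns a position, with the token declared in the grammar through an @external tokens declaration.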
