String inside mixed language content

I’m working on a language syntax that looks very much like JavaScript’s string interpolation, but with a much more limited language inside:

outside plain text ${functionName('testvalue}3')} endingtext

To parse this, I have written a basic grammar based on the Twig parser from the mixed language example:

@top Template { (directive | Text)* }

directive {
  Insert
}

@skip {space} {
  Insert { "${" DirectiveContent "}" }
}

@tokens {
  Text { ![$] Text? | "$" (@eof | ![{] Text?) }
  space { @whitespace+ }
  @precedence { space DirectiveContent }
  "${" "}"
}

This works great except that the closing brace inside the inner string closes the directive. How should I approach making quotes inside the DirectiveContent hide the closing brace?


This isn’t something that Lezer can do, really. You could include a limited JavaScript tokenizer in your outer language, but there’s no way to defer to another parser and then pick up the parsing again after that has finished an expression (the outer parser runs to completion before the inner parser gets involved).

Thanks for the clarification. It explains why I haven’t seen any non-trivial examples of mixed language parsing without an external tokenizer. I’ll look closer at what lezer-parser/html does for scriptTokens and work from there.

Although the original idea of using an external tokenizer is probably still the right approach for most people, I found that my language is simple enough that I can include a minimal tokenizer in the outer language by expanding on the example in SrivishnuR’s question. It won’t fully support something like JavaScript (backticks don’t work right, for example), but I hope it will help others who are learning to use Lezer.

@top TemplateString { (Expression | StringContent)* }

@local tokens {
    ExpressionStart[closedBy=ExpressionEnd]{ "${" }
    @else StringContent
}

@precedence {
    Expression
    StringContent
}

@skip {} {
    Expression {
        ExpressionStart Interpolation ExpressionEnd
    }
    Interpolation {
        (InnerString | ScriptText)*
    }
}

@local tokens {
  // Match quoted strings. A backslash always consumes the next character,
  // so an escaped quote (or an escaped backslash) cannot end the string early.
  InnerString { '"' (!["\\] | "\\" _)* '"' | "'" (!['\\] | "\\" _)* "'" }
  ExpressionEnd[openedBy=ExpressionStart] { "}" }
  @else ScriptText
}
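
For anyone who wants to check the scanning rule outside of Lezer, here is a rough TypeScript sketch of what the second token group is meant to do: quoted strings are consumed whole, so a brace inside them is ignored, and the first brace outside a string ends the expression. This is only a model of the idea, not Lezer code. The names scanInterpolation and Piece are made up for this illustration, and the treatment of an unterminated quote as ordinary text is my own assumption.

```typescript
// Hypothetical piece kinds, named after the tokens in the grammar above.
type Piece = { kind: "InnerString" | "ScriptText"; text: string };

// Scan an interpolation body starting just after "${".
// Returns the pieces found and the index of the closing "}" (-1 if there is none).
function scanInterpolation(src: string, start: number): { pieces: Piece[]; end: number } {
  const pieces: Piece[] = [];
  let pos = start;
  let textStart = start;
  // Emit any pending non-string text as a ScriptText piece.
  const flushText = (upTo: number): void => {
    if (upTo > textStart) pieces.push({ kind: "ScriptText", text: src.slice(textStart, upTo) });
  };
  while (pos < src.length) {
    const ch = src[pos];
    if (ch === "}") {
      // A brace outside any string ends the expression.
      flushText(pos);
      return { pieces, end: pos };
    }
    if (ch === '"' || ch === "'") {
      // Look for the matching quote; a backslash skips the next character.
      let i = pos + 1;
      while (i < src.length && src[i] !== ch) {
        i += src[i] === "\\" ? 2 : 1;
      }
      if (i < src.length) {
        flushText(pos);
        pieces.push({ kind: "InnerString", text: src.slice(pos, i + 1) });
        pos = i + 1;
        textStart = pos;
        continue;
      }
      // Assumption: an unterminated quote is treated as ordinary text.
    }
    pos++;
  }
  flushText(src.length);
  return { pieces, end: -1 };
}

// Usage with the input from the original question.
const src = "outside plain text ${functionName('testvalue}3')} endingtext";
const result = scanInterpolation(src, src.indexOf("${") + 2);
console.log(result.pieces, src.slice(result.end));
```

An external tokenizer would have to do the same kind of scan by hand over the input stream, which is why keeping the string rule in the grammar is attractive when the inner language is this small.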