I’ve written a parser for a bespoke markup language. It allows arbitrary text containing tokens that mark where values should be inserted. The tokens are delimited by double curly brackets. To avoid confusion I’ll refer to these tokens as ‘ReportTokens’ from now on. For example, we might have something like:
“Congratulations. Your score is {{score:XYZ||Sten}} which means you are {{content:XYZ||Interpretive Statement||{{bandedScore:3}}}}.”
The only parts of the text above that I’m interested in are the two ReportTokens: {{score:XYZ||Sten}} and {{content:XYZ||Interpretive Statement||{{bandedScore:3}}}}
ReportTokens can get quite complex because nesting is allowed. In the grammar, I would like to be able to ignore everything except the ReportTokens. This is what I have at the moment:
@top String { item* }
item { text | ReportToken }
ReportToken { "{{" Domain ":" attribute demoValue "}}" }
attribute { AttributeSegment ("||" AttributeSegment)* }
AttributeSegment { SimpleAttributeSegment | ReportToken }
demoValue { "" | ("**" ExplicitDemoValue) | ("***" PredefinedDemoValue) }
@tokens {
Domain { (@asciiLetter)+ }
SimpleAttributeSegment { (@asciiLetter | @digit | $[_\- :."+=])+ }
ExplicitDemoValue { (@asciiLetter | @digit | $[_\- :."+=])+ }
PredefinedDemoValue { (@asciiLetter)+ }
text { ![{}]+ }
"{{" "}}" ":" "||" "**" "***"
}
@detectDelim
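There are two problems with the grammar above:
- It doesn’t allow single curly brackets within ‘text’, because the ‘text’ token ( ![{}]+ ) excludes every curly bracket, not just doubled ones.
- It doesn’t allow single curly brackets inside ‘SimpleAttributeSegment’, which I would also like to permit.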
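
To make the intended result concrete, here is a minimal TypeScript sketch of the extraction behaviour described above. It is a plain hand-written scanner, not a replacement for the grammar, and the function name `extractReportTokens` is hypothetical. It assumes that only doubled brackets open or close a ReportToken and that single curly brackets are ordinary characters. It does not resolve the ambiguous case where a single bracket sits directly next to a doubled one (for example `}}}`).

```typescript
// Hypothetical helper: returns every top-level ReportToken found in `input`.
// Nested ReportTokens are kept inside their parent's text.
function extractReportTokens(input: string): string[] {
  const tokens: string[] = [];
  let depth = 0;  // number of "{{" currently open
  let start = -1; // index where the current top-level token began
  let i = 0;

  while (i < input.length) {
    const pair = input.slice(i, i + 2);
    if (pair === "{{") {
      if (depth === 0) start = i;
      depth++;
      i += 2;
    } else if (pair === "}}" && depth > 0) {
      depth--;
      i += 2;
      // Closing the outermost "{{" completes one top-level token.
      if (depth === 0) tokens.push(input.slice(start, i));
    } else {
      // Anything else, including a single "{" or "}", is skipped as plain text.
      i++;
    }
  }
  return tokens;
}

const sample =
  "Congratulations. Your score is {{score:XYZ||Sten}} which means you are " +
  "{{content:XYZ||Interpretive Statement||{{bandedScore:3}}}}.";

console.log(extractReportTokens(sample));
// [ "{{score:XYZ||Sten}}",
//   "{{content:XYZ||Interpretive Statement||{{bandedScore:3}}}}" ]
```

A depth counter is enough here because the only thing being recovered is where each outermost token starts and ends. Splitting a token into its domain, attribute segments and demo value is still the job of the grammar.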