This question concerns the same grammar as described in this post, namely the criticmarkup syntax. The current implementation of the syntax can be found in the details below.
A quick description of how the syntax works:
- There are five types of markup: Addition, Deletion, Substitution, Comment and Highlight
- This markup is defined by the following characters: `{++ ... ++}`, `{-- ... --}`, `{~~ ... ~> ... ~~}`, `{>> ... <<}`, `{== ... ==}` respectively
- Markup inside other markup (nested markup) should not get parsed, i.e. `{++ {-- text --} ++}` should get parsed as `Addition({-- text --})`
- The markup is used together with regular Markdown syntax (however, for my ViewPlugin, I only care about parsing the CriticMarkup syntax; I do not need to know what its contents are either)
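To make the intended behaviour concrete, here is a minimal hand-written reference scanner. It is not Lezer and not a fix for the grammar; it is only a sketch of the output I am after, assuming that the first matching closing bracket ends a node and that everything in between is opaque. The names `MarkupNode`, `BRACKETS` and `scanCriticMarkup` are made up for this illustration.

```typescript
// Reference sketch only (hypothetical helper, not part of Lezer or CodeMirror):
// finds top-level CriticMarkup ranges and treats everything between the
// brackets, including other CriticMarkup brackets, as opaque content.

type MarkupName = "Addition" | "Deletion" | "Substitution" | "Comment" | "Highlight";

interface MarkupNode {
  name: MarkupName;
  from: number;    // 0-based offset of the opening bracket
  to: number;      // 0-based offset just past the closing bracket
  content: string; // raw text between the brackets
}

// Opening bracket, closing bracket, node name (names mirror the grammar rules).
const BRACKETS: ReadonlyArray<[string, string, MarkupName]> = [
  ["{++", "++}", "Addition"],
  ["{--", "--}", "Deletion"],
  ["{~~", "~~}", "Substitution"],
  ["{>>", "<<}", "Comment"],
  ["{==", "==}", "Highlight"],
];

function scanCriticMarkup(text: string): MarkupNode[] {
  const nodes: MarkupNode[] = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    for (let i = 0; i < BRACKETS.length; i++) {
      const [open, close, name] = BRACKETS[i];
      // Does an opening bracket of this kind start at the current position?
      if (text.slice(pos, pos + open.length) !== open) continue;
      // Only the matching closing bracket of the same kind ends the node.
      const end = text.indexOf(close, pos + open.length);
      if (end < 0) break; // unterminated markup: treat it as plain text
      nodes.push({
        name,
        from: pos,
        to: end + close.length,
        content: text.slice(pos + open.length, end),
      });
      pos = end + close.length; // resume after the node, so nothing nests
      matched = true;
      break;
    }
    if (!matched) pos++;
  }
  return nodes;
}
```

With this sketch, `scanCriticMarkup("{++ {--text--} ++}")` yields a single Addition node whose content is ` {--text--} `, and `scanCriticMarkup("{++ text --}")` yields no nodes at all, because `++}` never appears. The `~>` separator of a Substitution is left inside `content` here; splitting it is out of scope for the sketch.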
Grammar
@detectDelim
@top CriticMarkup { (content|expression)* }
expression {
Addition |
Deletion |
Substitution |
Comment |
Highlight
}
@skip { } {
Addition { lAdd content? rAdd }
Deletion { lDel content? rDel }
Substitution { lSub content? MSub content? rSub }
Comment { lCom content? rCom }
Highlight { lHig content? rHig }
}
@local tokens {
lAdd { "{++" }
rAdd { "++}" }
lDel { "{--" }
rDel { "--}" }
lSub { "{~~" }
MSub { "~>" }
rSub { "~~}" }
lCom { "{>>" }
rCom { "<<}" }
lHig { "{==" }
rHig { "==}" }
@else content
}
@precedence {
Addition,
Deletion,
Substitution,
Comment,
Highlight
}
Working examples
{++This is an addition++}
{++It works properly
across multiple lines++}
**Regular markdown** can also appear between the text
{-- A deletion --}
{~~ A substitution ~> to this ~~}
{>>A comment node<<}
{==Finally, a highlight==}
Issues
1. Nested markup
Consider the example below:
{++ {--text--} ++}
For my implementation, I’d expect this to be parsed as an `Addition` node with contents `{--text--}`, and the `Deletion` rule not to be included in the output. In general, I do not want to allow nested nodes.
However, the parse output tree gives the following:
CriticMarkup(Addition(⚠),Deletion,⚠)
name | from | to | content |
---|---|---|---|
Addition | 1 | 5 | {++ |
⚠ | 5 | 5 | |
Deletion | 5 | 15 | {--text--} |
⚠ | 16 | 19 | ++} |
This makes sense, since I’m currently specifying that there will be only `content` (i.e. non-tokens) between the markup brackets. To solve that, I figured it should be as simple as also allowing tokens to exist between the brackets; however, no matter which approach I took, I kept getting either S/R or R/R conflicts, or an error mentioning: Tokens from a local token group used together with other token
Is this a precedence/ambiguity issue, and thus a matter of correctly formulating the rule, or is it just not possible to implement this specific syntax the way I envisioned using local token groups?
2. Incorrect matching
Input:
{++ text --}
Output:
CriticMarkup(Addition(⚠))
name | from | to | content |
---|---|---|---|
Addition | 1 | 13 | {++ text --} |
⚠ | 10 | 14 | --} |
Sadly, this one I understand even less. Why does the parser match the `Addition` node, despite the fact that it has not encountered the `rAdd` token, as described in the rule `Addition { lAdd content? rAdd }`?
I apologise if my explanations were unclear. I have only recently started dabbling with parsers again, and I’m still re-learning how grammars should be constructed; it’s highly likely that I’m making one (or many) rookie mistakes here.
Many thanks in advance!