Use case
I am trying to write a simplified YAML grammar (yes, I understand that simple and YAML don’t really mix).
One element that I am finding tricky is dealing with multiline strings that essentially “capture” indented content:
# | marks start of the multiline string
multi: |
  line 1
  line 2
# this newline at a lower indent level ends the multiline string
simple: value
What I’ve tried
I’ve gotten this to parse by using the tokenizer/context from the indent example, but there are parse errors and the dedent tokens aren’t actually working.
Here’s a StackBlitz example to see it live.
The problem
For this input:
key: value_one
multi: |
  line 1
  line 2
key_two: hi
It generates a parse tree like so:
Doc (key: value_one\nmulti: |\n line 1\n line 2\nkey_two: hi)
  Property (key: value_one\n)
    Key (key:)
    Value ( value_one)
  Property (multi: |\n line 1\n line 2\n)
    Key (multi:)
    MultiLineExp ( |\n line 1\n line 2)
      MultiLineKey ( |)
      ⚠ (\n)
      Value (line 1)
      ⚠ (\n )
      Value (line 2)
      ⚠ ()
  Property (key_two: hi)
    Key (key_two:)
    Value ( h)
    ⚠ (i)
The tree contains errant Value nodes and ⚠ error nodes around the \n characters. I really want everything from MultiLineKey to the next Property to be a single MultiLineExp (which I think is what the dedent token is supposed to do?)
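To make that expectation concrete, here is a rough plain-JavaScript sketch of the indent and dedent events I would expect for the input above. It does not use the Lezer API at all: indentEvents is a hypothetical helper, the two-space indentation is an assumption, and it always emits a closing dedent at the end of input, whereas the grammar accepts (dedent | eof).

```javascript
// Hypothetical helper, not part of Lezer: walks the text line by line and
// records where an indentation-aware tokenizer would be expected to emit
// indent and dedent events.
function indentEvents(text) {
  const stack = [0]; // indentation depths that are currently open
  const events = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // blank lines do not change the depth
    const depth = line.length - line.trimStart().length;
    if (depth > stack[stack.length - 1]) {
      stack.push(depth);
      events.push("indent");
    }
    while (depth < stack[stack.length - 1]) {
      stack.pop();
      events.push("dedent");
    }
    events.push(`line(${line.trim()})`);
  }
  // Close any blocks still open at the end of the input.
  while (stack.length > 1) {
    stack.pop();
    events.push("dedent");
  }
  return events;
}

const sample = "key: value_one\nmulti: |\n  line 1\n  line 2\nkey_two: hi";
console.log(indentEvents(sample).join("\n"));
```

In this sketch the dedent lands between line(line 2) and line(key_two: hi), which is where I would expect MultiLineExp to end.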
The issue seems to be Value and the _ in the following declarations:
element {
  Property { Key (Value | MultiLineExp) lineEnd }
}
MultiLineExp { MultiLineKey indent Value* (dedent | eof) }
Value { (![|#"\n] _)+ }
It seems like _ is somehow confusing the external tokenizer and no dedent token is being emitted.
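As a sanity check on the Value rule, here is a small plain-JavaScript sketch of how that token expression might consume input. It assumes that ![|#"\n] matches one character outside the set and that _ matches any single character (including \n), and it takes the longest possible match. matchValue is a hypothetical helper written for illustration, not a Lezer function.

```javascript
// Hypothetical helper: mimics Value { (![|#"\n] _)+ } on a plain string.
// Assumption: ![...] matches one character outside the set and
// _ matches any one character, newline included.
const EXCLUDED = new Set(["|", "#", '"', "\n"]);

function matchValue(input, start = 0) {
  let pos = start;
  // Each repetition needs two characters: one outside the set, then any one.
  while (pos + 1 < input.length && !EXCLUDED.has(input[pos])) {
    pos += 2;
  }
  return input.slice(start, pos);
}

console.log(JSON.stringify(matchValue(" hi")));          // " h"
console.log(JSON.stringify(matchValue(" value_one\n"))); // " value_one"
console.log(JSON.stringify(matchValue("a\nb")));         // "a\n"
```

Under those assumptions the rule consumes characters in pairs, which would line up with the Value ( h) followed by ⚠ (i) in the tree above, and the last call suggests the token can swallow a newline when it falls in the second position of a pair.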
My Questions
- Is there a more natural way of expressing this multiline behavior that isn’t as reliant on an external tokenizer?
- If a tokenizer is the only way forward, what is the best way to debug issues with the logic? The interplay of grammar and tokenizer is difficult for me to understand.
More context
Here is a more extended version of my grammar (see the StackBlitz for the whole working thing):
// simplified, see demo for full example
// https://stackblitz.com/edit/js-az5b2u?file=index.js%3AL5
@top Doc { element* }
element {
  Property { Key (Value | MultiLineExp) lineEnd }
}
MultiLineExp { MultiLineKey indent Value* (dedent | eof) }
lineEnd { newline | eof }
@context trackIndent from "./tokens.js"
@external tokens Indentation from "./tokens.js" {
  indent
  dedent
  blankLineStart
}
@tokens {
  // ...
  Key { (@asciiLetter | "_")* ":" }
  Value { (![|#"\n] _)+ }
  // ...
}