How do I parse part of a document?

I am trying to create a parser for Glimmer, a language similar to Handlebars.

Here is what I have so far:

@top Glimmer { Expression* | Text* }

Expression { moustache<SubExpression> | Block | BlockComment }

Block { StartBlock (Expression | Text)* EndBlock }
StartBlock[closedBy="EndBlock"] { "{{#" Function SubExpression? As? "}}" }
As { kw<"as"> "|" list<name> "|" }
EndBlock[openedBy="StartBlock"] { "{{/" name "}}" }

Function { if | let | each | name }

SubExpression { list<Value> NamedArgs? }
Value {
  boolean
  | null
  | undefined
  | String
  | Number
  | Invocation
  | PropertyPath
}

Invocation { "(" SubExpression ")" }

Pair { string "=" Value }
NamedArgs { list<Pair> }

Argument { "@" name }
Property { this | Argument | name }
PropertyPath { Property ("." name)}

boolean {
  @specialize[@name=BooleanLiteral]<identifier, "true" | "false">
}

if { kw<"if"> }
let { kw<"let"> }
each { kw<"each"> }

this { kw<"this"> }
null { kw<"null"> }
undefined { kw<"undefined"> }

ShortComment { "{{!" Text* "}}" }
LongComment { "{{!--" Text* "--}}" }
BlockComment { LongComment | ShortComment }

String { string }
Number { number }

@tokens {
  number { @digit+ | (@digit+ ("." @digit*))}

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  identifier { word }
  name { word }

  string {
   "\"" !["]* "\"" |
    "\'" ![']* "\'"
  }

  Text { _+ }

  @precedence {
    Text,
    "{{#", "{{/",
    "{{", "}}", "{{!",
    "{{!--", "--}}",
    name,
    identifier
  }
}


// Helper And Special Things
list<item> { item (" " item)* }
kw<term> { @specialize[@name={term}]<identifier, term> }
moustache<content> { "{{" content "}}" }


@external propSource glimmerHighlighting from './highlight'

But unfortunately, everything is identified as Text rather than anything more specific, even though I'm using @precedence :thinking:

What's going on? And how do I make all the other identifying bits work?

Example tests I'm using:

# Long comment

{{!-- text here --}}

==>

Glimmer(
  BlockComment
)

# Simple Integer

{{42}}

==>

Glimmer(
  Expression(
    SubExpression(
      Value(Number)
    )
  )
)

Changing the grammar to the version below fixed some of my tests, but my comment tests are erroring with

"No Parse at {some number}"

and I have no idea what that number corresponds to :upside_down_face:

@top Glimmer { (Expression)+ }

Expression { moustache<SubExpression> | Block | BlockComment }

Block {
  StartBlock
  (Expression | Text)*
  EndBlock
}
As { kw<"as"> "|" list<name> "|" }

StartBlock[closedBy="EndBlock"] {
  "{{#" Function SubExpression? As? "}}"
}
EndBlock[openedBy="StartBlock"] {
  "{{/" Function "}}"
}

Function { if | let | each | else | name }

SubExpression { list<Value> NamedArgs? }
Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | PropertyPath
}

Invocation { "(" SubExpression ")" }

Pair { string "=" Value }
NamedArgs { list<Pair> }

Argument { "@" name }
Property { this | Argument | name }
PropertyPath { Property ("." name)}

boolean {
  @specialize[@name=BooleanLiteral]<identifier, "true" | "false">
}

let { kw<"let"> }
each { kw<"each"> }
if { kw<"if"> }
else { kw<"else"> }

this { kw<"this"> }
null { kw<"null"> }
undefined { kw<"undefined"> }


ShortComment { "{{!" (Text*) "}}" }
LongComment { "{{!--" (Text*) "--}}" }
BlockComment { LongComment | ShortComment }

String { string }
Number { number }

@precedence { Function, else, if, elseIf, Expression }

@tokens {
  number { @digit+ | (@digit+ ("." @digit*))}

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }

  string {
   "\"" !["]* "\"" |
    "\'" ![']* "\'"
  }

  Text { _+ }

  @precedence {
    Text,
    "{{#", "{{/", "|", " ",
    "{{", "}}", "{{!",
    "{{!--", "--}}",
    identifier,
    name
  }
}


// Helper And Special Things
list<item> { item (" " item)* }
kw<term> { @specialize[@name={term}]<identifier, term> }
moustache<content> { "{{" content "}}" }


@external propSource glimmerHighlighting from './highlight'


Here I got explicit about the start/end relationships:

@top Glimmer { ( Expression | BlockComment )* }
@skip { invisibles | Text }
@skip {} {
  String { string }

  ShortComment { StartShortComment EndStache }
  LongComment { StartLongComment EndLongComment }
  BlockComment { LongComment | ShortComment }
}

Expression { StartStache SubExpression EndStache | Block }

Block {
  StartBlock
  Expression*
  EndBlock
}
As { kw<"as"> "|" list<name> "|" }

StartBlock[closedBy=EndBlock] {
  StartOpenBlockStache Function SubExpression? As? EndStache
}
EndBlock[openedBy=StartBlock] {
  StartCloseBlockStache Function EndStache
}

Function { if | let | each | else | name }

SubExpression { list<Value> NamedArgs? }
Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | PropertyPath
}

Invocation { SubExpStart SubExpression SubExpEnd }

Pair { string "=" Value }
NamedArgs { list<Pair> }

Argument { "@" name }
Property { this | Argument | name }
PropertyPath { Property ("." name)}

boolean {
  @specialize[@name=BooleanLiteral]<identifier, "true" | "false">
}

let { kw<"let"> }
each { kw<"each"> }
if { kw<"if"> }
else { kw<"else"> }

this { kw<"this"> }
null { kw<"null"> }
undefined { kw<"undefined"> }


Number { number }

@tokens {
  invisibles { @whitespace+ }
  number { @digit+ | (@digit+ ("." @digit*))}

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }

  StartOpenBlockStache[closedBy="EndStache"] { "{{#" }
  StartCloseBlockStache[closedBy="EndStache"] { "{{/" }
  StartStache[closedBy="EndStache"] { "{{" }
  EndStache[openedBy="StartStache | StartShortComment | StartOpenBlockStache | StartCloseBlockStache"] { "}}" }

  StartShortComment[closedBy="EndStache"] { "{{!" }
  StartLongComment[closedBy="EndLongComment"] { "{{!--" }
  EndLongComment[openedBy="StartLongComment"] { "--}}" }

  SubExpStart[closedBy="SubExpEnd"] { "(" }
  SubExpEnd[openedBy="SubExpStart"] { ")" }


  Text { _+ }

  @precedence { identifier, Text }
  @precedence { name, Text }

  string {
   "\"" !["]* "\"" |
    "\'" ![']* "\'"
  }

  @precedence {
    identifier,
    name,
    ".", "@", "=",
    "|", " ",
    SubExpStart, SubExpEnd,
    StartLongComment, EndLongComment,
    StartShortComment, StartOpenBlockStache, StartCloseBlockStache,
    StartStache, EndStache,
    invisibles,
    string,
    number,
    Text
  }
}


// Helper And Special Things
list<item> { item (" " item)* }
kw<term> { @specialize[@name={term}]<identifier, term> }


@external propSource glimmerHighlighting from './highlight'

But everything is Text again.

If I then remove Text entirely (which I don't really want to do, because a bunch of parsing fails without Text), I get weird behavior:

{{'Some "inner" String'}}

==>

Glimmer(Expression(StartStache,SubExpression(Value(String)),EndStache))

StartStache and EndStache now show up in the test output.

That number is the character position in the input.
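To make such a position easier to read, here is a minimal sketch of turning a character offset into a line and column. `offsetToLineCol` is a hypothetical helper, not part of Lezer, and it assumes the reported number is a 0-based offset into the text handed to the parser (for a test file, the test's input only).

```typescript
// Hypothetical helper (not a Lezer API): map a 0-based character offset,
// such as the N in "No parse at N", to a 1-based line and column.
function offsetToLineCol(
  input: string,
  offset: number
): { line: number; column: number } {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset && i < input.length; i++) {
    if (input.charAt(i) === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart + 1 };
}

// Made-up template, used only for illustration.
const template = "{{#if ok}}\n  {{!-- note --}}\n{{/if}}";
console.log(offsetToLineCol(template, 13)); // { line: 2, column: 3 }
console.log(JSON.stringify(template.charAt(13))); // "{"
```

Looking at the character at that offset usually shows which token the parser could not continue from.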

Your Text rule matches any number of any character, so once it starts matching that, it'll match the entire document. Token precedence only takes effect when both tokens start at the same position.
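The effect described above can be reproduced with a toy tokenizer. This is only a simplified model, not Lezer's actual tokenizer (which is a generated state machine that also takes the parse context into account): at each position it tries the rules in precedence order and takes the first one that matches. All names here (`Rule`, `literal`, `tokenize`, and so on) are made up for the illustration.

```typescript
// A rule returns the end offset of its match starting at pos, or -1.
type Rule = { name: string; match: (s: string, pos: number) => number };

const literal = (name: string, text: string): Rule => ({
  name,
  match: (s, pos) => (s.startsWith(text, pos) ? pos + text.length : -1),
});

const digits: Rule = {
  name: "number",
  match: (s, pos) => {
    let end = pos;
    while (end < s.length && /[0-9]/.test(s.charAt(end))) end++;
    return end > pos ? end : -1;
  },
};

// Models Text { _+ }: one or more of ANY character, so it runs to the end.
const anyText: Rule = {
  name: "Text",
  match: (s, pos) => (pos < s.length ? s.length : -1),
};

function tokenize(input: string, rulesByPrecedence: Rule[]): string[] {
  const out: string[] = [];
  let pos = 0;
  while (pos < input.length) {
    // Precedence is only consulted among rules that start at this position.
    const hit = rulesByPrecedence
      .map((rule) => ({ rule, end: rule.match(input, pos) }))
      .find((candidate) => candidate.end > pos);
    if (!hit) throw new Error(`No token at ${pos}`);
    out.push(`${hit.rule.name}(${pos}-${hit.end})`);
    pos = hit.end;
  }
  return out;
}

const openStache = literal("{{", "{{");
const closeStache = literal("}}", "}}");
const sample = "{{42}} hi {{7}}";

// Text listed first: it wins at offset 0 and swallows the whole input.
console.log(tokenize(sample, [anyText, openStache, closeStache, digits]));
// [ 'Text(0-15)' ]

// Text listed last: the first stache is tokenized, but Text then starts at
// offset 6 and runs over the second "{{", which never gets a chance to match.
console.log(tokenize(sample, [openStache, closeStache, digits, anyText]));
// [ '{{(0-2)', 'number(2-4)', '}}(4-6)', 'Text(6-15)' ]
```

In this model, reordering the precedence list cannot help with the second case; the Text rule itself has to stop before the next `{{`.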

@marijn thanks for helping out on my posts.

This is the main thread for my issues, I suppose.
I have this currently:

@detectDelim
@top Glimmer { ( Expression | BlockComment )* }

@skip { Text }
@skip { } {
  String { string }

  ShortComment { StartShortComment Text* EndStache }
  LongComment { StartLongComment Text* EndLongComment }
  BlockComment { LongComment | ShortComment }

}

StartOpenBlockStache[closedBy="EndStache"] { "{{#" }
StartCloseBlockStache[closedBy="EndStache"] { "{{/" }
StartStache[closedBy="EndStache"] { "{{" }
EndStache[openedBy="StartStache | StartShortComment | StartOpenBlockStache | StartCloseBlockStache"] { "}}" }

StartShortComment[closedBy="EndStache"] { "{{!" }
StartLongComment[closedBy="EndLongComment"] { "{{!--" }
EndLongComment[openedBy="StartLongComment"] { "--}}" }

SubExpStart[closedBy="SubExpEnd"] { "(" }
SubExpEnd[openedBy="SubExpStart"] { ")" }

Expression {
  StartStache SubExpression EndStache
  | Block
}

Block {
  StartBlock
  Expression*
  EndBlock
}
As { kw<"as"> "|" list<name> "|" }

StartBlock[closedBy=EndBlock] {
  StartOpenBlockStache Function invisibles SubExpression? As? EndStache
}
EndBlock[openedBy=StartBlock] {
  StartCloseBlockStache Function EndStache
}

Function { if | let | each | else | name }

SubExpression {
  list<Value>
  NamedArgs?
}


Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | Property
  | PropertyPath
}

Invocation { invisibles? SubExpStart SubExpression SubExpEnd invisibles? }

Pair { string "=" Value }
NamedArgs { list<Pair> }

@precedence { Invocation @left, NamedArgs @left, Pair @left, Value @left }

Argument { "@" identifier }
Property { this | Argument | identifier }
PropertyPath { Property ("." identifier)+ }

boolean {
  @specialize[@name=BooleanLiteral]<identifier, "true" | "false">
}

let { kw<"let"> }
each { kw<"each"> }
if { kw<"if"> }
else { kw<"else"> }

this { kw<"this"> }
null { kw<"null"> }
undefined { kw<"undefined"> }


Number { number }



@tokens {
  invisibles { @whitespace+ }
  number { @digit+ | (@digit+ ("." @digit*))}

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }


  Text { (@asciiLetter | invisibles | @digit)+ }

  string {
   "\"" !["]* "\"" |
    "\'" ![']* "\'"
  }

  @precedence {
    identifier,
    name,
    ".", "=",
    "|",
    /* SubExpStart, SubExpEnd, */
    /* StartLongComment, EndLongComment, */
    /* StartShortComment, StartOpenBlockStache, StartCloseBlockStache, */
    /* StartStache, EndStache, */
    invisibles,
    string,
    number,
    Text
  }
}


// Helper And Special Things
list<item> { item (invisibles item)* }
kw<term> { @specialize[@name={term}]<identifier, term> }


@external propSource glimmerHighlighting from './highlight'


This has me making progress, but right now it has a build error.

My current issues

How do you handle free-form text?

My Text token is now incorrect: it's over-constrained, and it needs to allow anything that isn't {{.

Does all text, regardless of character, need to be a parseable node?
Can it be ignored?
(while still allowing the other tokens to be recognized, rather than everything becoming just "Text")

Nearly empty Examples fail?

# Long - empty

{{!-- --}}

==>

Glimmer(
  BlockComment(
    LongComment(
      StartLongComment,
      Text,
      EndLongComment
    )
  )
)

This test fails with "No parse at 5". Why?
(Probably because I have no idea how to handle invisibles.)

You can write a token that matches anything up to the next {{. Something like (in your @tokens block):

  PlainText { ![{] PlainText? | "{" (@eof | ![{] PlainText?) }
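To see what that token accepts, here is a toy re-implementation of its matching behavior. It is a sketch based on reading the rule, not the tokenizer Lezer generates, and `plainTextEnd` and `splitTemplate` are hypothetical names. The rule consumes any character other than `{`, and also a single `{` as long as it is followed by the end of input or by a character that is not another `{`.

```typescript
// Toy model of the PlainText rule: return the offset where a PlainText
// token starting at pos would end (equal to pos means "no match here").
function plainTextEnd(input: string, pos: number): number {
  let end = pos;
  while (end < input.length) {
    // Stop right before "{{"; a lone "{" is still plain text.
    if (input.charAt(end) === "{" && input.charAt(end + 1) === "{") break;
    end++;
  }
  return end;
}

// Split a template into plain-text runs and "{{ ... }}" chunks. How the
// inside of a stache is tokenized is outside the scope of this sketch.
function splitTemplate(input: string): string[] {
  const parts: string[] = [];
  let pos = 0;
  while (pos < input.length) {
    const end = plainTextEnd(input, pos);
    if (end > pos) {
      parts.push(`PlainText:${input.slice(pos, end)}`);
      pos = end;
    } else {
      // We are at "{{": in a real grammar the other tokens take over here.
      const closeAt = input.indexOf("}}", pos);
      const stop = closeAt === -1 ? input.length : closeAt + 2;
      parts.push(`Stache:${input.slice(pos, stop)}`);
      pos = stop;
    }
  }
  return parts;
}

console.log(splitTemplate("a { b {{name}} c"));
// [ 'PlainText:a { b ', 'Stache:{{name}}', 'PlainText: c' ]
```

Because the text token can never run across a `{{`, the opening delimiters always get a chance to start at their own position.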

Perhaps, but how do I manage invisibles? Right now I have this error:

> lezer-generator src/glimmer.grammar -o src/parser && rollup -c

shift/reduce conflict between
  Invocation -> SubExpStart SubExpression SubExpEnd · invisibles
and
  Invocation -> SubExpStart SubExpression SubExpEnd
With input:
  StartStache SubExpStart SubExpression SubExpEnd · invisibles …
Shared origin: Value -> · Invocation

@detectDelim
@top Glimmer { ( Expression | BlockComment )* }

String { string }

ShortComment { StartShortComment Text* EndStache }
LongComment { StartLongComment Text* EndLongComment }
BlockComment { LongComment | ShortComment }


Expression {
  StartStache SubExpression EndStache
  | Block
}

Block {
  StartBlock
  Expression*
  EndBlock
}
As { kw<"as"> "|" list<name> "|" }

StartBlock[closedBy=EndBlock] {
  StartOpenBlockStache Function invisibles SubExpression? As? EndStache
}
EndBlock[openedBy=StartBlock] {
  StartCloseBlockStache Function EndStache
}

Function { if | let | each | else | name }

SubExpression {
  list<Value>
  NamedArgs?
}


Value {
  boolean
  | null
  | undefined
  | Function
  | String
  | Number
  | Invocation
  | Property
  | PropertyPath
}

Invocation { invisibles? SubExpStart SubExpression SubExpEnd invisibles? }

Pair { string "=" Value }
NamedArgs { list<Pair> }

@precedence { Invocation @left, NamedArgs @left, Pair @left, Value @left }

Argument { "@" identifier }
Property { this | Argument | identifier }
PropertyPath { Property ("." identifier)+ }

boolean {
  @specialize[@name=BooleanLiteral]<identifier, "true" | "false">
}

let { kw<"let"> }
each { kw<"each"> }
if { kw<"if"> }
else { kw<"else"> }

this { kw<"this"> }
null { kw<"null"> }
undefined { kw<"undefined"> }


Number { number }



@tokens {
  invisibles { @whitespace+ }
  number { @digit+ | (@digit+ ("." @digit*))}

  identifierChar { @asciiLetter | $[_$\u{a1}-\u{10ffff}] }
  word { identifierChar (identifierChar | @digit)* }
  name { word }
  identifier { word }


  Text { ![{] Text? | "{" (@eof | ![{] Text?) }

  string {
   "\"" !["]* "\"" |
    "\'" ![']* "\'"
  }

  StartOpenBlockStache[closedBy="EndStache"] { "{{#" }
  StartCloseBlockStache[closedBy="EndStache"] { "{{/" }
  StartStache[closedBy="EndStache"] { "{{" }
  EndStache[openedBy="StartStache | StartShortComment | StartOpenBlockStache | StartCloseBlockStache"] { "}}" }

  StartShortComment[closedBy="EndStache"] { "{{!" }
  StartLongComment[closedBy="EndLongComment"] { "{{!--" }
  EndLongComment[openedBy="StartLongComment"] { "--}}" }

  SubExpStart[closedBy="SubExpEnd"] { "(" }
  SubExpEnd[openedBy="SubExpStart"] { ")" }


  @precedence {
    identifier,
    name,
    ".", "=",
    "|", "@",
    SubExpStart, SubExpEnd,
    StartLongComment, EndLongComment,
    StartShortComment, StartOpenBlockStache, StartCloseBlockStache,
    StartStache, EndStache,
    invisibles,
    string,
    number,
    Text
  }
}


// Helper And Special Things
list<item> { item (invisibles item)* }
kw<term> { @specialize[@name={term}]<identifier, term> }


@external propSource glimmerHighlighting from './highlight'

Didn't I just show you how to write a token that doesn't have this problem?

I don't know what 'invisibles' means to you. And there's no need to paste your grammar in every message. While I'm willing to answer specific questions, I don't have time to debug entire grammars.

Yeah, you're quoting an old statement.

@tokens {
  invisibles { @whitespace+ }

I don't yet know enough about this very new and not widely used grammar syntax, nor the nomenclature around it, to ask more specific questions. I'm still trying to build a foundation. I've read through the docs, but because I lack a foundational understanding of all these topics, they don't exactly read clearly to me.
