I’m working on a grammar for Clojure and I would like to detect and highlight variable names. I’ve created a minimal grammar to illustrate the problem:
@top[name=Program] { expression* }
@skip { whitespace }
expression { Symbol | List }
List { "(" (DefLike VarName expression? | expression*) ")" }
VarName { Symbol }
@tokens {
  "("
  ")"
  whitespace { std.whitespace }
  Symbol { std.asciiLetter+ }
}
DefLike { @extend<Symbol, "def" | "defn"> }
@detectDelim
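A DefLike token should only be recognized at the beginning of a list; anywhere else, def and defn are ordinary symbols. That is why I’m using @extend, which keeps both meanings available.
Running this grammar against the following tests produces one failure: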
# Add
(hello world)
==>
Program(List(Symbol,Symbol))
# Def
(def foo bar)
==> Program(List(DefLike,VarName(Symbol),Symbol))
# Def Defn
(def defn foo)
==> Program(List(DefLike,VarName(Symbol),Symbol))
# Def Defn 2
(def defn foo bar)
==> Program(List(DefLike,VarName(Symbol),Symbol,Symbol))
Only the last test fails:
expression
✓ Add
✓ Def
✓ Def Defn
1) Def Defn 2
3 passing (7ms)
1 failing
1) expression
Def Defn 2:
Error: Expected DefLike in List, got Symbol at 1
Program(List("(",Symbol,Symbol,Symbol,Symbol,")"))
I found it surprising to see the last test fail while the one before it passes. It seems that my grammar is not deterministic in what tree it produces. Note that the only difference between the last two tests is the addition of a symbol at the end of the list.
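While narrowing this down, one thing that seems worth checking is how many items each alternative of the List rule can match. The sketch below is plain Python, not Lezer: it only models flat lists of symbols, and `flat_list_items`, `tree` and `readings` are made-up helper names. Under that model, the DefLike alternative (DefLike VarName expression?) accepts at most three items, so a four-item list can only be read through expression*. That would make the tree from the last test consistent with the rule as written, but I have not confirmed that this is what the parser does internally.

```python
import re

DEF_LIKE = {"def", "defn"}  # words that @extend lets double as DefLike


def flat_list_items(src):
    """Split a flat list such as "(def foo bar)" into its symbols.

    Nested lists are deliberately not handled in this sketch.
    """
    match = re.fullmatch(r"\s*\(\s*([A-Za-z\s]*)\)\s*", src)
    if match is None:
        raise ValueError("expected a flat list of ASCII-letter symbols")
    return match.group(1).split()


def tree(children):
    """Format a tree string in the style of the grammar tests."""
    if not children:
        return "Program(List)"
    return "Program(List(" + ",".join(children) + "))"


def readings(src):
    """Return every tree the modelled List rule can produce without an error."""
    items = flat_list_items(src)
    trees = []
    # Alternative 1: DefLike VarName expression?
    # Needs def/defn first, then a name, then at most one more expression.
    if len(items) in (2, 3) and items[0] in DEF_LIKE:
        rest = ["Symbol"] * (len(items) - 2)
        trees.append(tree(["DefLike", "VarName(Symbol)"] + rest))
    # Alternative 2: expression*
    # Any number of items, each word read as a plain Symbol.
    trees.append(tree(["Symbol"] * len(items)))
    return trees


if __name__ == "__main__":
    for src in ["(hello world)", "(def foo bar)", "(def defn foo)", "(def defn foo bar)"]:
        print(src, "->", readings(src))
```

In this model, `readings("(def defn foo)")` includes the DefLike tree, while `readings("(def defn foo bar)")` yields only the all-Symbol tree, which matches the output reported for the failing test.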
Is there a better way to specify what I want without this ambiguity? I also tried an external tokenizer that inspects the stack so that a DefLike token is only produced at the beginning of a list, but I couldn’t get that to work.
I’m also wondering whether lezer-generator should output a warning in this case. I only discovered the problem while using Lezer interactively, because my grammar tests were passing.
A running version of this minimal grammar can be found at https://github.com/nextjournal/lezer-clojure/tree/minimal-def