I wanted to start writing grammars for Lezer, and I wanted to know what's the best way to test and visualize what the grammar can parse?
Thanks in advance
There's no way to visualize what a grammar can parse (though I suppose such a tool could be derived from the LR automaton). You can often get useful info by setting the LOG environment variable when running the parser or the generator. The generator understands LOG=lr to output a representation of its automaton. The parser will log parsing actions when you set LOG=parse.
Whenever I develop a grammar I find it useful to pretty print the parse tree. I did not find anything for this in the API, so I ported some code I have used in other parser generators:
import * as _inspect from "browser-util-inspect"
import { Input, stringInput, Tree } from "lezer-tree"

function inspect(arg: any): string {
  return _inspect(arg, {
    depth: null,
    colors: true,
  })
}

function printTree(tree: Tree, input: Input | string, from = 0, to = input.length): string {
  if (typeof input === "string") input = stringInput(input)
  let out = ""
  const c = tree.cursor()
  const childPrefixes: string[] = []
  for (;;) {
    const { type } = c
    const cfrom = c.from
    const cto = c.to
    let leave = false
    if (cfrom <= to && cto >= from) {
      if (!type.isAnonymous) {
        leave = true
        if (!type.isTop) {
          out += "\n" + childPrefixes.join("")
          if (c.nextSibling() && c.prevSibling()) {
            out += " ├─ "
            childPrefixes.push(" │  ")
          } else {
            out += " └─ "
            childPrefixes.push("    ")
          }
        }
        out += type.name
      }
      const isLeaf = !c.firstChild()
      if (!type.isAnonymous) {
        const hasRange = cfrom !== cto
        out += ` ${hasRange ? `[${inspect(cfrom)}..${inspect(cto)}]` : inspect(cfrom)}`
        if (isLeaf && hasRange) {
          out += `: ${inspect(input.read(cfrom, cto))}`
        }
      }
      if (!isLeaf || type.isTop) continue
    }
    for (;;) {
      if (leave) childPrefixes.pop()
      leave = c.type.isAnonymous
      if (c.nextSibling()) break
      if (!c.parent()) return out
      leave = true
    }
  }
}

console.log(printTree(tree, input))
Example output would be:
Contents [0..8]
 ├─ TagName [0..1]: 't'
 ├─ Tag [1..7]
 │   ├─ TagStart [1..2]: '{'
 │   ├─ TagName [2..3]: 'a'
 │   ├─ ContentsStart [3..4]: ':'
 │   ├─ TagName [4..6]: 'es'
 │   └─ TagEnd [6..7]: '}'
 └─ TagName [7..8]: 't'
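The branch drawing in that output comes down to keeping one prefix per ancestor. Here is a minimal sketch of the same idea on a plain object tree, with no Lezer dependency. `SimpleNode` and `renderTree` are hypothetical names made up for this illustration, and the recursion stands in for the cursor walk used in the code above.

```typescript
// Hypothetical stand-in for a parse tree node; not part of lezer-tree.
interface SimpleNode {
  name: string
  children?: SimpleNode[]
}

// Render a tree with box-drawing branches, extending the prefix per ancestor.
function renderTree(root: SimpleNode): string {
  const lines: string[] = [root.name]
  const walk = (node: SimpleNode, prefix: string): void => {
    const children = node.children ?? []
    children.forEach((child, index) => {
      const isLast = index === children.length - 1
      lines.push(prefix + (isLast ? " └─ " : " ├─ ") + child.name)
      // Descendants of a non-last child keep a vertical bar in this column.
      walk(child, prefix + (isLast ? "    " : " │  "))
    })
  }
  walk(root, "")
  return lines.join("\n")
}

const example: SimpleNode = {
  name: "Contents",
  children: [
    { name: "TagName" },
    { name: "Tag", children: [{ name: "TagStart" }, { name: "TagEnd" }] },
    { name: "TagName" },
  ],
}
console.log(renderTree(example))
```

The cursor-based version needs the `nextSibling()`/`prevSibling()` round trip to find out whether a node is the last child, because a cursor does not expose the sibling count up front.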
I improved the pretty printer a bit, adding two options: start (the start offset, useful when printing subtrees whilst still wanting it to report absolute positions) and includeParents (if you use from and/or to, whether to include the parents in the subrange that is to be printed).
import { Input, NodeType, stringInput, Tree, TreeCursor } from "lezer-tree"

enum Color {
  Red = 31,
  Green = 32,
  Yellow = 33,
}

function colorize(value: any, color: number): string {
  return "\u001b[" + color + "m" + String(value) + "\u001b[39m"
}

function focusedNode(
  cursor: TreeCursor,
): { readonly type: NodeType; readonly from: number; readonly to: number } {
  const { type, from, to } = cursor
  return { type, from, to }
}

export function printTree(
  tree: Tree,
  input: Input | string,
  options: { from?: number; to?: number; start?: number; includeParents?: boolean } = {},
): string {
  const cursor = tree.cursor()
  if (typeof input === "string") input = stringInput(input)
  const { from = 0, to = input.length, start = 0, includeParents = false } = options
  let output = ""
  const prefixes: string[] = []
  for (;;) {
    const node = focusedNode(cursor)
    let leave = false
    if (node.from <= to && node.to >= from) {
      const enter = !node.type.isAnonymous && (includeParents || (node.from >= from && node.to <= to))
      if (enter) {
        leave = true
        const isTop = output === ""
        if (!isTop || node.from > 0) {
          output += (!isTop ? "\n" : "") + prefixes.join("")
          const hasNextSibling = cursor.nextSibling() && cursor.prevSibling()
          if (hasNextSibling) {
            output += " ├─ "
            prefixes.push(" │  ")
          } else {
            output += " └─ "
            prefixes.push("    ")
          }
        }
        output += node.type.isError ? colorize(node.type.name, Color.Red) : node.type.name
      }
      const isLeaf = !cursor.firstChild()
      if (enter) {
        const hasRange = node.from !== node.to
        output +=
          " " +
          (hasRange
            ? "[" +
              colorize(start + node.from, Color.Yellow) +
              ".." +
              colorize(start + node.to, Color.Yellow) +
              "]"
            : colorize(start + node.from, Color.Yellow))
        if (hasRange && isLeaf) {
          output += ": " + colorize(JSON.stringify(input.read(node.from, node.to)), Color.Green)
        }
      }
      if (!isLeaf) continue
    }
    for (;;) {
      if (leave) prefixes.pop()
      leave = cursor.type.isAnonymous
      if (cursor.nextSibling()) break
      if (!cursor.parent()) return output
      leave = true
    }
  }
}
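To make the from/to/includeParents/start interplay easier to follow, here is a rough sketch of just the range checks, pulled out of the loop above. `Span`, `RangeOptions`, `overlaps`, `shouldPrint` and `reportedRange` are hypothetical helpers invented for this illustration. They ignore the anonymous-node check and the colouring that the real printer also applies.

```typescript
// Hypothetical minimal node span; the real printer reads these from a TreeCursor.
interface Span {
  from: number
  to: number
}

interface RangeOptions {
  from: number
  to: number
  includeParents: boolean
}

// A node is visited at all only if it touches the requested range.
function overlaps(node: Span, range: Span): boolean {
  return node.from <= range.to && node.to >= range.from
}

// A visited node is printed if it lies fully inside the range,
// or unconditionally when includeParents is set.
function shouldPrint(node: Span, options: RangeOptions): boolean {
  if (!overlaps(node, options)) return false
  return options.includeParents || (node.from >= options.from && node.to <= options.to)
}

// Reported positions are shifted by `start` so a subtree can show absolute offsets.
function reportedRange(node: Span, start: number): string {
  return node.from === node.to
    ? String(start + node.from)
    : "[" + (start + node.from) + ".." + (start + node.to) + "]"
}

const parent: Span = { from: 0, to: 8 }
const child: Span = { from: 2, to: 3 }
console.log(shouldPrint(parent, { from: 1, to: 7, includeParents: false })) // false
console.log(shouldPrint(parent, { from: 1, to: 7, includeParents: true })) // true
console.log(shouldPrint(child, { from: 1, to: 7, includeParents: false })) // true
console.log(reportedRange(child, 10)) // [12..13]
```

So with includeParents left off, a parent that sticks out of the requested range is skipped but its children are still walked, which is why the printed subrange can start without its enclosing nodes.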
I got asked how to debug a SyntaxNode with my code. The answer was to get the underlying tree via node.cursor.tree, though I have now updated it to also accept TreeCursor and SyntaxNode directly. I also had a length bug causing the range to be out of bounds, which the code above does not account for (to should default to Infinity rather than input.length to circumvent that). Rather than keep replying to this forum post with updates, I just made it into a public Gist: