What's the best way to test and debug grammars?

PabloMayobre · July 26, 2020, 4:47pm

I want to start writing grammars for Lezer. What’s the best way to test a grammar and visualize what it can parse?

Thanks in advance

marijn · July 26, 2020, 6:01pm

There’s no way to visualize what a grammar can parse (though I suppose such a tool could be derived from the LR automaton). You can often get useful info by setting the LOG environment variable when running the parser or the generator. The generator understands LOG=lr to output a representation of its automaton. The parser will log parsing actions when you set LOG=parse.
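One way to try the LOG variable from a script is to launch the parser or generator in a child process with LOG set. The sketch below assumes Node; the helper name `runWithLog` and the script name in the usage comment are hypothetical, and only the values `lr` and `parse` come from the reply above.

```typescript
import { spawnSync } from "child_process"

// Hypothetical helper: run a Node script with the LOG environment variable set
// and return everything the child process printed (stdout followed by stderr).
function runWithLog(log: string, nodeArgs: string[]): string {
  const result = spawnSync(process.execPath, nodeArgs, {
    env: { ...process.env, LOG: log },
    encoding: "utf8",
  })
  return (result.stdout || "") + (result.stderr || "")
}

// Usage (the script name is an assumption; it stands for a script that runs your parser):
// console.log(runWithLog("parse", ["parse-sample.js"]))
```

Setting the variable for the whole child process, rather than assigning it inside the script, avoids depending on when the library reads it.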

grayen · January 19, 2021, 12:46am

Whenever I develop a grammar I find it useful to pretty print the parse tree. I did not find anything for this in the API, so I ported some code I have used in other parser generators:

import * as _inspect from "browser-util-inspect"
import { Input, stringInput, Tree } from "lezer-tree"
function inspect(arg: any): string {
  return _inspect(arg, {
    depth: null,
    colors: true,
  })
}
function printTree(tree: Tree, input: Input | string, from = 0, to = input.length): string {
  if (typeof input === "string") input = stringInput(input)
  let out = ""
  const c = tree.cursor()
  const childPrefixes: string[] = []
  for (;;) {
    const { type } = c
    const cfrom = c.from
    const cto = c.to
    let leave = false
    if (cfrom <= to && cto >= from) {
      if (!type.isAnonymous) {
        leave = true
        if (!type.isTop) {
          out += "\n" + childPrefixes.join("")
          // Peek ahead: step to the next sibling and straight back to learn whether one exists.
          if (c.nextSibling() && c.prevSibling()) {
            out += " ├─ "
            childPrefixes.push(" │  ")
          } else {
            out += " └─ "
            childPrefixes.push("    ")
          }
        }
        out += type.name
      }
      const isLeaf = !c.firstChild()
      if (!type.isAnonymous) {
        const hasRange = cfrom !== cto
        out += ` ${hasRange ? `[${inspect(cfrom)}..${inspect(cto)}]` : inspect(cfrom)}`
        if (isLeaf && hasRange) {
          out += `: ${inspect(input.read(cfrom, cto))}`
        }
      }
      if (!isLeaf || type.isTop) continue
    }
    for (;;) {
      if (leave) childPrefixes.pop()
      leave = c.type.isAnonymous
      if (c.nextSibling()) break
      if (!c.parent()) return out
      leave = true
    }
  }
}
console.log(printTree(tree, input))

Example output would be:

Contents [0..8]
 ├─ TagName [0..1]: 't'
 ├─ Tag [1..7]
 │   ├─ TagStart [1..2]: '{'
 │   ├─ TagName [2..3]: 'a'
 │   ├─ ContentsStart [3..4]: ':'
 │   ├─ TagName [4..6]: 'es'
 │   └─ TagEnd [6..7]: '}'
 └─ TagName [7..8]: 't'
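The branch-drawing part of the printer can be tried without Lezer at all. The following is a minimal sketch on a plain object tree, not the cursor-based code above; the `SketchNode` type, the `renderTree` function, and the input string `t{a:es}t` (reconstructed from the ranges in the example output) are assumptions made for illustration.

```typescript
// Hypothetical stand-in for a parse tree node: a name, a range, and optional children.
interface SketchNode {
  name: string
  from: number
  to: number
  children?: SketchNode[]
}

function renderTree(root: SketchNode, text: string): string {
  const lines: string[] = []

  // "Name [from..to]" plus the matched text for leaves with a non-empty range.
  function label(node: SketchNode): string {
    const hasRange = node.from !== node.to
    const isLeaf = !node.children || node.children.length === 0
    const range = hasRange ? `[${node.from}..${node.to}]` : `${node.from}`
    const content = isLeaf && hasRange ? `: ${JSON.stringify(text.slice(node.from, node.to))}` : ""
    return `${node.name} ${range}${content}`
  }

  function visit(node: SketchNode, prefix: string, isRoot: boolean, isLast: boolean): void {
    lines.push(isRoot ? label(node) : prefix + (isLast ? " └─ " : " ├─ ") + label(node))
    // Children of a non-last node keep a vertical bar; children of a last node get blanks.
    const childPrefix = isRoot ? "" : prefix + (isLast ? "    " : " │  ")
    const kids = node.children || []
    kids.forEach((kid, i) => visit(kid, childPrefix, false, i === kids.length - 1))
  }

  visit(root, "", true, true)
  return lines.join("\n")
}

const sampleText = "t{a:es}t"
const sampleTree: SketchNode = {
  name: "Contents", from: 0, to: 8, children: [
    { name: "TagName", from: 0, to: 1 },
    { name: "Tag", from: 1, to: 7, children: [
      { name: "TagStart", from: 1, to: 2 },
      { name: "TagName", from: 2, to: 3 },
      { name: "ContentsStart", from: 3, to: 4 },
      { name: "TagName", from: 4, to: 6 },
      { name: "TagEnd", from: 6, to: 7 },
    ] },
    { name: "TagName", from: 7, to: 8 },
  ],
}

console.log(renderTree(sampleTree, sampleText))
```

This prints the same layout as the example output, except that leaf text is quoted with JSON.stringify (double quotes) rather than through an inspect function.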

grayen · February 18, 2021, 1:22pm

I improved the pretty printer a bit:

No more external dependencies outside Lezer.
No longer requires the first node to be marked as a top node type (for example, when developing it might not yet be marked as such).
Added two options: start (the start offset, useful when printing a subtree that should still report absolute positions) and includeParents (when from and/or to are set, whether to include the parents of the nodes in the printed subrange).

import { Input, NodeType, stringInput, Tree, TreeCursor } from "lezer-tree"

enum Color {
  Red = 31,
  Green = 32,
  Yellow = 33,
}

function colorize(value: any, color: number): string {
  return "\u001b[" + color + "m" + String(value) + "\u001b[39m"
}

function focusedNode(
  cursor: TreeCursor,
): { readonly type: NodeType; readonly from: number; readonly to: number } {
  const { type, from, to } = cursor
  return { type, from, to }
}

export function printTree(
  tree: Tree,
  input: Input | string,
  options: { from?: number; to?: number; start?: number; includeParents?: boolean } = {},
): string {
  const cursor = tree.cursor()
  if (typeof input === "string") input = stringInput(input)
  const { from = 0, to = input.length, start = 0, includeParents = false } = options
  let output = ""
  const prefixes: string[] = []
  for (;;) {
    const node = focusedNode(cursor)
    let leave = false
    if (node.from <= to && node.to >= from) {
      const enter = !node.type.isAnonymous && (includeParents || (node.from >= from && node.to <= to))
      if (enter) {
        leave = true
        const isTop = output === ""
        if (!isTop || node.from > 0) {
          output += (!isTop ? "\n" : "") + prefixes.join("")
          // Peek ahead: step to the next sibling and straight back to learn whether one exists.
          const hasNextSibling = cursor.nextSibling() && cursor.prevSibling()
          if (hasNextSibling) {
            output += " ├─ "
            prefixes.push(" │  ")
          } else {
            output += " └─ "
            prefixes.push("    ")
          }
        }
        output += node.type.isError ? colorize(node.type.name, Color.Red) : node.type.name
      }
      const isLeaf = !cursor.firstChild()
      if (enter) {
        const hasRange = node.from !== node.to
        output +=
          " " +
          (hasRange
            ? "[" +
              colorize(start + node.from, Color.Yellow) +
              ".." +
              colorize(start + node.to, Color.Yellow) +
              "]"
            : colorize(start + node.from, Color.Yellow))
        if (hasRange && isLeaf) {
          output += ": " + colorize(JSON.stringify(input.read(node.from, node.to)), Color.Green)
        }
      }
      if (!isLeaf) continue
    }
    for (;;) {
      if (leave) prefixes.pop()
      leave = cursor.type.isAnonymous
      if (cursor.nextSibling()) break
      if (!cursor.parent()) return output
      leave = true
    }
  }
}
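Because this version colors its output with ANSI escape sequences, comparing the printed tree against an expected string in a test is easier once the colors are removed. A minimal sketch, assuming only the single-number sequences that colorize emits; `stripColors` is a hypothetical helper and not part of the posted code.

```typescript
// Same escape-sequence scheme as the colorize helper in the post above.
function colorize(value: unknown, color: number): string {
  return "\u001b[" + color + "m" + String(value) + "\u001b[39m"
}

// Hypothetical helper: remove color sequences such as "\u001b[33m" and "\u001b[39m".
function stripColors(text: string): string {
  return text.replace(/\u001b\[\d+m/g, "")
}

const colored = "Tag [" + colorize(1, 33) + ".." + colorize(7, 33) + "]"
console.log(stripColors(colored)) // prints: Tag [1..7]
```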

grayen · May 19, 2021, 12:17pm

I was asked how to debug a SyntaxNode with my code. The answer was to get the underlying tree via node.cursor.tree, but I have now updated the printer to accept a TreeCursor or SyntaxNode directly. I also had a length bug that caused the range to go out of bounds, which the code posted above does not account for (to should default to Infinity rather than input.length to avoid it). Rather than keep replying to this forum post with updates, I made it into a public Gist:


https://gist.github.com/msteen/e4828fbf25d6efef73576fc43ac479d2

print-lezer-tree.ts

import { Input, NodeType, stringInput, SyntaxNode, Tree, TreeCursor } from "lezer-tree"

enum Color {
  Red = 31,
  Green = 32,
  Yellow = 33,
}

function colorize(value: any, color: number): string {
  return "\u001b[" + color + "m" + String(value) + "\u001b[39m"

This file has been truncated; see the Gist linked above for the full source.
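The suggested default for to can be illustrated in isolation. This is a minimal sketch of the overlap test used by the printer with to defaulting to Infinity, as the post proposes; it is not the code from the Gist, and the `Range`, `PrintOptions`, and `overlaps` names are assumptions.

```typescript
interface Range { from: number; to: number }
interface PrintOptions { from?: number; to?: number }

// Mirrors the overlap test in the printer: a node is visited when its range touches [from, to].
// With to defaulting to Infinity, the check no longer depends on the input's length.
function overlaps(node: Range, options: PrintOptions = {}): boolean {
  const { from = 0, to = Infinity } = options
  return node.from <= to && node.to >= from
}

console.log(overlaps({ from: 10, to: 20 }))                       // true
console.log(overlaps({ from: 10, to: 20 }, { to: 5 }))            // false
console.log(overlaps({ from: 10, to: 20 }, { from: 15, to: 30 })) // true
```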