Mixed language parsing - empty content issue

haftav · November 7, 2023, 10:17pm

Hi everyone! I’m working on a modified JSON grammar that supports parsing JavaScript between handlebars brackets. By following the mixed-language parsing example and referencing a number of other discussions, I’ve gotten something to work pretty well. The gist is that I’m using parseMixed to parse the content of HandlebarsContent nodes with the JavaScript parser and mount that tree in place of the original node. Here’s a quick reference to the grammar, external tokenizer, and language definition, as well as a CodeSandbox link with a working example:

syntax.grammar

@top JsonText { value }

value { True | False | Null | Number | String | Object | Array | directive }

directive {
    Handlebars
}

Handlebars { HandlebarsOpen HandlebarsContent HandlebarsClose }

HandlebarsOpen {
    "{{"
}

HandlebarsContent { handlebarsText* }

@external tokens handlebarsTokens from "./tokens" {
    handlebarsText,
    HandlebarsClose
}

String { string }
Object { "{" list<Property>? "}" }
Array  { "[" list<value>? "]" }

Property { PropertyName ":" value }
PropertyName { string | Handlebars }

@tokens {
    True  { "true" }
    False { "false" }
    Null  { "null" }

    Number { '-'? int frac? exp?  }
    int  { '0' | $[1-9] @digit* }
    frac { '.' @digit+ }
    exp  { $[eE] $[+\-]? @digit+ }

    string { '"' char* '"' }
    char { $[\u{20}\u{21}\u{23}-\u{5b}\u{5d}-\u{10ffff}] | "\\" esc }
    esc  { $["\\\/bfnrt] | "u" hex hex hex hex }
    hex  { $[0-9a-fA-F] }

    whitespace { @whitespace+ }
    @precedence {whitespace}

    "{" "}" "[" "]"
    "{{" "}}"
}

@skip { whitespace }

list<item> { item ("," item)* }

@external propSource jsonHighlighting from "./highlight"

@detectDelim

tokenizer

import { ExternalTokenizer } from "@lezer/lr";
import { handlebarsText, HandlebarsClose } from "./syntax.grammar.terms";

const closeTemplate = 125; // character code of "}"

function expressionTokenizer() {
  return new ExternalTokenizer((input) => {
    let state = 0; // becomes 1 after a single "}" has been seen
    let contentLength = 0;
    while (true) {
      if (input.next < 0) {
        // end of input: emit any content scanned so far
        if (contentLength) input.acceptToken(handlebarsText);
        break;
      }
      // first close template
      if (state == 0 && input.next == closeTemplate) {
        state++;
      }
      // second close template
      else if (state == 1 && input.next == closeTemplate) {
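        // if there is content, end the token just before the first "}"
        // (the offset is relative to the current stream position)
        if (contentLength) {
          input.acceptToken(handlebarsText, -1);
        } else {
          // no content: consume both "}" characters as the close token
          input.acceptToken(HandlebarsClose, 1);
        }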
        break;
      } else {
        // reset
        contentLength++;
        state = 0;
      }
      input.advance();
    }
  });
}

export const handlebarsTokens = expressionTokenizer();

language

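import { parseMixed } from "@lezer/common";
import {
  LRLanguage,
  indentNodeProp,
  foldNodeProp,
  continuedIndent,
  foldInside,
} from "@codemirror/language";
import { javascriptLanguage } from "@codemirror/lang-javascript";
// assumed path: the parser generated from syntax.grammar
import { parser as jsonParser } from "./syntax.grammar";

export const JsonHandlebarsLanguage = LRLanguage.define({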
  name: "json",
  parser: jsonParser.configure({
    wrap: parseMixed((node) => {
      return node.name === "HandlebarsContent"
        ? {
            parser: javascriptLanguage.parser.configure({
              top: "SingleExpression",
            }),
          }
        : null;
    }),
    props: [
      indentNodeProp.add({
        Object: continuedIndent({ except: /^\s*\}/ }),
        Array: continuedIndent({ except: /^\s*\]/ }),
      }),
      foldNodeProp.add({
        "Object Array": foldInside,
      }),
    ],
  }),
  languageData: {
    closeBrackets: { brackets: ["[", "{", '"'] },
    indentOnInput: /^\s*[\}\]]$/,
  },
});

codesandbox

As an example, here’s the resulting tree for a simple expression like ‘{{4}}’:

JsonText(
	Handlebars(
		HandlebarsOpen,
		SingleExpression(Number),
		HandlebarsClose
	)
)

I’ve verified that completion and syntax highlighting work as expected in most cases. However, there’s an issue I could use some help with. Ideally, starting autocompletion anywhere within the handlebars brackets should show JavaScript completions. But when I add an empty Handlebars expression ‘{{}}’ and explicitly start autocompletion with the cursor between the brackets, the active language at that point in the document still appears to be the parent language (not the wrapped language), so no JavaScript completions are shown. If I print out the tree structure, I can see the following:

JsonText(
	Handlebars(
		HandlebarsOpen,
		HandlebarsContent,
		HandlebarsClose
	)
)
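
For anyone reproducing this, here is a minimal sketch of a helper that prints a tree in the indented style used in this thread. The name formatTree is hypothetical, and the helper only assumes that each node exposes name, firstChild and nextSibling, as Lezer's SyntaxNode does. Whether mounted trees show up depends on how the node was obtained.

```javascript
// Hypothetical helper: formats a node and its descendants as an indented outline.
// `node` is assumed to look like a Lezer SyntaxNode (name, firstChild, nextSibling).
function formatTree(node, depth = 0) {
  const pad = "\t".repeat(depth);
  const children = [];
  // Walk the sibling chain starting from the first child.
  for (let child = node.firstChild; child; child = child.nextSibling) {
    children.push(formatTree(child, depth + 1));
  }
  // Leaf nodes print as just their name.
  if (children.length === 0) return pad + node.name;
  return `${pad}${node.name}(\n${children.join(",\n")}\n${pad})`;
}
```

With CodeMirror this could be called as formatTree(syntaxTree(state).topNode), where syntaxTree comes from @codemirror/language.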

Based on my understanding of parseMixed, I would expect HandlebarsContent to be replaced with SingleExpression, even if the SingleExpression is empty. I would also expect the active language at this location to be JavaScript, so that I get JavaScript completions. Are these assumptions correct? If not, is there anything I can do to achieve the desired behavior?

Thanks for your help in advance, and for this amazing library!

marijn · November 8, 2023, 4:35pm

There’s an invariant in the Lezer parser interface that, in the set of input ranges passed to a parser, there cannot be any empty ranges. That had the effect of ignoring zero-length nodes that matched parseMixed logic. But I guess, for non-overlay mounts, we could support this. See this patch.

haftav · November 10, 2023, 9:17pm

Thanks so much for your quick response, @marijn. I ran some tests with these updates and was able to verify that running the parser on ‘{{}}’ indeed resulted in the following:

JsonText(
	Handlebars(
		HandlebarsOpen,
		SingleExpression,
		HandlebarsClose
	)
)

However, I’m still unable to explicitly start autocompletion for the nested language when the cursor is between empty Handlebars brackets ‘{{}}’. It appears to still be using the language data for the parent.

Upon further investigation, I noticed that in the extension definition in the Language constructor, it fails to identify the top node as SingleExpression for empty handlebars expressions. For instance, in the expression ‘{{}}’ it identifies the top node as JsonText, but for the expression ‘{{c}}’ the top node is identified as SingleExpression.

The issue appears to be related to a checkSide call within the nextChild method of TreeNode:

  if (!checkSide(side, pos, start, start + next.length))
      continue;

When the content of the handlebars expression is empty, checkSide returns a falsy value (start and start + next.length are the same). Could this be updated to first check if there is a mounted tree? The following update resolved my issue, but I’m uncertain about potential ramifications from that change.

let mounted;

if (!(mode & IterMode.IgnoreMounts)) {
    mounted = MountedTree.get(next);
}

if (!checkSide(side, pos, start, start + next.length) && !mounted)
    continue;
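
To illustrate why a zero-length node is skipped when the query position coincides with it, here is a simplified model of a side check. It is an approximation written from memory of how such a check behaves, not the library source. The function name checkSideModel and the string mode names are made up for this sketch.

```javascript
// Simplified model of a positional side check (NOT the actual @lezer/common code).
// `side` says which side of `pos` the node [from, to] is allowed to lie on.
function checkSideModel(side, pos, from, to) {
  switch (side) {
    case "before": return from < pos;
    case "atOrBefore": return to >= pos && from < pos;
    case "around": return from < pos && to > pos;
    case "atOrAfter": return from <= pos && to > pos;
    case "after": return to > pos;
    default: return true; // no positional constraint
  }
}

const modes = ["before", "atOrBefore", "around", "atOrAfter", "after"];
// Zero-length node at position 2, queried at position 2 (the '{{}}' case).
const emptyNode = modes.map((mode) => checkSideModel(mode, 2, 2, 2));
// One-character node spanning 2..3, queried at position 2 (the '{{c}}' case).
const oneChar = modes.map((mode) => checkSideModel(mode, 2, 2, 3));
```

In this model every positional mode rejects the empty node, while the one-character node is accepted by the "at or after" modes. That matches the behavior described above, where ‘{{c}}’ resolves into SingleExpression and ‘{{}}’ does not.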

marijn · November 11, 2023, 8:00am

I’m not quite sure how to read this. What does top node mean here? How is your autocompletion checking for this?

haftav · November 11, 2023, 2:48pm

I’m referring to the variable top defined here: https://github.com/codemirror/language/blob/main/src/language.ts#L91
My understanding is that this helps determine which language data fields to return when languageDataAt is called.
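
One way to check which language data is picked up at a position is to query it directly. The sketch below assumes a state object with CodeMirror's EditorState.languageDataAt(name, pos) method. The helper name describeAutocompleteData is hypothetical.

```javascript
// Hypothetical debugging helper: summarizes the "autocomplete" language data at `pos`.
// Each entry is either a completion source function or an array of completions.
function describeAutocompleteData(state, pos) {
  return state.languageDataAt("autocomplete", pos).map((entry) =>
    typeof entry === "function"
      ? "<completion source>"
      : entry.map((c) => (typeof c === "string" ? c : c.label))
  );
}
```

If the parent language is the active one, the result should include the "parent completion" label from the configuration shown in this thread.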

The autocompletion logic in my example isn’t doing anything special; it just adds the extension for JsonHandlebarsLanguage as well as the JavaScript support.

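import { EditorState } from "@codemirror/state";
import { keymap } from "@codemirror/view";
import { autocompletion, closeBrackets, closeBracketsKeymap } from "@codemirror/autocomplete";
import { LanguageSupport } from "@codemirror/language";
import { javascript } from "@codemirror/lang-javascript";

const state = EditorState.create({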
  doc: "",
  extensions: [
    autocompletion(),
    keymap.of(closeBracketsKeymap),
    closeBrackets(),
    new LanguageSupport(JsonHandlebarsLanguage, [
      JsonHandlebarsLanguage.data.of({
        autocomplete: [{ label: "parent completion" }],
      }),
    ]),
    javascript().support,
  ],
});

Sandbox link

marijn · November 12, 2023, 8:54am

I see what you mean now. SyntaxNode.enter won’t enter zero-length nodes, so Language.isActiveAt isn’t going to see this node. That’s unfortunate, but hard to avoid with position-based addressing — the brackets are also at this position in the document, so it’s hard to define a way to query this that actually knows you prefer the SingleExpression node. You may need to add some kludge that specifically handles this case (for example by resolving the node before the cursor and checking whether it is a HandlebarsOpen token).
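
A minimal sketch of the kind of kludge suggested above, under stated assumptions: the tree passed in has Lezer's resolveInner(pos, side) method, and the node names match the grammar in this thread. The names isInEmptyHandlebars, emptyHandlebarsSource, jsSource and getTree are hypothetical. The check also looks at the node after the cursor so the wrapper only fires for the empty ‘{{}}’ case, which is a design choice of this sketch rather than part of the suggestion.

```javascript
// Returns true when `pos` sits directly between HandlebarsOpen and HandlebarsClose.
// `tree` only needs a resolveInner(pos, side) method, like a Lezer Tree.
function isInEmptyHandlebars(tree, pos) {
  const before = tree.resolveInner(pos, -1); // innermost node ending at pos
  const after = tree.resolveInner(pos, 1); // innermost node starting at pos
  return (
    before.name === "HandlebarsOpen" && before.to === pos &&
    after.name === "HandlebarsClose" && after.from === pos
  );
}

// Wraps a JavaScript completion source so it also runs in the empty case.
// `jsSource` is whatever source should provide JavaScript completions;
// `getTree` stands in for syntaxTree from @codemirror/language.
function emptyHandlebarsSource(jsSource, getTree) {
  return (context) =>
    isInEmptyHandlebars(getTree(context.state), context.pos)
      ? jsSource(context)
      : null;
}
```

The resulting function could then be registered as an additional autocomplete entry on the parent language's data, next to the existing "parent completion" entry. In non-empty expressions it returns null, so the nested language's own completion handling is left alone.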