`NodeType.isError` for a large and valid JSON document

transform · September 14, 2021, 7:09pm

Following the help I received in "Check JSON syntax using language pack" (post #5), I use the following to validate a JSON document.

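import { syntaxTree } from "@codemirror/language";
import { NodeType, SyntaxNode, Tree, TreeCursor } from "@lezer/common";

// Walk every node of the current syntax tree and look for error nodes.
const tree: Tree = syntaxTree(this.cm.state);
let valid = true;
if (tree) {
  const cursor: TreeCursor = tree.cursor();
  while (cursor.next()) {
    const sn: SyntaxNode = cursor.node;
    const nt: NodeType = sn.type;
    if (nt.isError) {
      // Invalid JSON document.
      valid = false;
      break;
    }
  }
}
// The document is valid only if `valid` is still true here.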

When I copy and paste a valid but large JSON document, around 350 lines (23,384 characters) long, the code above reports an invalid JSON document: NodeType.isError is true.

However, just placing the cursor at the end of the JSON document and inserting a space character causes the code above to re-run, and validation passes.

This problem does not occur with smaller JSON documents.

I have the SyntaxNode and NodeType but because they contain circular references I can’t just JSON.stringify them into this post.

Is this a known issue?
What can I do to debug this further?

marijn · September 15, 2021, 6:11am

Oh, that’s a good point. The parser will delay parsing when it takes too long, and insert error nodes at the end of any constructs that were unfinished at the point where it stopped. You may need to add a rule that when the tree doesn’t span the entire document, you ignore error nodes near its end.

transform · September 15, 2021, 4:08pm

Thanks @marijn

I can see how that would prevent falsely reporting an error, but how should a large JSON document be checked for correctness if error nodes must be ignored?
How can I check whether a tree spans the entire document?

marijn · September 15, 2021, 10:14pm

Compare syntaxTree(state).length to state.doc.length. You only need to ignore the errors near the end of the document.
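A minimal sketch of that comparison. The `HasLength` interface and the `treeCoversDocument` name are hypothetical, introduced only so the snippet runs without CodeMirror installed; in an editor the two arguments would be `syntaxTree(state)` and `state.doc`, both of which expose a `length` property.

```typescript
// Hypothetical structural type: anything with a numeric length.
// In real code: the Tree from syntaxTree(state) and the Text in state.doc.
interface HasLength {
  length: number;
}

// Returns true when the syntax tree spans the whole document,
// i.e. the parser was not stopped early.
function treeCoversDocument(tree: HasLength, doc: HasLength): boolean {
  return tree.length >= doc.length;
}
```

With CodeMirror this would be called as `treeCoversDocument(syntaxTree(state), state.doc)`. When it returns false, error nodes found in the tree cannot all be taken at face value.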

transform · September 21, 2021, 7:26pm

I’m finally looking at implementing this and realise now that I don’t fully understand your recommendation.

Assume the tree has length 16,334 and the JSON document has length 117,968. The first error node has .context with

start: 15,354
index: 1

So this error node occurs before the end of the tree, which has a length of 16,334. This suggests I need to ignore errors near the end of the tree, not the end of the document?

marijn · September 21, 2021, 7:38pm

Yes, that should have said ‘near the end of the tree’ (if the tree is smaller than the document).
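One way this rule could be sketched, under stated assumptions. The `ErrorSpan` type, the `judge` function, the three-way `Verdict`, and above all the `margin` parameter are hypothetical: nothing in this thread defines how close to the end of the tree an error must be to count as artificial, so the margin is a caller-chosen guess and a real error inside it would be missed.

```typescript
// Hypothetical plain-data view of an error node: its start and end offsets.
interface ErrorSpan {
  from: number;
  to: number;
}

type Verdict = "valid" | "invalid" | "undetermined";

// Decide validity from the error nodes found in a possibly partial tree.
// `margin` is an assumption, not a value defined by the library.
function judge(
  errors: ErrorSpan[],
  treeLength: number,
  docLength: number,
  margin: number
): Verdict {
  // Full tree: every error node is a real syntax error.
  if (treeLength >= docLength) {
    return errors.length > 0 ? "invalid" : "valid";
  }
  // Partial tree: only trust errors that end well before the tree's end.
  const trusted = errors.filter((e) => e.to < treeLength - margin);
  // Without a trusted error, the unparsed remainder is still unknown.
  return trusted.length > 0 ? "invalid" : "undetermined";
}
```

Returning "undetermined" rather than "valid" for a partial tree is a deliberate choice in this sketch: the part of the document beyond the tree has not been parsed, so it cannot be declared correct yet.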

transform · September 21, 2021, 8:37pm

OK. So if the tree is shorter than the document, ignore errors near the end of the tree.

But what defines “near” the end of the tree? Without a definition I may ignore real errors (although there’s always a possibility of that since there could be a real error near the end of the tree).

Question: given that the tree is shorter than the document, why is it necessary to inject error nodes at all? Surely the optimization comes from the fact that not all of the document is represented in the tree?

marijn · September 21, 2021, 9:21pm

To indicate that they are not full nodes, but have been cut short artificially.

transform · September 21, 2021, 11:20pm

Thanks. And on the other question; what defines “near” the end of the tree? Without a definition I could ignore real errors.