How to extend lang-json with a tiny bit of new syntax?

Hi there,

I need to implement support for a custom language that is essentially JSON, but that also allows values enclosed in double braces. Here are a few examples of expressions I need to support:

{
    a: "prefix-{{Foo.bar}}",
    b: {{Baz.someNumber}},
    c: {{Baz.someBoolean}}
}

I’d love to base it off the @codemirror/lang-json package, ideally without forking. Is there any way to extend lang-json to allow this syntax, or is it perhaps feasible to use mixed parsing and create a lightweight parser for the double-braced expressions?

This is my first time using CodeMirror. My understanding of the mixed parsing mechanism is that it runs the outer parser first, which I suspect means the full text would need to be valid JSON before the inner parser could parse the double-brace expressions, so I’m not sure the mixed parsing route will work for me.

Open to any and all feedback. Thanks!

I think this would require forking the JSON grammar, because there’s no real outer structure for nested parsing to use. But that grammar is less than 40 lines, so it shouldn’t be too much of a maintenance burden.

Thanks for the tip, marijn! I took your advice and started by copying Lezer’s JSON grammar directly so I can iterate on it. However, I seem to be stuck at step 1. I’m trying to get the grammar to build with buildParser and am running into a runtime error:

"Unexpected character "\udbff""

Here’s a reproduction of the error: demonstration of error from @lezer/generator's buildParser - StackBlitz

I’ve looked at the docs for buildParser and also looked at how it is used in some lezer tests but I am at a loss for why it won’t build.

I would be grateful for any assistance!

You’re not escaping your backslashes in your template string.
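
To illustrate the point, here is a minimal sketch in plain TypeScript that does not call any Lezer API. The character-set fragment is a shortened, illustrative version of the char rule from the JSON grammar, and the variable names are made up for this example. In an untagged template literal, JavaScript itself interprets \u{10ffff} and stores it as a surrogate pair starting with \udbff, which is consistent with the character named in the error above. Doubling the backslashes or using String.raw keeps the escape as literal text for the grammar to read.

```typescript
// Untagged template literal: JavaScript interprets the \u{...} escapes itself,
// so the string contains the actual code points, not the escape text.
const interpreted = `$[\u{20}-\u{10ffff}]`;

// Option 1: double every backslash so the escape text survives.
const doubled = `$[\\u{20}-\\u{10ffff}]`;

// Option 2: String.raw leaves backslashes untouched.
const raw = String.raw`$[\u{20}-\u{10ffff}]`;

// U+10FFFF is stored as the surrogate pair \udbff \udfff.
console.log(interpreted.charCodeAt(4).toString(16)); // dbff
console.log(raw === doubled); // true
```

Either form could then be passed as the grammar text to buildParser. String.raw is usually easier to keep in sync with a .grammar file, since the text can be pasted unchanged.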

Many thanks. I have made progress!

With my modified JSON grammar, I can parse text like {"prop": {{Foo.bar.baz}}}

I am now encountering a challenge that seems like it should be solvable, though I’m not sure how much I will need to modify in the original JSON grammar to do so:

Is it possible to reuse the same matching rules to parse the same pattern when it occurs within a string, e.g. {"prop": "blabla-{{Foo.bar.baz}}-blabla"}?

I have tried something like the following inside @tokens:

ContentType { "Foo" | "Bar" }
BarExp { "{{" ContentType '.' segments<PathSegment> "}}" }
segments<item> { item ('.' item)* }
string { '"' (BarExp | char)* '"' }
char { $[\u{20}\u{21}\u{23}-\u{5b}\u{5d}-\u{10ffff}] | "\\" esc }

My hope was that this string rule would match "blabla-{{Foo.bar.baz}}-blabla", but I don’t see any BarExp-related nodes in my result. I am unsure if this is related to precedence (is char matching before BarExp?).

Since "{{" is longer than char matching a single {, the former should take precedence. String interpolation in similar grammars does work like this (though usually with a token that matches whole sequences of non-special characters rather than single ones, while making sure it doesn’t match the special character).
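
The idea in the parenthesis can be sketched outside of Lezer as a small hand-written scanner in TypeScript. This is only an illustration of the tokenization strategy, not Lezer code: splitInterpolations and the Segment type are hypothetical names, and escape sequences inside the string are ignored for brevity. The text branch consumes a whole run of characters and stops before the next "{{", so it never swallows the start of an interpolation.

```typescript
// Hypothetical types and helper for illustration; not part of Lezer or CodeMirror.
type Segment = { kind: "text" | "interpolation"; value: string };

function splitInterpolations(content: string): Segment[] {
  const segments: Segment[] = [];
  let pos = 0;
  while (pos < content.length) {
    if (content.startsWith("{{", pos)) {
      // The two-character opener is checked first, mirroring "longer match wins".
      const end = content.indexOf("}}", pos + 2);
      if (end === -1) {
        // Unterminated opener: treat the remainder as plain text.
        segments.push({ kind: "text", value: content.slice(pos) });
        break;
      }
      segments.push({ kind: "interpolation", value: content.slice(pos + 2, end) });
      pos = end + 2;
    } else {
      // Consume a run of ordinary characters, stopping before the next "{{".
      const next = content.indexOf("{{", pos);
      const stop = next === -1 ? content.length : next;
      segments.push({ kind: "text", value: content.slice(pos, stop) });
      pos = stop;
    }
  }
  return segments;
}

// Splits the string content from the question into three segments:
// text "blabla-", interpolation "Foo.bar.baz", text "-blabla".
console.log(splitInterpolations("blabla-{{Foo.bar.baz}}-blabla"));
```

In a grammar, the same split would presumably be expressed as separate tokens for the text run and the "{{" opener, with the string itself being a non-token rule so that the interpolation can show up as its own node.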

This is considered an internal dialect of JSON. It’s fairly common, and 100% valid JSON.

I may need something like this that also works w/ YAML.