add nested lang to lang-html

hey all-

is there any example of adding a new nested language to lang-html, keeping everything else the same, so that js can be parsed between new start/end characters?

i'd like to add a nested js parser for when ${ …js here … } is typed, but i’m struggling to figure out how to do that without manually recreating lang-html.

More complete example of what I’m trying to do. Given the following doc:

<p>some text in a p tag</p>
${ [1, 2, 3].map(() => {
    return html`<p>looped p tag in a js expression</p>`;
  })
}
<div>more html</div>

I’d like to be able to configure things so that the js expression and its contents are parsed as js, and the rest as HTML. Currently lang-html only nests js parsing between <script> tags.

There is no way to inject new syntax into a parser like that. One thing you could do is write an outer language that just divides the document into HTML and JavaScript regions, and then use mixed parsing to parse those with the proper parser.
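As a rough illustration of the "divide the document into regions" step, here is a minimal, library-free sketch. It only finds where HTML text ends and `${ ... }` expressions begin; turning these regions into an outer syntax tree and mounting the HTML/JS parsers on them with parseMixed is not shown. The names `Region`, `splitRegions` and `findClosingBrace` are hypothetical, and the brace matching is deliberately naive: it skips braces inside quoted strings, but does not handle `${}` nested inside a backtick string.

```typescript
// A contiguous span of the document and the language that should parse it.
interface Region {
  from: number; // inclusive start offset
  to: number;   // exclusive end offset
  lang: "html" | "js";
}

// Return the offset of the "}" that closes an expression whose body starts at
// `pos`, or -1 if it is never closed. Braces inside '...', "..." and `...`
// strings are skipped; `${}` nested inside a backtick string is NOT handled.
function findClosingBrace(doc: string, pos: number): number {
  let depth = 1;
  for (let i = pos; i < doc.length; i++) {
    const ch = doc[i];
    if (ch === "'" || ch === '"' || ch === "`") {
      // Jump to the matching quote, stepping over backslash escapes.
      i++;
      while (i < doc.length && doc[i] !== ch) {
        if (doc[i] === "\\") i++;
        i++;
      }
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

// Split a document into HTML text and the JS inside `${ ... }`.
// The `${` and `}` delimiters themselves belong to neither region.
function splitRegions(doc: string): Region[] {
  const regions: Region[] = [];
  let htmlStart = 0;
  let i = 0;
  while (i < doc.length) {
    if (doc.startsWith("${", i)) {
      const close = findClosingBrace(doc, i + 2);
      if (close < 0) break; // unterminated expression: treat the rest as HTML
      if (i > htmlStart) regions.push({ from: htmlStart, to: i, lang: "html" });
      regions.push({ from: i + 2, to: close, lang: "js" });
      htmlStart = i = close + 1;
    } else {
      i++;
    }
  }
  if (htmlStart < doc.length) regions.push({ from: htmlStart, to: doc.length, lang: "html" });
  return regions;
}
```

For the example document above this yields three regions: the leading `<p>` line as HTML, the `[1, 2, 3].map(...)` expression as JS, and the trailing `<div>` line as HTML.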


thanks for the response!

could i maybe take the parts provided by lang-html and reassemble them? i noticed that lang-html doesn’t export everything it configures. would you be open to a PR that exports the nested language config and such from lang-html, so that i could build my own language without having to manually copy things from lang-html?

it seems like forking lang-html and just changing the nested language config might be the easiest option?

The nested language config on HTML terms (tag content, attributes) is configurable, but is not what you need here. I’m not sure what else you want to export.

ahh i see now. configureNesting comes from @lezer/html, but it seems to only support nesting between open/close tags or inside attributes (i suppose for javascript: URLs and the like).

that seems to suggest there isn’t really a way to nest a language between specified characters rather than between tags?

maybe i could fork @lezer/html and patch configureNesting so that it can nest a language between ${ and } in the content between html tags.

i’m not really sure what the best way to accomplish what i want is, haha. i kinda want a lang like jsx, but without the react component syntax/casing.

i’m using Lit and lit-html, and the lang i wanna parse is kinda like the contents of a Lit render function, minus the boilerplate and without having to wrap the template in an html tagged template string.

@michaelwarren1106 it looks like you’re making an HTML variant that describes the inside of a lit-html-like template string. So I wonder if, instead of the HTML parser, you could use the JS parser configured with HTML nesting, and somehow start the parser in the nested HTML state? The JS parser would then maybe know to switch back to JS parsing when it encounters a ${.

I have no idea if this is really possible, and I haven’t yet gotten HTML parsing nested in JS working in CM6, but I think the approach, if viable, would match the syntax you’re trying to parse more closely.
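One cheap way to approximate "starting the JS parser in the nested HTML state" follows from the fact that `${ ... }` is already JS template-literal interpolation syntax: hand the parser the text wrapped in html`...`, then map positions back to the original document. This is a minimal sketch under stated assumptions, not something confirmed in this thread to work with CM6: `wrapAsTaggedTemplate` and `toDocPos` are hypothetical names, it assumes the HTML text outside `${ }` contains no backticks or backslashes, and it does not cover configuring HTML nesting inside the JS parser itself.

```typescript
// Text that turns a bare template body into a tagged template expression.
const PREFIX = "html`";
const SUFFIX = "`";

interface WrappedTemplate {
  source: string;                // what the JS parser would actually see
  toDocPos(pos: number): number; // map a position in `source` back into the original doc
}

// Hypothetical helper. Assumes the HTML text outside `${ }` contains no
// backticks or backslashes, since those would change the template's meaning.
function wrapAsTaggedTemplate(doc: string): WrappedTemplate {
  return {
    source: PREFIX + doc + SUFFIX,
    toDocPos(pos: number): number {
      // Positions inside the prefix or suffix are clamped to the document's edges.
      return Math.min(Math.max(pos - PREFIX.length, 0), doc.length);
    },
  };
}
```

The trade-off is that every position reported by the parser has to go through `toDocPos`, whereas a true "start in HTML state" option would avoid the extra text entirely.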


haha that’s exactly what i’m after. i was hoping you might chime in 🙂

i think maybe the right thing is to use parseMixed() with some sort of configuration that tells the parser to look for js expression-like tokens (${ and }), but i haven’t found a complete working example of parseMixed and i’m a little unsure how to actually make it work.

i’m thinking that html might be the parent, but i suppose js could be too? in my case the code i want to highlight will be mostly html, with a little bit of js for small looping expressions etc., so maybe html should be the parent for me. the problem i can’t figure out how to work around is that the html parser grammar doesn’t know about ${} tokens. maybe that’s what parseMixed would be useful for?
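parseMixed can sidestep the "html grammar doesn't know about ${}" problem, as I understand the @lezer/common API (worth checking against its docs): a nest function may return `{parser, overlay}`, where `overlay` is an array of `{from, to}` ranges that the nested parser treats as one continuous document. If some outer parse (JS, or a region-splitting outer language) owns the whole text, the HTML parser could be mounted with an overlay made of the gaps between the `${ ... }` spans, so it never sees them. The sketch below only computes those gaps; `htmlOverlay` is a hypothetical helper and `exprSpans` is assumed to include the `${` and `}` delimiters.

```typescript
// The range shape that parseMixed's `overlay` option accepts.
interface Range {
  from: number;
  to: number;
}

// Hypothetical helper: given the `${ ... }` spans (delimiters included),
// return the complementary ranges, i.e. the HTML text the HTML parser should see.
function htmlOverlay(exprSpans: readonly Range[], docLength: number): Range[] {
  const sorted = [...exprSpans].sort((a, b) => a.from - b.from);
  const result: Range[] = [];
  let pos = 0;
  for (const span of sorted) {
    if (span.from > pos) result.push({ from: pos, to: span.from });
    pos = Math.max(pos, span.to); // tolerate overlapping or adjacent spans
  }
  if (pos < docLength) result.push({ from: pos, to: docLength });
  return result;
}
```

With overlays the HTML on both sides of an expression is parsed as a single document, so a tag opened before `${ ... }` and closed after it can still be matched.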