Creating a mixed parser with SQL inside Python strings

Hi,
I’m trying to follow the CodeMirror Mixed-Language Parsing example to create language support that mixes Python and SQL. My end goal is to use this in a Jupyter notebook in the following way:

some_query_str = "SELECT * FROM table where a > 1"

In this example, the line is Python and should be highlighted as Python, but the content inside the quotes is an SQL query that should be highlighted as SQL.

Could you provide any guidance?

I started from the HTML + JavaScript example, which seems very similar to what I need, but I ran into an issue:

When I switch from JavaScript inside HTML to JavaScript inside Python, the JavaScript code is no longer highlighted and is treated as a plain Python string. (I changed the check to node.name == "String".) For example:

some_query_str = "let a = 5"

Even though there is a “String” node, the text stays colored red as a string instead of being highlighted as the nested language.

What have you tried so far? Can you reproduce in a sandbox?

Here is an example that mixes YAML with JavaScript that should be similar.

The SQL package does expose a number of different dialects, so it’s potentially an issue with how you are configuring the parser and the language support that is required for highlighting.

@NickTomlin
Check out this example:

I switched to JS inside Python and changed the node being matched to “String”.

The string is shown in red, without JS highlighting.

Ah, thank you for that Playground example, very interesting!

I did some more playing around, and it looks like the mixed parser does parse the string as JS, but autocomplete and highlighting are disabled.

If I use a multiline string, autocomplete works for JS, but the syntax highlighting still does not.

I wonder if this is because it’s not enough to simply replace the String node. Perhaps it’s worth looking into using overlay nesting to replace only the content within the String node itself :thinking:

Yeah, the only thing that worked was using overlays, like in the example here.

Super hard to understand / debug this with the current docs :slight_smile:

Glad you figured it out! Yes, it’s a bit tricky, I’ve also found myself cross-referencing a lot of different docs, examples, and threads here to solve for my own use case.

If you don’t mind, would you be able to share an example playground, snippet, or repo for future travelers (one of which may be myself :smile:)?

What you were doing was parsing the entire Python string, including the quotes, as JavaScript. That will give you string highlighting.

What you were doing was parsing the entire Python string, including the quotes, as JavaScript

Could you provide some more details on the correct approach here? The way this is written, it sounds like exactly what the OP (and I, to some extent) were trying to do.

Here’s a somewhat isolated playground that does the following:

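import {parseMixed} from "@lezer/common";
import {pythonLanguage} from "@codemirror/lang-python";
import {javascriptLanguage} from "@codemirror/lang-javascript";

// Assumed definition (not shown in the post): nest only on Python String nodes.
const activateOnNodes = new Set(["String"]);

// Without an overlay, the nested parse replaces the whole node, quotes included.
const wrap = parseMixed((node, input) => {
  return activateOnNodes.has(node.name) ? {
    parser: javascriptLanguage.parser,
  } : null;
});

const mixedParserLanguage = pythonLanguage.configure({
  wrap,
});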

The result is a parse tree that looks like this:

Script
  AssignStatement
    VariableName
    AssignOp
    Script
      ExpressionStatement
        String

Which does seem backwards (I’d expect this to be String > Script > etc). Is that where the ranges in overlay come into play?

If you don’t want the quotes to be part of the inner language’s document, use an overlay.

Non-overlay nested parses replace the entire node with the parse tree. The Script(ExpressionStatement(String)) is the JavaScript parse tree. Nothing is backwards.

:man_facepalming: Ah! I think I was getting confused by the fact that the Python and JavaScript grammars both use the node names Script (their @top rule) and String; this makes sense now :smile: .

I think I’m still struggling with the appropriate way to use overlay in a situation like this, with a node like String.

const wrap = parseMixed((node, input) => {
  if (!activateOnNodes.has(node.name)) { return null }
  if (input) {
    console.log('name', node.type.isTop, node.name, 'input:', input.string.slice(node.from + 1, node.to -1))
  }  
  return {
      parser: javascriptLanguage.parser,
      // naive way of trying to overlay just the JS not the wrapping `"`
      // "const x = 1"
      // ==>
      // const x = 1
      overlay: {
        from: node.from + 1,
        to: node.to - 1
      }
  } 
});

The console.log outputs the correct “slice” of text:

name String input: const x = 1

sandbox

Is anything going wrong with the thing you’re doing in that code?

:sweat_smile: yes :sweat_smile:

I adapted this to match the original linked post and take into account the closing " and it works.

I’ve noticed that overlay has to be passed as an array of ranges (readonly {from: number, to: number}[]), not as a single range object.

E.g.

const overlay =  [{
  from,
  to: node.to - (closingQuote ? 1 : 0)
}]

works, but the same information passed as a single object does not:

const overlay = {
  from,
  to: node.to - (closingQuote ? 1 : 0)
}
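As a rough sketch of the range arithmetic involved (not the code from the linked posts), the hypothetical helper below, `stringContentOverlay`, takes the text of a Python string literal and its start offset and returns the overlay in the array form that works above. It assumes the String node's text includes any prefix letters and the quotes, and it does not try to detect an escaped quote at the end of an unterminated literal.

```javascript
// Hypothetical helper, not part of CodeMirror or Lezer.
// Given the source text of a Python string literal and its start offset in the
// document, return overlay ranges covering only the text between the quotes.
function stringContentOverlay(literal, from) {
  // Optional prefix letters (r, b, u, f and two-letter combinations),
  // followed by a triple or single opening quote.
  const match = /^([rbuf]{0,2})("""|'''|"|')/i.exec(literal);
  if (!match) return [];
  const quote = match[2];
  const start = match[0].length;
  // The closing quote may be missing while the user is still typing.
  const closed =
    literal.length >= start + quote.length && literal.endsWith(quote);
  const end = literal.length - (closed ? quote.length : 0);
  // An empty range would leave the inner parser nothing to parse.
  if (end <= start) return [];
  // Always an array of ranges, never a bare {from, to} object.
  return [{ from: from + start, to: from + end }];
}
```

Inside the `parseMixed` callback, the literal text could be obtained with `input.read(node.from, node.to)` and the result passed as `overlay`, returning `null` instead when the array comes back empty.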

Here’s a “full” example that copies that code wholesale and is much more resilient to different forms of String.

In general, overlays are tricky because I haven’t found a way to get information about what is going wrong. Is there diagnostic information in the tree that I could use to surface problems like the range issue above?
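Until something better turns up, one workaround is to sanity-check the overlay value before handing it to the parser. The sketch below is a hypothetical `checkOverlay` helper, not a CodeMirror or Lezer API. It only inspects the array form, and it assumes the ranges are expected to be non-empty, in order, non-overlapping, and inside the node.

```javascript
// Hypothetical diagnostic helper, not part of CodeMirror or Lezer.
// Returns a list of human-readable problems with an overlay value for a node
// spanning nodeFrom..nodeTo; an empty list means nothing obvious is wrong.
function checkOverlay(overlay, nodeFrom, nodeTo) {
  // The function form of overlay is not inspected by this sketch.
  if (typeof overlay === "function") return [];
  if (!Array.isArray(overlay)) {
    return ["overlay should be an array of {from, to} ranges, got " + typeof overlay];
  }
  const problems = [];
  let previousEnd = nodeFrom;
  overlay.forEach((range, i) => {
    if (!range || typeof range.from !== "number" || typeof range.to !== "number") {
      problems.push(`range ${i} is missing a numeric from/to`);
      return;
    }
    if (range.from >= range.to) problems.push(`range ${i} is empty or inverted`);
    if (range.from < previousEnd) problems.push(`range ${i} starts before ${previousEnd}`);
    if (range.to > nodeTo) problems.push(`range ${i} ends after the node (${nodeTo})`);
    previousEnd = Math.max(previousEnd, range.to);
  });
  return problems;
}
```

Logging the result of `checkOverlay(overlay, node.from, node.to)` from inside the `parseMixed` callback would have flagged the single-object overlay shown earlier.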

@marijn Thanks, I got it working with the overlays method.

I’m using this inside JupyterLab, and it seems that pretty much everything is highlighted with the same “keyword” color. Can I customize it so builtins are highlighted differently? Currently, judging from the DOM, builtins / types are not getting any class names. I tried using styleTags, but with no success.

For reference, my mixed language looks like this:

const myMixedPythonLRLanguage = LRLanguage.define({
  parser: mixedParser.configure({
    props: [
      styleTags({
        Type: t.number,
        Identifier: t.number
      })
    ]
  })
})

No, ‘builtins’ aren’t parsed differently from regular identifiers, so you can’t highlight them differently.

Basically, I’m trying to override the styleTags defined in lang-sql. When I edit the source code I can change the highlighting for Type, for example; I’m just trying to override it from the outside, without changing the source code.

language.configure({props: [styleTags(...)]}) should return a new language with the given tags added.

This is exactly what I’m doing, but it’s not working. Could it be because I’m trying to override existing styleTags, or because the tags I’m changing are inside the inner language?

From what I can see, I need to configure the inner language of lang-sql, but since it’s read-only and the constructor is private, I’m not sure I can access it. Is there any workaround?