Custom grammar with case-insensitive mode

Hi! I’m working on a custom grammar for an in-house language we’re using for querying data. This language doesn’t care about casing, so writing query {…}, Query {…}, qUeRy {…}, and so on doesn’t make a difference.

I have a working grammar at the moment, but unfortunately it is case-sensitive (only recognising Query {…}). I’ve seen a few answers saying this can be achieved with an “external tokenizer”, but the examples linked to are, in my opinion, quite dense and opaque, and it’s hard to grasp what needs to happen.

Is it possible to explain with a simple example, please? I’d be happy to write a blog post and contribute to the documentation once I understand this and can help people out. Thanks!

I found the XML parser’s ExternalTokenizer pretty straightforward to understand (after converting it to TypeScript and making minor changes, ref). For me with Lezer (and maybe all parser generators, IDK I’ve only used Lezer) I find the best way to learn is to just do the thing. Either start out with a minimal example, or take a working piece of code and deconstruct it (with a lot of console logging).

I’ll be writing a new ExternalTokenizer for my own query language (ref) in the coming days. I can let you know how that goes if you’re interested.


Thanks AlexErrant!

To leave an update here, I was able to do this with the following two changes:

  • I have a tokens.ts file with pretty much the following code:
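    // './parser.terms' is generated by the grammar compilation process
    import { aa, bb, cc } from './parser.terms';
    
    import type { Stack } from '@lezer/lr';
    
    // Keys must be lowercase, because the lookup lowercases its input.
    // A Map avoids false matches on inherited object keys such as "constructor".
    const keywordMap = new Map<string, number>([
      ['aa', aa],
      ['bb', bb],
      ['cc', cc],
    ]);
    
    // Returns the keyword's term ID, or -1 to leave the identifier unspecialized
    export function keywords(value: string, _stack: Stack) {
      return keywordMap.get(value.toLowerCase()) ?? -1;
    }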
    
  • And the following at the top of my grammar file:
    @external specialize { identifier } keywords from './tokens' {
      aa[@name=aa]
      bb[@name=bb]
      cc[@name=cc]
    }
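
For anyone who wants to try the lookup in isolation, here is a minimal sketch that runs without Lezer. The numeric term IDs are made-up stand-ins (the real ones come from the generated './parser.terms' file), and the unused stack argument is dropped for brevity. It is only meant to show how the lowercasing makes the match case-insensitive, not how the parser calls the function internally.

```typescript
// Stand-in term IDs. In a real project these are imported from the
// generated './parser.terms' file; the numbers here are hypothetical.
const aa = 1;
const bb = 2;
const cc = 3;

// Lowercase keyword text -> term ID. A Map has no inherited keys,
// so an identifier like "constructor" cannot match by accident.
const keywordMap = new Map<string, number>([
  ['aa', aa],
  ['bb', bb],
  ['cc', cc],
]);

// Same idea as the specializer in tokens.ts: lowercase the identifier
// text, then return the keyword's term ID, or -1 for "not a keyword".
function keywords(value: string): number {
  return keywordMap.get(value.toLowerCase()) ?? -1;
}

console.log(keywords('aa'));          // 1
console.log(keywords('Aa'));          // 1
console.log(keywords('BB'));          // 2
console.log(keywords('dd'));          // -1
console.log(keywords('constructor')); // -1
```

Returning -1 leaves the token as a plain identifier, so only the listed words (in any casing) get turned into keyword tokens.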