Examples of using an External Tokenizer

DougCube · February 27, 2026, 7:37am

I’m very new to CodeMirror 6, so forgive my ignorance. I do have experience with ANTLR and Flex/Bison/Yacc, though.

Is there a good set of examples somewhere showing how to create and hook up an external tokenizer? So far, what I’ve found are isolated examples that aren’t very helpful because they lack explanation.

More specifically, I want to get “LABEL” tokens of the form ^[A-Za-z][^\n()\[\]]+: which can contain whitespace and must start at the beginning of a line, but must not consume the newline character before them (if there even is one), so that a label on the first line is handled too. I also want a “POWER” token of the form -?[1-9][0-9]? where ‘-’ is otherwise a skip token. The rest of my tokens can then just use my current *.grammar file, at a lower priority of course.
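To make the two patterns concrete, here is a small sketch that runs them as plain regular expressions outside of Lezer. The names LABEL_RE, POWER_RE, labels and powerAt are made up for this illustration. It shows two details that are easy to miss: the label pattern needs at least one character between the first letter and the colon, and because ':' is itself allowed inside the character class, the greedy + runs to the last colon on the line.

```typescript
// LABEL as written above; the "m" flag makes ^ match at the start of every line,
// so a label on the very first line needs no preceding newline.
const LABEL_RE = /^[A-Za-z][^\n()\[\]]+:/gm;

// POWER as written above, anchored so it is only tested at one position.
const POWER_RE = /^-?[1-9][0-9]?/;

// Return every LABEL match in the source text.
function labels(src: string): string[] {
  return src.match(LABEL_RE) || [];
}

// Return the POWER match starting exactly at `pos`, or null if there is none.
function powerAt(src: string, pos: number): string | null {
  const m = POWER_RE.exec(src.slice(pos));
  return m ? m[0] : null;
}

const sample = "Start here:\n  indented: no\nx:\nLoop (a): b\nEnd a: b: c\n";
console.log(labels(sample));      // [ 'Start here:', 'End a: b:' ]
console.log(powerAt("x-12", 1));  // -12
console.log(powerAt("07", 0));    // null, a leading 0 is not allowed
console.log(powerAt("123", 0));   // 12, at most two digits are taken
```

Note that "x:" is not matched, since the + requires one more character before the colon, and "Loop (a):" is not matched because the parenthesis ends the label before any colon is reached.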

marijn · February 27, 2026, 7:45am

Most of the parsers in the Lezer GitHub org use an external tokenizer of some kind. The indentation tokenizer for Python also needs to ensure that it is directly after a newline, and may be a useful example.
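For readers landing here later, a rough sketch of what such a tokenizer could look like for the LABEL and POWER tokens from the question. This is not code from the Python grammar or from the original poster. To keep it runnable on its own, it is written against a hand-declared InputLike interface that mirrors the parts of Lezer's InputStream it uses (next, peek, advance, acceptToken). The term ids, the StringInput stub, and the file names in the comments are assumptions made for the example. The start-of-line check looks at the previous character with peek(-1) and treats -1 (nothing before the position) the same as a newline, so nothing before the label is consumed.

```typescript
// Minimal stand-in for the parts of Lezer's InputStream used below (an assumption
// for this sketch; a real tokenizer receives the library's own object).
interface InputLike {
  readonly next: number;                 // current code unit, -1 at end of input
  peek(offset: number): number;          // code unit relative to the position, -1 if out of range
  advance(): number;                     // move forward one code unit
  acceptToken(term: number): void;       // emit a token ending at the current position
}

// Hypothetical term ids; in a real project these are imported from the generated terms file.
const LABEL = 1, POWER = 2;
const NEWLINE = 10, DASH = 45, COLON = 58;

const isLetter = (ch: number) => (ch >= 65 && ch <= 90) || (ch >= 97 && ch <= 122);
const isDigit = (ch: number) => ch >= 48 && ch <= 57;
// Anything except end of input, newline, ( ) [ ]
const isLabelChar = (ch: number) => ch >= 0 && ch !== NEWLINE && ch !== 40 && ch !== 41 && ch !== 91 && ch !== 93;

function readLabel(input: InputLike): void {
  const prev = input.peek(-1);
  if (prev !== -1 && prev !== NEWLINE) return;   // only at start of input or right after a newline
  if (!isLetter(input.next)) return;
  // Scan ahead without consuming, remembering the last usable colon (greedy, like the regex).
  let lastColon = -1;
  for (let off = 1; isLabelChar(input.peek(off)); off++) {
    if (input.peek(off) === COLON && off >= 2) lastColon = off;
  }
  if (lastColon < 0) return;
  for (let i = 0; i <= lastColon; i++) input.advance();
  input.acceptToken(LABEL);
}

function readPower(input: InputLike): void {
  const neg = input.next === DASH;
  const first = neg ? input.peek(1) : input.next;
  if (first < 49 || first > 57) return;          // first digit must be 1-9
  let len = neg ? 2 : 1;
  if (isDigit(input.peek(len))) len++;           // optional second digit
  for (let i = 0; i < len; i++) input.advance();
  input.acceptToken(POWER);
}

// String-backed stub so the functions can be tried without a parser.
class StringInput implements InputLike {
  token = -1;
  tokenEnd = -1;
  constructor(public text: string, public pos = 0) {}
  get next(): number { return this.peek(0); }
  peek(offset: number): number {
    const i = this.pos + offset;
    return i < 0 || i >= this.text.length ? -1 : this.text.charCodeAt(i);
  }
  advance(): number { if (this.pos < this.text.length) this.pos++; return this.next; }
  acceptToken(term: number): void { this.token = term; this.tokenEnd = this.pos; }
}

// In a real setup the wiring would look roughly like this (file names are hypothetical):
//   import {ExternalTokenizer} from "@lezer/lr"
//   import {LABEL, POWER} from "./parser.terms.js"
//   export const label = new ExternalTokenizer(input => readLabel(input))
//   export const power = new ExternalTokenizer(input => readPower(input))
// and in the grammar file:
//   @external tokens label from "./tokens" { LABEL }
//   @external tokens power from "./tokens" { POWER }
```

If a tokenizer function returns without calling acceptToken, no token is produced and the other tokenizers get their turn. The sketch assumes the @external tokens declarations are placed ahead of the @tokens block so that they are tried before the built-in tokens, which is how the lower priority asked for in the question would be arranged.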

DougCube · February 28, 2026, 7:51pm

I ended up searching GitHub for examples and there were plenty.

I got it working. Turned out to be simple.