Custom modes: very basic questions from a beginner.


Good morning/evening,

As my topic’s title suggests (it could serve, as-is, as a repository for any beginner’s questions), I’m about to ask some very simple and basic questions. I have a small project in mind, and while I’m here to learn how to fish, today I’m asking for one very small ready-made fish. I did try to find the answers in CodeMirror’s documentation, but currently the choice is between the manual’s textbook-like discussion (too abstract) and the actual mode scripts (too specific). Also, the assumptions and prerequisites for understanding the scripts’ structure are not stated clearly, although it’s evident that some rudiments of language theory would be useful.

Could anybody give me a hint (or just a suggested reading!) for the following goals:

  • (main) Defining a language (mode) that recognizes expressions like ((string1 OR string2) AND string3) OR string4 or (string1 AND string2) OR string3, i.e. logical AND/OR conditions involving strings, with the required parentheses and no quotes delimiting the strings. The operators (OR, AND), the parentheses and the strings should each be custom-styled, for example: AND and OR in blue, the strings in red, the parentheses in light gray.

  • (less important; I don’t know whether CodeMirror can be asked for parse trees) Parsing the expression, so that my program understands its logical content and can turn it into a useful evaluation for a search filter.

Thank you for your attention and for any hint/help I will receive.


Modes are basically tokenizers, so you’ll want to recognize parentheses and commas, skip whitespace, and categorize identifiers and keywords. Modes don’t, as a rule, produce syntax trees, though some of them do a moderately serious parse in order to recognize context-dependent token roles (like type versus variable).
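
To make that concrete, here is a minimal sketch, not a definitive implementation. It assumes the CodeMirror 5 mode interface (an object with a token(stream, state) method that returns a style name), and it assumes that a "string" is any run of characters other than whitespace and parentheses. The names booleanFilterMode, parseFilter and evaluateFilter are made up for this example. Since the mode only classifies tokens, the parse tree comes from a small separate recursive-descent parser. Letting AND bind tighter than OR is also an assumption, and it only matters when parentheses are omitted.

```javascript
// Sketch of a CodeMirror 5-style mode object for expressions such as
//   ((string1 OR string2) AND string3) OR string4
// Assumption: operators are the upper-case words AND and OR; every other
// run of non-whitespace, non-parenthesis characters is a "string".
const booleanFilterMode = {
  token(stream) {
    // Whitespace: consume it and leave it unstyled.
    if (stream.eatSpace()) return null;
    // Parentheses: one character per token.
    if (stream.match(/^[()]/)) return "bracket";
    // Anything else: read one word and classify it.
    const word = stream.match(/^[^\s()]+/);
    if (word) {
      return word[0] === "AND" || word[0] === "OR" ? "keyword" : "string";
    }
    // Fallback so the stream always advances.
    stream.next();
    return null;
  },
};

// With CodeMirror 5 loaded on the page, registration would look like:
//   CodeMirror.defineMode("boolfilter", () => booleanFilterMode);

// Separate from the mode: a recursive-descent parser that builds a tree.
// Assumed grammar:
//   expr := term ("OR" term)*
//   term := atom ("AND" atom)*
//   atom := "(" expr ")" | string
function parseFilter(text) {
  const tokens = text.match(/[()]|[^\s()]+/g) || [];
  let pos = 0;

  function parseAtom() {
    const tok = tokens[pos++];
    if (tok === undefined) throw new SyntaxError("unexpected end of input");
    if (tok === "(") {
      const inner = parseExpr();
      if (tokens[pos++] !== ")") throw new SyntaxError("expected )");
      return inner;
    }
    if (tok === ")" || tok === "AND" || tok === "OR") {
      throw new SyntaxError("unexpected " + tok);
    }
    return { type: "string", value: tok };
  }

  // Parses operand (op operand)* into a left-nested tree.
  function parseChain(op, parseOperand) {
    let left = parseOperand();
    while (tokens[pos] === op) {
      pos++;
      left = { type: op, left, right: parseOperand() };
    }
    return left;
  }

  const parseTerm = () => parseChain("AND", parseAtom);
  const parseExpr = () => parseChain("OR", parseTerm);

  const tree = parseExpr();
  if (pos < tokens.length) throw new SyntaxError("unexpected " + tokens[pos]);
  return tree;
}

// Evaluates a tree; `test` decides whether a single string matches.
function evaluateFilter(node, test) {
  if (node.type === "string") return test(node.value);
  const left = evaluateFilter(node.left, test);
  const right = evaluateFilter(node.right, test);
  return node.type === "AND" ? left && right : left || right;
}
```

The style names returned by the mode end up as CSS classes with a cm- prefix, so the colors would be set in a stylesheet with rules along the lines of .cm-s-default .cm-keyword { color: blue; } (and likewise for .cm-string and .cm-bracket). For the search filter, something like evaluateFilter(parseFilter(query), s => itemText.includes(s)) would be one way to use the tree, with itemText standing in for whatever text you are filtering.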