Custom modes: very basic questions from a beginner.

ChomskyHierarchy · November 5, 2018, 5:27pm

Good morning/evening,

As my topic’s title says (and the topic could serve, as is, as a repository for any beginner’s questions), I’m about to ask some very simple and basic questions. I have a small project in mind, but I’m here to learn to fish, and today I’m only asking for one very small pre-made fish. I did try to find the answers in CodeMirror’s documentation, but currently the choice is between the manual’s textbook-like discussion (too abstract) and the actual mode scripts (too specific). Also, the assumptions and prerequisites for understanding the scripts’ structure are not stated clearly, though it’s clear that the rudiments of formal language theory would be useful.

Could anybody give me a hint (or just a suggested reading!) for the following goals:

(main) Defining a language (mode) that recognizes expressions like ((string1 OR string2) AND string3) OR string4 or (string1 AND string2) OR string3, i.e. logical AND/OR conditions involving strings, with the required parentheses and no quotes delimiting the strings, so that the operators (AND, OR), the parentheses and the strings are custom-styled, for example: AND and OR in blue, the strings in red, the parentheses in light gray.
(less important, since I don’t know whether CodeMirror can be asked for parse trees) Parsing the expression, so that my program understands its logical content and can turn it into a useful evaluation for a search filter.

Thank you for your attention and for any hint/help I will receive.

marijn · November 13, 2018, 9:40am

Modes are basically tokenizers, so you’ll want to recognize parentheses and commas, skip whitespace, and categorize identifiers and keywords. Modes don’t, as a rule, produce syntax trees, though some of them do a moderately serious parse in order to recognize context-dependent token roles (like type versus variable).
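
As a rough sketch of what such a tokenizer could look like for the expressions in the question, assuming CodeMirror 5's `defineMode` API and its stream methods `eatSpace`, `match` and `next`: the mode name `boolfilter`, the choice of style names, and the `MiniStream` and `tokenize` helpers are illustrative assumptions, not part of CodeMirror. `MiniStream` only imitates the three stream methods used, so the sketch can be tried without the library.

```javascript
// Sketch of a CodeMirror 5 style mode for expressions such as
// ((string1 OR string2) AND string3) OR string4.
// "boolfilter" and the returned style names are illustrative choices.
function boolFilterMode() {
  return {
    token: function (stream) {
      if (stream.eatSpace()) return null;            // whitespace stays unstyled
      if (stream.match(/^[()]/)) return "bracket";   // a single parenthesis
      // AND / OR only when followed by whitespace, a parenthesis, or end of line,
      // so that a word like ORANGE is not split into OR + ANGE.
      if (stream.match(/^(?:AND|OR)(?![^\s()])/)) return "keyword";
      if (stream.match(/^[^\s()]+/)) return "string"; // any other unquoted word
      stream.next();                                 // defensive: always consume something
      return "error";
    }
  };
}

// With CodeMirror 5 loaded, the mode would be registered like this.
if (typeof CodeMirror !== "undefined") {
  CodeMirror.defineMode("boolfilter", boolFilterMode);
}

// Hypothetical stand-in for CodeMirror's stream object (not the real class).
class MiniStream {
  constructor(text) { this.string = text; this.pos = 0; this.start = 0; }
  eol() { return this.pos >= this.string.length; }
  next() {
    if (this.pos < this.string.length) return this.string.charAt(this.pos++);
  }
  eatSpace() {
    const from = this.pos;
    while (/\s/.test(this.string.charAt(this.pos))) this.pos++;
    return this.pos > from;
  }
  match(re) {
    const m = this.string.slice(this.pos).match(re);
    if (m && m.index > 0) return null;
    if (m) this.pos += m[0].length;
    return m;
  }
}

// Helper for experimenting: returns [text, style] pairs for one line.
function tokenize(text) {
  const mode = boolFilterMode();
  const stream = new MiniStream(text);
  const out = [];
  while (!stream.eol()) {
    stream.start = stream.pos;
    const style = mode.token(stream);
    if (style) out.push([stream.string.slice(stream.start, stream.pos), style]);
  }
  return out;
}
```

CodeMirror 5 turns each returned style into a CSS class with a `cm-` prefix (`cm-keyword`, `cm-string`, `cm-bracket`), so the blue, red and light gray colors from the question would be set in CSS rather than in the mode itself. Note that this only styles tokens; it does not check that the parentheses are balanced.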

ChomskyHierarchy · November 20, 2018, 1:44pm

Thank you very much for your information.

I’ve downloaded the complete modes folder. In your opinion, which mode is the best one to infer the basic skeleton of a mode from? I’m asking because, to my beginner’s eye, even though the modes share common methods, their implementations follow very different “patterns”.
By the way, if you don’t mind, I would like to ask specific questions in small steps. My first one: could you please explain what the method shown at the bottom of this post, quoted from the http.js mode, does (just this method, not the methods it calls), either line by line or in plain English?

Thank you again for your help.

function start(stream, state) {
  if (stream.match(/^HTTP\/\d\.\d/)) {
    state.cur = responseStatusCode;
    return "keyword";
  } else if (stream.match(/^[A-Z]+/) && /[ \t]/.test(stream.peek())) {
    state.cur = requestPath;
    return "keyword";
  } else {
    return failFirstLine(stream, state);
  }
}

marijn · November 20, 2018, 1:47pm

Which mode to look at depends on what your target language looks like and how thoroughly you want to parse it. Parsing Markdown works very differently from parsing a curly-brace language, etc.

That’s a tokenizer function that distinguishes between requests and responses (both of which the mode handles). It tokenizes the first element of the line and stores in the state the function that will take care of interpreting the rest in the proper way.
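
To make the "store the next tokenizer in the state" idea concrete, here is a simplified sketch of the pattern, not the actual http.js source. `restOfLine` is a hypothetical stand-in for the functions the real mode switches to (`responseStatusCode`, `requestPath`, `failFirstLine`), and `MiniStream` and `stylesFor` are illustrative helpers that imitate only the stream methods used here.

```javascript
// Hypothetical follow-up tokenizer: styles whatever is left on the line.
function restOfLine(stream, state) {
  stream.skipToEnd();
  return "string";
}

function start(stream, state) {
  if (stream.match(/^HTTP\/\d\.\d/)) {
    // The line begins like "HTTP/1.1": treat it as a response status line.
    state.cur = restOfLine;            // the real mode switches to responseStatusCode
    return "keyword";
  } else if (stream.match(/^[A-Z]+/) && /[ \t]/.test(stream.peek())) {
    // An uppercase word followed by a space or tab: a request method such as GET.
    state.cur = restOfLine;            // the real mode switches to requestPath
    return "keyword";
  } else {
    // Neither form matched: mark the remainder of the line as an error.
    stream.skipToEnd();
    state.cur = restOfLine;
    return "error";
  }
}

// The mode object only dispatches to whichever function is current.
const sketchMode = {
  startState: function () { return { cur: start }; },
  token: function (stream, state) {
    if (stream.eatSpace()) return null;
    return state.cur(stream, state);
  }
};

// Hypothetical stand-in for CodeMirror's stream object (not the real class).
class MiniStream {
  constructor(text) { this.string = text; this.pos = 0; }
  eol() { return this.pos >= this.string.length; }
  peek() { return this.string.charAt(this.pos) || undefined; }
  skipToEnd() { this.pos = this.string.length; }
  eatSpace() {
    const from = this.pos;
    while (/\s/.test(this.string.charAt(this.pos))) this.pos++;
    return this.pos > from;
  }
  match(re) {
    const m = this.string.slice(this.pos).match(re);
    if (m && m.index > 0) return null;
    if (m) this.pos += m[0].length;
    return m;
  }
}

// Helper for experimenting: the sequence of styles produced for one line.
function stylesFor(line) {
  const state = sketchMode.startState();
  const stream = new MiniStream(line);
  const styles = [];
  while (!stream.eol()) {
    const style = sketchMode.token(stream, state);
    if (style) styles.push(style);
  }
  return styles;
}
```

The design point is that `token` itself stays trivial: each tokenizer function decides what the current token is and, by assigning `state.cur`, which function handles the next one.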