Simple Mode and anchored regexp

Hi,

I started to define a CM mode using the mode/simple addon (as my target language doesn’t really need anything complex), but I found that only the end-of-line regexp anchor is taken into account, and I couldn’t handle tokens anchored to the start of a line.

Looking at the mode/simple code, the prototype of the toRegexp() function suggests this could be done, so could you tell me how I can handle these?

Thanks
Dominique

Hi Dominique,

Thanks to JS’s braindead regular expression API, there’s no sane way to test whether a regexp matches the middle of a string starting at a specific character. So what the simple mode addon does is add a ^ to the start of the regexps it is given, and match them against a substring of the line. This has the effect of hijacking the start-of-string boundary matcher, which indeed you can’t use to mean start-of-line.
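To make the effect concrete, here is a minimal sketch of that approach in plain JavaScript. The helper names (anchorAtStart, matchAt) are hypothetical and are not the addon's actual code; they only mirror the "prepend ^ and match against a substring" idea described above.

```javascript
// Hypothetical helper: wrap a rule's regexp so it can only match at the
// start of whatever string it is tested against.
function anchorAtStart(regex) {
  return new RegExp("^(?:" + regex.source + ")", regex.ignoreCase ? "i" : "");
}

// Hypothetical helper: test a rule at position `pos` by slicing the line,
// so that "start of string" means "current position", not "start of line".
function matchAt(line, pos, regex) {
  return line.slice(pos).match(anchorAtStart(regex));
}

const line = "text !title";
const userRule = /^!\w+/; // the author of the rule meant "! at start of line"

// At position 5 the slice is "!title", so the user's own ^ is satisfied
// even though the "!" is in the middle of the line.
const midLine = matchAt(line, 5, userRule);

// At position 0 the line starts with "t", so the rule does not match.
const atStart = matchAt(line, 0, userRule);

console.log(midLine && midLine[0]); // "!title"
console.log(atStart);               // null
```

The mid-line match is the "hijacking": once the line has been sliced, the regexp has no way to tell the current position apart from the real start of the line.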

What kind of syntax are you trying to match? Maybe there is another way.

Best,
Marijn

I was about to write a parser for a wiki-markup syntax editor where some “language” elements are only meaningful at the start of a line (i.e. titles are defined with a leading ! ).

I found a way to solve the issue by implementing a full-featured mode, which will always be less clever than what you provide with the “simple mode”, but it may give a hint on how to integrate the feature into it:

In the same spirit as your token() function, I’m looping over an array of rules, but some of mine have an extra “sol” boolean parameter that says whether the rule should only apply at the start of a line. The regexp matching condition is modified accordingly.

Original code from 4.8 release (simple.js around line 118):

...
var rule = curState[i];
var matches = stream.match(rule.regex);
if (matches) {
    ...

The kind of change I made, applied to the code above:

...
var rule = curState[i];
// Check the start-of-line condition first: stream.match() consumes input when it succeeds.
var sol = !rule.sol || stream.sol();
var matches = sol && stream.match(rule.regex);
if (matches) {
    ...

Regards,
Dominique
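For anyone who wants to try the idea outside CodeMirror, the following is a self-contained sketch of a rule loop with a "sol" flag. MiniStream is a hypothetical stand-in that implements only the sol(), eol() and match() behaviour the loop needs; the rule list and the tokenize() function are likewise made up for illustration and are not taken from simple.js.

```javascript
// Hypothetical, stripped-down stand-in for CodeMirror's StringStream.
class MiniStream {
  constructor(line) { this.string = line; this.pos = 0; }
  sol() { return this.pos === 0; }
  eol() { return this.pos >= this.string.length; }
  // Match at the current position only, and consume the text on success.
  match(regex) {
    const m = this.string.slice(this.pos).match(regex);
    if (m && m.index === 0) { this.pos += m[0].length; return m; }
    return null;
  }
}

// Illustrative rules: a leading "!" starts a title, but only at line start.
const rules = [
  { regex: /^!.*/, token: "header", sol: true },
  { regex: /^[^!]+/, token: "text" },
  { regex: /^!/, token: "text" },
];

function tokenize(line) {
  const stream = new MiniStream(line);
  const out = [];
  while (!stream.eol()) {
    let matched = false;
    for (const rule of rules) {
      // The sol test short-circuits, so match() never consumes input
      // for a rule that is not allowed at this position.
      const m = (!rule.sol || stream.sol()) && stream.match(rule.regex);
      if (m) { out.push([rule.token, m[0]]); matched = true; break; }
    }
    if (!matched) stream.pos++; // skip one character if nothing matched
  }
  return out;
}

console.log(tokenize("!Title")); // [["header", "!Title"]]
console.log(tokenize("a !b"));   // [["text", "a "], ["text", "!"], ["text", "b"]]
```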


I could add such a feature to the simple mode framework (or you could submit a pull request that does – it’ll be quite trivial). Would that be helpful?

Hi Marijn, enabling start-of-line recognition for simple mode would be very useful to me. If you could find the time to add this feature, it would be much appreciated.


Sure. Done in patch 2f162ee.
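As a rough sketch of how a rule might use this, assuming the feature is exposed as a "sol" property on a rule (as in Dominique's proposal above): the mode name "wiki" and the token names are placeholders, and the guard around CodeMirror.defineSimpleMode is only there so the snippet also runs where CodeMirror and the simple mode addon are not loaded.

```javascript
// Illustrative state table for a wiki-like markup: a line that starts
// with "!" is a title; a "!" anywhere else is ordinary text.
const wikiStates = {
  start: [
    // Only tried when the stream is at the start of a line.
    { regex: /!.*/, token: "header", sol: true },
    // Everything else is left unstyled.
    { regex: /[^!]+/, token: null },
    { regex: /!/, token: null },
  ],
};

// Register the mode only if CodeMirror and the simple mode addon are present.
if (typeof CodeMirror !== "undefined" && CodeMirror.defineSimpleMode) {
  CodeMirror.defineSimpleMode("wiki", wikiStates);
}
```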

Hello

I’m trying to write a mode using simplemode.
It basically matches numbers with units…

123 meter
123lmeter
e

I’m trying to figure out if there is a way to have the tokens only match on word boundaries?

Currently it looks like:
[Screenshot of the current highlighting]

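const numRx = /[-+]?(?:\.\d+|\d+\.?\d*)(?:e[-+]?\d+)?/;
const units = ['meter', 'e'].join('|');
// The backslash is doubled because "\s" inside a template literal would collapse to a plain "s".
const numUnitPattern = `(${numRx.source})?(?:\\s?)(${units})`;
const numUnitRx = new RegExp(numUnitPattern);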
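One possible direction, offered as an untested-in-CodeMirror sketch rather than a confirmed answer: a trailing \b after the unit alternation keeps "e" from matching the first letter of a longer word. The anchored regexp and the firstToken() helper are hypothetical and only emulate the "^ plus substring" matching described earlier in the thread. A leading \b would presumably run into the same substring limitation as ^, since the regexp cannot see the character before the current position.

```javascript
const numRx = /[-+]?(?:\.\d+|\d+\.?\d*)(?:e[-+]?\d+)?/;
const units = ["meter", "e"].join("|");

// Same pattern as above, with a word boundary required after the unit.
const boundedRx = new RegExp(`(${numRx.source})?\\s?(?:${units})\\b`);

// Emulate the anchoring the simple mode addon is described as applying.
const anchored = new RegExp("^(?:" + boundedRx.source + ")");

// Hypothetical helper: the text a rule would consume at the start of `text`.
function firstToken(text) {
  const m = text.match(anchored);
  return m ? m[0] : null;
}

console.log(firstToken("123 meter"));  // "123 meter"
console.log(firstToken("123 meters")); // null
console.log(firstToken("e"));          // "e"
console.log(firstToken("element"));    // null
```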