Overriding Bidi spans in RTL context

Hello!
I’m working on a math editor that supports RTL languages, and am facing a bidi-related problem which I’m not sure how to solve:
The editor’s general text direction is RTL, but I’d like the math segments to have LTR direction (as it’s written). However, CodeMirror treats the math text like any other text:
[Screenshot: without bidi-override]

(text between the dollar signs is treated as math; as you can see there’s some syntax highlighting for it, but I don’t think this affects the behavior)

Setting unicode-bidi: bidi-override corrects the displayed text, but the cursor jitters oddly:
[Screenshot: with bidi-override]

Selection and editing also behave oddly in places. For example, when I try to select all text in the first math segment it ends up this way:
[Screenshot]

Looking at the line’s bidiSpans(), it seems the editor is “RTL greedy” here, in the sense that it tags any text segment that’s not clearly LTR as RTL (which explains the above behavior, I think):
[Screenshot]
To be clear, these logs show the from and to of each bidi span, the value at that range and the span’s direction.
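
For anyone who wants to produce the same kind of log, here is a minimal sketch of the formatting step. It assumes span objects with `from`, `to` and `level` fields, where an odd `level` means right-to-left (as in the Unicode bidi algorithm). `SpanLike`, `describeSpans` and the sample spans are made up for illustration and are not real editor output.

```typescript
// Minimal stand-in for the fields of a bidi span that are used here.
// Assumption: `from`/`to` are offsets into the line text, and an odd
// `level` means right-to-left.
interface SpanLike {
  from: number
  to: number
  level: number
}

// Hypothetical helper: builds one log row per span, showing its range,
// its direction, and the text it covers.
function describeSpans(lineText: string, spans: readonly SpanLike[]): string[] {
  return spans.map(span => {
    const dir = span.level % 2 === 1 ? "RTL" : "LTR"
    const text = JSON.stringify(lineText.slice(span.from, span.to))
    return `${span.from}-${span.to} ${dir} ${text}`
  })
}

// Hand-written example: two Hebrew letters, a space, then a math segment.
const sampleLine = "\u05D0\u05D1 $x+1$"
const rows = describeSpans(sampleLine, [
  {from: 0, to: 3, level: 1},
  {from: 3, to: 8, level: 2},
])
console.log(rows.join("\n"))
```

In the editor, the spans would come from the view's bidiSpans() call for the line rather than from a literal array.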

I’d like to be able to tell the editor to treat these segments as pure LTR text, but have not been able to find much relevant information in the docs or here in the forum. How can I go about it?

As an aside, CodeMirror is a wonder, and I really appreciate all the hard work that you’ve put into it.

Thanks!

Because the browser does not give scripts access to the order in which it is drawing the text, CodeMirror contains its own implementation of the bidi algorithm. However, this works just from the document text, and does not take the styling of that text with CSS into account. That is why you are seeing the weird behavior with this styling.

So as things stand, affecting bidi ordering with decorations is unfortunately not supported. You can of course replace parts of the code with a widget and set explicit ordering inside of it, but it doesn’t sound like that is what you need in this case.

I could be convinced to try and find a way to add support for this kind of thing, but since that’s going to be a non-trivial undertaking, it would need to involve payment for my time.

I see, that’s a bummer.

I would be willing to pay some amount for a solution to be implemented (have been considering donating anyways), but I’m a hobbyist individual and am not sure I could afford the amounts you would be expecting.

Can we continue this discussion to define/plan what a solution might look like? I’m by no means an expert, but if we have a clear idea of what a solution might look like I could try to implement it.

For example, is it feasible to implement an access point “between” the bidi algorithm and the final bidiSpans value, that a suitable extension can modify?
That is, a user could provide a function that takes a BidiSpan array and returns a modified array (or something similar), which is then used by the internal logic for all bidi-related intents and purposes in place of the spans determined by the algorithm.
If that’s added to CodeMirror, I think writing an extension that forces the math ranges to be LTR should not be too difficult.
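
To make the idea concrete, here is a rough sketch of what such a user-provided function might do. It works on a simplified model and not on CodeMirror's real BidiSpan type: spans are assumed to be in document order with a plain `dir` field, and visual reordering and nesting levels are ignored. `SimpleSpan`, `ForcedRange` and `forceLTR` are hypothetical names.

```typescript
type Dir = "ltr" | "rtl"

// Simplified model of a span: document order, no nesting levels.
interface SimpleSpan { from: number; to: number; dir: Dir }
interface ForcedRange { from: number; to: number }

// Splits each span at the boundaries of the forced ranges and relabels the
// covered pieces as LTR. Assumes `spans` are contiguous and in document
// order, and `ranges` are sorted and do not overlap.
function forceLTR(spans: readonly SimpleSpan[], ranges: readonly ForcedRange[]): SimpleSpan[] {
  const result: SimpleSpan[] = []
  const push = (from: number, to: number, dir: Dir) => {
    if (from >= to) return // skip empty pieces
    const last = result[result.length - 1]
    // Merge with the previous piece when it touches and has the same direction.
    if (last && last.dir === dir && last.to === from) last.to = to
    else result.push({from, to, dir})
  }
  for (const span of spans) {
    let pos = span.from
    for (const range of ranges) {
      if (range.to <= pos || range.from >= span.to) continue
      push(pos, range.from, span.dir) // part of the span before the range
      const end = Math.min(range.to, span.to)
      push(Math.max(pos, range.from), end, "ltr") // part covered by the range
      pos = end
    }
    push(pos, span.to, span.dir) // remainder after the last range
  }
  return result
}

const forced = forceLTR(
  [{from: 0, to: 4, dir: "rtl"}, {from: 4, to: 6, dir: "ltr"}, {from: 6, to: 12, dir: "rtl"}],
  [{from: 3, to: 8}]
)
console.log(forced)
```

A real version would also have to put the resulting spans in visual order, which this sketch does not attempt.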

This is JavaScript, so you could monkey-patch EditorView.prototype.bidiSpans to replace it with your own method, possibly calling through to the original method. But that doesn’t really seem viable without access to a bunch of library-internals and some very kludgy code. But it’d roughly give you the access point you describe.
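
The general shape of such a patch, shown on a stand-in class so that it runs on its own, might look like this. `StandInView` and its string-based spans are invented for the sketch. With the real library the wrapped method would be EditorView.prototype.bidiSpans, and the post-processing would need the internals mentioned above.

```typescript
// Stand-in for a class whose method gets wrapped. Nothing here touches
// CodeMirror; the "spans" are plain strings to keep the sketch small.
class StandInView {
  bidiSpans(line: string): string[] {
    return [`rtl:${line}`]
  }
}

// Keep a reference to the original method, then replace it on the prototype
// with a wrapper that calls through and adjusts the result.
const originalBidiSpans = StandInView.prototype.bidiSpans
StandInView.prototype.bidiSpans = function (this: StandInView, line: string): string[] {
  const spans = originalBidiSpans.call(this, line)
  // Hypothetical post-processing: relabel lines that start with a math marker.
  return spans.map(s => (line.startsWith("$") ? s.replace(/^rtl:/, "ltr:") : s))
}

const patched = new StandInView()
console.log(patched.bidiSpans("$x+1$")) // goes through the wrapper
```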

From what I can see so far, I think a proper solution to this would add a feature to @codemirror/view allowing extensions to define bidi embedding and isolation using range sets (the data structure also used for decorations), and to extend the bidi algorithm implementation to support embedding and isolation for the ranges provided in that way. (Reading this from the computed CSS styles seems like it would get very expensive and annoying to implement, which is why I’d go with a system where extensions provide this information directly.)

By the way, does your use case require bidi-override? From what I understand (which, to be fair, isn’t all that much since I don’t use a right-to-left script), it seems like unicode-bidi: isolate; direction: ltr would also work for this.

bidi-override is by no means required in my use case, but unicode-bidi: isolate seems to give the exact same behavior (which makes total sense if my understanding is correct, as CodeMirror handles the bidi-related logic on its own and doesn’t take CSS into account).

I tried the monkey-patch solution you mentioned, and have made it work for the most part. It involves some dirty code, and is by no means a good solution, but it can be found here if it’s of any help to anyone.

The proper solution you mentioned sounds great.

If it’s not too big of a breaking change, do you think separating the bidi logic from the EditorView entirely could be a good base step?
I mean something like CodeMirror’s philosophy regarding features such as syntax highlighting - there’s none by default, but it can be provided through extensions (whether those are provided by the core library or third-party ones).

This would move the problem entirely to extension-land, and allow users to choose between using the current implementation and hacking out a new one suited to their needs.
Of course, a complete solution could then be implemented as a core extension, whether by a dedicated developer or if someone’s able to sponsor your doing it.

I don’t think that’s the right approach. Firstly, people will omit it from their configuration because they don’t know they need it, and make the editor unusable for RTL language users. Secondly, we’ll need a solution where multiple extensions that all need to declare some pieces of the text as bidi isolating can work together, rather than a single access point that can be overridden.
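
As a rough illustration of the second point, here is a sketch in which several independent sources each declare isolated ranges and the results are combined, instead of one source overriding the others. This is a toy model and not CodeMirror's facet machinery. `IsolateRange`, `IsolateProvider`, `rangesOf` and `collectIsolates` are hypothetical names.

```typescript
interface IsolateRange {
  from: number
  to: number
  dir: "ltr" | "rtl"
}
type IsolateProvider = (doc: string) => IsolateRange[]

// Builds a provider that marks every match of a pattern as an isolate.
// The pattern source must not match the empty string.
function rangesOf(source: string, dir: "ltr" | "rtl"): IsolateProvider {
  return doc => {
    const pattern = new RegExp(source, "g")
    const found: IsolateRange[] = []
    for (let m = pattern.exec(doc); m !== null; m = pattern.exec(doc))
      found.push({from: m.index, to: m.index + m[0].length, dir})
    return found
  }
}

// Collects the ranges from all providers and sorts them by start position,
// longer ranges first, so an outer isolate comes before one nested in it.
function collectIsolates(doc: string, providers: readonly IsolateProvider[]): IsolateRange[] {
  const all: IsolateRange[] = []
  for (const provider of providers) all.push(...provider(doc))
  return all.sort((a, b) => a.from - b.from || b.to - a.to)
}

const tags = rangesOf("<[^<>]*>", "ltr")
const quoted = rangesOf('"[^"]*"', "ltr")
const combined = collectIsolates('x <a id="k"> y', [quoted, tags])
console.log(combined)
```

Neither provider needs to know about the other, which is the property a single overridable access point would not give you.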

I’ll put this on my list. Will probably work on it eventually.

I’ve made a donation as thanks for your amazing work on CM. It’s not too much, but if it allows you some time, I’d love it if it went to this problem (this is not an expectation, of course, but a wish).

This set of changes introduces an EditorView.bidiIsolatedRanges facet, that can be used to inform the editor’s bidi algorithm to treat ranges in a given range set as isolating (using the bidiIsolate property on the decoration specs to determine the direction of the isolates).

I’ve set up an example of how to use it but can’t publish it to the website before this code is released in @codemirror/view. It looks like this:


import {EditorView, Direction, ViewPlugin, ViewUpdate,
        Decoration, DecorationSet} from "@codemirror/view"
import {Prec} from "@codemirror/state"
import {syntaxTree} from "@codemirror/language"
import {Tree} from "@lezer/common"

const htmlIsolates = ViewPlugin.fromClass(class {
  isolates: DecorationSet
  tree: Tree

  constructor(view: EditorView) {
    this.isolates = computeIsolates(view)
    this.tree = syntaxTree(view.state)
  }

  update(update: ViewUpdate) {
    if (update.docChanged || update.viewportChanged ||
        syntaxTree(update.state) != this.tree) {
      this.isolates = computeIsolates(update.view)
      this.tree = syntaxTree(update.state)
    }
  }
}, {
  provide: plugin => {
    function access(view: EditorView) {
      return view.plugin(plugin)?.isolates ?? Decoration.none
    }
    return Prec.lowest([EditorView.decorations.of(access),
                        EditorView.bidiIsolatedRanges.of(access)])
  }
})

import {RangeSetBuilder} from "@codemirror/state"

const isolate = Decoration.mark({
  attributes: {style: "direction: ltr; unicode-bidi: isolate"},
  bidiIsolate: Direction.LTR
})

function computeIsolates(view: EditorView) {
  let set = new RangeSetBuilder<Decoration>()
  for (let {from, to} of view.visibleRanges) {
    syntaxTree(view.state).iterate({
      from, to,
      enter(node) {
        if (node.name == "OpenTag" || node.name == "CloseTag" ||
            node.name == "SelfClosingTag")
          set.add(node.from, node.to, isolate)
      }
    })
  }
  return set.finish()
}

This has now been released (@codemirror/view 6.17.0) and the example is live on the website.
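
For the math use case from the start of the thread, the ranges to isolate could probably be found without a syntax tree. Below is a minimal sketch of that range-finding step only. It assumes (beyond what the thread says) that math never spans lines, that `$` cannot be escaped, and that an unmatched `$` is ignored. `MathRange` and `findMathRanges` are hypothetical names.

```typescript
interface MathRange { from: number; to: number }

// Finds `$...$` segments in the given text and returns their offsets,
// including both dollar signs. Unmatched `$` characters are skipped.
function findMathRanges(text: string): MathRange[] {
  const ranges: MathRange[] = []
  const pattern = /\$[^$\n]*\$/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(text)) !== null) {
    ranges.push({from: match.index, to: match.index + match[0].length})
  }
  return ranges
}

const mathRanges = findMathRanges("ab $x+1$ cd $y$")
console.log(mathRanges)
```

Each range found this way could then be added to the builder with set.add(from, to, isolate), in place of the syntax-tree iteration in computeIsolates above.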

This looks great! Thank you so much!!
The example is well written as well - I’ll be sure to try it as soon as I can.

This is great! I’ve been able to adapt the code from the example to our use. I am facing some odd issues when the cursor is at the start of an LTR isolated span within a surrounding RTL context; it seems to get visually positioned at the LTR span’s end, but still works as if it was at the start.

I’ll need to investigate this some more to get a minimal reproduction together.

This sounds like a general phenomenon with direction changes—there being points where RTL text and LTR text inserted will appear in (visually) different places, so regardless of where the cursor is, some text will appear in a surprising-looking spot. But maybe what you’re describing is something beyond that?

It’s something beyond that; as I said, I think I’ll need to put together a minimal repro, as the editor is embedded in a lot of code that’s not relevant, and requires a (free) Firefox Account to use.

In a simpler case, it looks like this in Firefox when first tapping right-arrow multiple times from the end, and then left-arrow: https://youtu.be/2K0cEDrsCNY. In Chrome, going right does not produce the jump as in Firefox.

The DOM generated for the line is correct (with a parent element setting dir="rtl"):

<div class="cm-line">
  <span spellcheck="true">
    <span class="ͼ9">למידע נוסף, ניתן לבקר בתמיכה של </span></span>
  <span dir="ltr" style="unicode-bidi: isolate">
    <span class="ͼ6">{</span>
    <span class="ͼ7"> -brand-mozilla </span>
    <span class="ͼ6">}</span></span>
  <span spellcheck="true"><span class="ͼ9">:</span></span>
</div>

So while the above might be a Firefox bug, with more complex cases it can get clearly broken; here too tapping right-arrow and then left-arrow until back at the end: https://youtu.be/X-4f2vCY0Y4. This behaves the same in Chrome.

The DOM for that is correct (with a parent element setting dir="rtl"):

<div class="cm-line">
  <span spellcheck="true">
    <span class="ͼ9">למידע נוסף, ניתן לבקר ב</span></span>
  <span dir="ltr" style="unicode-bidi: isolate">
    <span class="ͼ5">&lt;</span>
    <span class="ͼ5">a data-l10n-name=</span>
    <span class="ͼ8">"</span>
    <span dir="auto" style="unicode-bidi: isolate">
      <span class="ͼ8">supportLink</span></span>
    <span class="ͼ8">"</span>
    <span class="ͼ5">&gt;</span></span>
  <span spellcheck="true"><span class="ͼ9">תמיכה של </span></span>
  <span dir="ltr" style="unicode-bidi: isolate">
    <span class="ͼ6">{</span>
    <span class="ͼ7"> -brand-mozilla </span>
    <span class="ͼ6">}</span></span>
  <span dir="ltr" style="unicode-bidi: isolate">
    <span class="ͼ5">&lt;</span>
    <span class="ͼ5">/a</span>
    <span class="ͼ5">&gt;</span></span>
  <span spellcheck="true"><span class="ͼ9">.</span></span>
</div>

I’ve filed an issue with a reproduction of a bidi issue with dir="auto":