Selection range when double-clicking (Intl.Segmenter)

Regarding the selection range when double-clicking: unsurprisingly, the current behavior gives unnatural results for CJK text.
Intl.Segmenter can be used to extract a proper selection range instead.
For now, I have changed the code as follows and am satisfied with the result.

import { EditorSelection, findClusterBreak } from "@codemirror/state";

// Create the segmenter once instead of on every double click.
// Falls back to character categories where Intl.Segmenter is unavailable.
const segmenter = typeof Intl !== "undefined" && typeof Intl.Segmenter === "function"
    ? new Intl.Segmenter("ja", { granularity: "word" })
    : null;

function groupAt(state, pos, bias = 1) {
    let line = state.doc.lineAt(pos), linePos = pos - line.from;
    if (line.length == 0)
        return EditorSelection.cursor(pos);

    // At the start or end of the line, always look inward.
    if (linePos == 0)
        bias = 1;
    else if (linePos == line.length)
        bias = -1;

    let from, to;

    if (segmenter) {
        // "山は綺麗だ。" => "山|は|綺麗|だ|。"
        // containing() returns undefined for an index at the end of the
        // string, so with a negative bias look at the character before pos.
        let match = segmenter.segment(line.text).containing(bias < 0 ? linePos - 1 : linePos);
        from = match.index;
        to = from + match.segment.length;
    } else {
        let categorize = state.charCategorizer(pos);
        // Start from the cluster on the biased side of pos...
        from = to = linePos;
        if (bias < 0)
            from = findClusterBreak(line.text, linePos, false);
        else
            to = findClusterBreak(line.text, linePos);
        // ...then extend in both directions while the category matches.
        let cat = categorize(line.text.slice(from, to));
        while (from > 0) {
            let prev = findClusterBreak(line.text, from, false);
            if (categorize(line.text.slice(prev, from)) != cat)
                break;
            from = prev;
        }
        while (to < line.length) {
            let next = findClusterBreak(line.text, to);
            if (categorize(line.text.slice(to, next)) != cat)
                break;
            to = next;
        }
    }

    return EditorSelection.range(from + line.from, to + line.from);
}
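The Segmenter lookup can also be tried outside the editor. This is a minimal standalone sketch, assuming a runtime with `Intl.Segmenter` (for example a recent Node.js or browser); `wordRangeAt` is a hypothetical helper, not part of CodeMirror. It mirrors the edge handling of the patch: `containing()` returns `undefined` for an index at the end of the string, so at the end of a line the character before the position is used.

```javascript
// Hypothetical helper: the word segment around `offset` in a single line of text.
// Returns { from, to } as offsets into `text`.
function wordRangeAt(text, offset, locale = "ja", bias = 1) {
  if (text.length === 0) return { from: 0, to: 0 };
  offset = Math.min(Math.max(offset, 0), text.length);
  // At the start or end of the line, always look inward.
  if (offset === 0) bias = 1;
  else if (offset === text.length) bias = -1;
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  // With a negative bias, use the character before the offset.
  const seg = segmenter.segment(text).containing(bias < 0 ? offset - 1 : offset);
  return { from: seg.index, to: seg.index + seg.segment.length };
}

console.log(wordRangeAt("hello world", 2, "en"));  // { from: 0, to: 5 }
console.log(wordRangeAt("hello world", 11, "en")); // { from: 6, to: 11 }
```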

Because this code cannot be changed through plugins or other extensions, there is currently no way to control:

  • Whether to use Intl.Segmenter
  • Which locale to pass to it (e.g. ja)

I think it would be convenient to be able to specify these as options when creating an editor instance, so I'd like to propose that.
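As a rough illustration of the proposal, here is a sketch of what such options might look like. The option names `useSegmenter` and `locale` and the factory `makeGroupFinder` are hypothetical, not existing CodeMirror settings, and the fallback categorizer is a simplified stand-in (word / space / other) rather than CodeMirror's own. The feature check lets the option degrade gracefully where `Intl.Segmenter` is missing.

```javascript
// Hypothetical options: { useSegmenter: boolean, locale: string }.
// Not an existing CodeMirror API; a sketch of the proposed configuration.
function makeGroupFinder({ useSegmenter = false, locale = "ja" } = {}) {
  // Feature-detect, falling back to categories where Intl.Segmenter is missing.
  const segmenter =
    useSegmenter && typeof Intl !== "undefined" && typeof Intl.Segmenter === "function"
      ? new Intl.Segmenter(locale, { granularity: "word" })
      : null;

  // Simplified categorizer: letters, digits and "_" count as word characters.
  // Works on UTF-16 code units, so characters outside the BMP are not handled.
  const categorize = (ch) =>
    /\s/.test(ch) ? "space" : /[\p{L}\p{N}_]/u.test(ch) ? "word" : "other";

  // Returns the { from, to } group containing `offset` (forward bias only).
  return function findGroup(text, offset) {
    if (text.length === 0) return { from: 0, to: 0 };
    offset = Math.min(Math.max(offset, 0), text.length - 1);
    if (segmenter) {
      const seg = segmenter.segment(text).containing(offset);
      return { from: seg.index, to: seg.index + seg.segment.length };
    }
    // Extend in both directions while the category stays the same.
    const cat = categorize(text[offset]);
    let from = offset, to = offset + 1;
    while (from > 0 && categorize(text[from - 1]) === cat) from--;
    while (to < text.length && categorize(text[to]) === cat) to++;
    return { from, to };
  };
}

// Without the segmenter, kana and kanji all count as word characters,
// so the whole run before "。" is selected.
console.log(makeGroupFinder()("山は綺麗だ。", 0)); // { from: 0, to: 5 }
```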

According to Can I use, Intl.Segmenter seems to be supported by the major browsers.

MDN has a playground.

I see your point, but character categories are integrated with features like language-specific word characters (your patch would break, for example, selecting $name-style variables in languages that have them) and by-group cursor motion. Also, as long as segmenting is non-standard and not available on Firefox, I don’t really want to redesign these features around it.
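To make the `$name` point concrete: under the Unicode word-boundary rules that `Intl.Segmenter` follows, `$` is not a word character, so `$name` comes out as two segments, `$` and `name`. Below is a minimal sketch of one way to reconcile the two, assuming a hypothetical `wordChars` string (similar in spirit to the `wordChars` language data CodeMirror's categorizer consults): merge a segment with a neighbour only when one of them consists entirely of those extra characters. It does not address by-group cursor motion, which would need the same logic.

```javascript
// Hypothetical: word segment at `offset`, merged across extra word characters.
// `wordChars` lists characters the language treats as part of words, e.g. "$".
function groupWithWordChars(text, offset, wordChars, locale = "en") {
  const segs = [...new Intl.Segmenter(locale, { granularity: "word" }).segment(text)];
  const isExtra = (s) => [...s.segment].every((ch) => wordChars.includes(ch));
  const joinable = (s) => s.isWordLike || isExtra(s);
  const i = segs.findIndex((s) => offset >= s.index && offset < s.index + s.segment.length);
  if (i < 0) return null; // offset at or past the end of the text
  let first = i, last = i;
  if (joinable(segs[i])) {
    // Only merge across extra characters, so adjacent CJK words stay split.
    while (first > 0 && joinable(segs[first - 1]) &&
           (isExtra(segs[first - 1]) || isExtra(segs[first]))) first--;
    while (last < segs.length - 1 && joinable(segs[last + 1]) &&
           (isExtra(segs[last + 1]) || isExtra(segs[last]))) last++;
  }
  return { from: segs[first].index, to: segs[last].index + segs[last].segment.length };
}

console.log(groupWithWordChars("let $name = 1", 6, ""));  // { from: 5, to: 9 }
console.log(groupWithWordChars("let $name = 1", 6, "$")); // { from: 4, to: 9 }
```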


Hello. Thank you for your reply.
Your point is reasonable, so I'll apply the change locally instead.

An event handler that implements custom behavior for double clicks shouldn’t be too difficult to put together.
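For reference, a minimal sketch of such a handler, assuming CodeMirror 6's `EditorView.domEventHandlers`, where a handler returning `true` stops the built-in handling of that event. The factory name `segmentDoubleClickHandlers` is hypothetical. It only uses `view.posAtCoords`, `view.state.doc.lineAt`, `view.dispatch` and `view.focus`, so it needs no imports of its own.

```javascript
// Hypothetical factory returning handlers for EditorView.domEventHandlers(...).
function segmentDoubleClickHandlers(locale = "ja") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return {
    mousedown(event, view) {
      // For mousedown, event.detail is the click count; 2 means a double click.
      if (event.detail !== 2 || event.button !== 0) return false;
      const pos = view.posAtCoords({ x: event.clientX, y: event.clientY });
      if (pos == null) return false;
      const line = view.state.doc.lineAt(pos);
      if (line.length === 0) return false;
      // containing() is undefined at the end of the string, so step back there.
      const at = Math.min(pos - line.from, line.length - 1);
      const seg = segmenter.segment(line.text).containing(at);
      const from = line.from + seg.index;
      view.dispatch({
        selection: { anchor: from, head: from + seg.segment.length },
        userEvent: "select.pointer",
      });
      view.focus();
      event.preventDefault();
      return true; // skip CodeMirror's built-in double-click selection
    },
  };
}
```

It could be installed with `EditorView.domEventHandlers(segmentDoubleClickHandlers("ja"))`. Because the built-in mousedown handling is skipped for double clicks, dragging to extend the selection word by word after a double click no longer works with this sketch.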
