Migrate getTokenAt(pos) from CM v5 to v6

jerondeepak · August 1, 2025, 5:51pm

In CM v5 I used `editor.getTokenAt(editor.getCursor())`.

While migrating to v6, I used the code below to get the token data:

```javascript
getTokenAt(pos) {
    const offset = pos.absolute ?? this.getPointsToAbsPos(pos);
    // Find the DOM node that renders this document position
    const { node: nodeEl } = this.domAtPos(offset);
    const { parentElement, data } = nodeEl;
    const parentEl = hasClass(parentElement, 'cm-line') ? nodeEl : parentElement; // No I18N
    // cmView is an undocumented internal property CodeMirror attaches to its DOM nodes
    const { cmView: { posAtEnd, posAtStart, mark } } = parentEl;
    let markClass = null;
    if (mark) {
        markClass = mark.class;
    }

    return {
        start: posAtStart,
        end: posAtEnd,
        string: data,
        type: markClass
    };
}
```

But the code above returns slightly wrong data for unknown (unstyled) tokens:

CM v5 output:
{start: 53, end: 59, string: 'jjasdf', type: null}

CM v6 output:
{start: 51, end: 60, string: ' jlskadfj', type: null}

In CM v6 the result includes an extra leading space. And when I use `syntaxTree(state).resolveInner(pos, -1)` instead, it returns the full text of the line for an unknown token.

Is there any way to resolve this?

marijn · August 1, 2025, 7:41pm

Trying to get tokens via the DOM sounds like a very bad idea. There’s just no solid correspondence between DOM structure and tokens.
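To illustrate where the extra space comes from, here is a toy model (not CodeMirror's actual rendering code) that splits a line into rendered runs: each highlighted token becomes its own run, and every gap between tokens, whitespace included, collapses into a single unstyled run. The helper name `renderRuns` and the `{ from, to, cls }` token shape are hypothetical, chosen only for this sketch.

```javascript
// Toy model (hypothetical, not CodeMirror's renderer): split a line into the
// runs a highlighter would render. Styled tokens become their own run; every
// gap between them, whitespace included, becomes one plain run.
function renderRuns(lineText, tokens) {
  const runs = [];
  let pos = 0;
  // Tokens are assumed to be sorted by `from` and non-overlapping.
  for (const { from, to, cls } of tokens) {
    if (from > pos) {
      runs.push({ from: pos, to: from, text: lineText.slice(pos, from), cls: null });
    }
    runs.push({ from, to, text: lineText.slice(from, to), cls });
    pos = to;
  }
  if (pos < lineText.length) {
    runs.push({ from: pos, to: lineText.length, text: lineText.slice(pos), cls: null });
  }
  return runs;
}

// "let" is highlighted as a keyword; " jlskadfj" has no token, so the space
// and the word end up in the same unstyled run (and the same DOM text node).
const runs = renderRuns("let jlskadfj", [{ from: 0, to: 3, cls: "tok-keyword" }]);
console.log(runs[1]); // { from: 3, to: 12, text: ' jlskadfj', cls: null }
```

Reading a "token" off such a run gives you whatever happens to share the same styling, which is why the DOM-based result starts one character early.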

If there’s a token around or before pos in the tree, that should be what resolveInner returns (the innermost syntax node at the given position).
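A simplified sketch of that lookup, using a plain-object tree instead of Lezer's `Tree` (the node shape `{ name, from, to, children }` and the helper `resolveInnerToy` are hypothetical, and mounted/overlay trees are ignored): starting at the root, keep entering the child that starts before `pos` and ends at or after it, which is what side `-1` allows. It also shows why unstyled text yields a whole line: if no token node covers the position, the innermost match is the enclosing line-level node.

```javascript
// Hypothetical plain-object tree: { name, from, to, children }.
// Simplified model of resolving the innermost node at `pos` with side -1:
// a child is entered if it starts before `pos` and ends at or after it.
function resolveInnerToy(root, pos) {
  let node = root;
  for (;;) {
    const next = (node.children || []).find((c) => c.from < pos && c.to >= pos);
    if (!next) return node;
    node = next;
  }
}

// Line "let jlskadfj": the parser produced a token node for "let" only.
const tree = {
  name: "Document", from: 0, to: 12,
  children: [
    { name: "Line", from: 0, to: 12, children: [
      { name: "Keyword", from: 0, to: 3, children: [] },
    ] },
  ],
};

console.log(resolveInnerToy(tree, 2).name); // Keyword (inside "let")
console.log(resolveInnerToy(tree, 3).name); // Keyword (side -1 enters a node ending at pos)
console.log(resolveInnerToy(tree, 8).name); // Line (no token covers "jlskadfj")
```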

jerondeepak · August 2, 2025, 7:00am

I changed it as shown below, and I think this version works. Also, calling `syntaxTree(state)` doesn't re-process the entire document string for highlighting, right?

```javascript
import { syntaxTree } from "@codemirror/language";

getTokenAt(pos) {
    const offset = pos.absolute ?? this.getPointsToAbsPos(pos);
    const tree = syntaxTree(this.state);
    // Innermost syntax node at `offset`; side -1 also enters nodes ending at `offset`
    const node = tree.resolveInner(offset, -1);

    // If we have a proper, non-empty syntax node other than the top-level node
    if (node.name && !node.type.isTop && node.from !== node.to) {
        const tokenText = this.state.doc.sliceString(node.from, node.to);
        return {
            start: node.from,
            end: node.to,
            string: tokenText,
            type: node.name
        };
    }

    // For unknown tokens or plain text, find word boundaries manually
    const line = this.state.doc.lineAt(offset);
    const lineText = line.text;
    const posInLine = offset - line.from;

    // Find word boundaries, similar to CM v5 behavior
    let start = posInLine;
    let end = posInLine;

    // Move start backwards to find the word start
    while (start > 0 && /\w/.test(lineText[start - 1])) {
        start--;
    }

    // Move end forwards to find the word end
    while (end < lineText.length && /\w/.test(lineText[end])) {
        end++;
    }

    // If we're not in a word, take the character at (or just before) the cursor
    if (start === end) {
        if (posInLine < lineText.length) {
            end = posInLine + 1;
            start = posInLine;
        } else if (posInLine > 0) {
            start = posInLine - 1;
            end = posInLine;
        }
    }

    const tokenText = lineText.slice(start, end);

    return {
        start: line.from + start,
        end: line.from + end,
        string: tokenText,
        type: null
    };
}
```
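The manual fallback in `getTokenAt` doesn't depend on the editor, so it can be checked in isolation. Below is the same word-boundary logic pulled out into a standalone function; the name `wordRangeAt` is hypothetical. Note that `\w` only matches ASCII letters, digits and underscore, so non-ASCII identifiers would need a different pattern; CodeMirror's own `state.wordAt(pos)`, which uses the language's word-character data, may also be worth trying here.

```javascript
// Standalone version of the word-boundary fallback (hypothetical helper name).
// Offsets are relative to the start of the line; add line.from for doc offsets.
function wordRangeAt(lineText, posInLine) {
  let start = posInLine;
  let end = posInLine;
  // Extend backwards and forwards over word characters.
  while (start > 0 && /\w/.test(lineText[start - 1])) start--;
  while (end < lineText.length && /\w/.test(lineText[end])) end++;
  // Not inside a word: use the single character at (or just before) the cursor.
  if (start === end) {
    if (posInLine < lineText.length) {
      end = posInLine + 1;
    } else if (posInLine > 0) {
      start = posInLine - 1;
    }
  }
  return { start, end, string: lineText.slice(start, end) };
}

const text = "let jlskadfj = 1";
console.log(wordRangeAt(text, 6));  // { start: 4, end: 12, string: 'jlskadfj' }
console.log(wordRangeAt(text, 3));  // { start: 0, end: 3, string: 'let' }
console.log(wordRangeAt(text, 13)); // { start: 13, end: 14, string: '=' }
```

Unlike the DOM-based version, the word found at offset 6 does not include the preceding space.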