Filtering input characters

For legacy reasons, our OT algorithm doesn’t support characters outside the Basic Multilingual Plane (BMP). We plan to fix this limitation in the future, but that will require significant work. For now, we would like to carry over our existing workaround, which replaces each such character with a dummy character.

It seems like a transactionFilter is the right tool for this and I’ve attempted it in this codesandbox. The replacement works as expected but it inserts in reverse and I’m not sure why.

Could you offer any advice on getting this working? Thanks!

This approach, where you replace all local transactions with a fresh one that has similar changes, will strip all effects and annotations from the transactions, so it is probably not going to work (it’ll mess up the undo history, for example).

But I think the problem here is that the changes are all interpreted in the original document’s coordinate system. A filter that adds to the existing transaction by appending a change that deletes the astral characters, using sequential so that its coordinates are interpreted in the in-between coordinate system (the document as it stands after the original transaction’s changes), might work (return [tr, {changes: ..., sequential: true}]).
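To make the suggestion a bit more concrete, here is a minimal sketch of the part that computes the changes. It does not depend on CodeMirror itself: astralReplacements is a hypothetical helper name, and U+FFFD is only an assumed stand-in for the dummy character, since the thread does not say which one is used. Positions are UTF-16 code unit offsets, which is also how CodeMirror counts document positions.

```typescript
// Same shape as the {from, to, insert} objects CodeMirror accepts as changes.
interface Replacement {
  from: number;
  to: number;
  insert: string;
}

// Assumption: U+FFFD as the dummy character; swap in whatever the OT layer expects.
const DUMMY = "\uFFFD";

// Hypothetical helper: find every character outside the BMP in `text` and
// describe a change that replaces it with `dummy`.
function astralReplacements(text: string, dummy: string = DUMMY): Replacement[] {
  // A well-formed surrogate pair encodes exactly one astral character.
  // Lone surrogates are not matched by this pattern.
  const astral = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
  const out: Replacement[] = [];
  let m: RegExpExecArray | null;
  while ((m = astral.exec(text)) !== null) {
    out.push({ from: m.index, to: m.index + m[0].length, insert: dummy });
  }
  return out;
}
```

Inside a transactionFilter, the result could then be appended as [tr, {changes: astralReplacements(tr.newDoc.toString()), sequential: true}]. All positions refer to tr.newDoc, so they belong together in one sequential spec.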

I also wanted to remove specific input characters in v6, but based on a regex. Using the above approach, you can map the output of a RegExp matchAll over the new document to ranges, and append a change that deletes those ranges again.

import {EditorState, Extension} from "@codemirror/state";

/** Filters out anything matched by filterRegex by appending a change that deletes the matches. */
function inputFilterExtension(filterRegex: RegExp): Extension {
  return EditorState.transactionFilter.of((tr) => {
    // Pass through transactions not involving changes
    if (!tr.docChanged) {
      return tr;
    }
    // filterRegex needs the g flag; matchAll throws a TypeError without it
    const matches = Array.from(tr.newDoc.toString().matchAll(filterRegex));
    if (!matches.length) {
      // Fine, nothing invalid
      return tr;
    }

    // Remove invalid ranges, suggested handling from
    // https://discuss.codemirror.net/t/filtering-input-characters/3968/2
    // All match positions refer to tr.newDoc, so they go into a single
    // sequential spec. Separate sequential specs would each be interpreted
    // against the result of the previous one, shifting later positions.
    return [
      tr,
      {
        changes: matches.map(m => ({
          from: m.index!,
          to: m.index! + m[0].length,
        })),
        sequential: true
      }
    ];
  })
}
const inputFilterRegex = /[^a-zA-Z0-9\-\.]+/g;
inputFilterExtension(inputFilterRegex); // Pass to EditorState extensions

There are perhaps some optimizations here, but I have not explored those yet. In my use case the input is never more than one line, and normally no more than 50 characters.
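One possible optimization, sketched here and not tested against a real editor, is to scan only the text a transaction inserted instead of the whole new document. In CodeMirror, tr.changes.iterChanges calls back with (fromA, toA, fromB, toB, inserted), where fromB is the position of the inserted text in the new document. The InsertedSpan type and the invalidRangesInInserted name below are hypothetical. The sketch also assumes that the document was valid before the transaction and that the regex matches runs of individual characters, so deletions alone cannot create new matches.

```typescript
// Hypothetical shape: one entry per inserted span, e.g. collected with
// tr.changes.iterChanges((fromA, toA, fromB, toB, inserted) =>
//   spans.push({ fromB, text: inserted.toString() }))
interface InsertedSpan {
  fromB: number; // start of the inserted text in the new document
  text: string;  // the inserted text as a string
}

interface DeleteRange {
  from: number;
  to: number;
}

// Returns the ranges, in new-document coordinates, that filterRegex matches
// inside the inserted spans only.
function invalidRangesInInserted(spans: InsertedSpan[], filterRegex: RegExp): DeleteRange[] {
  if (!filterRegex.global) {
    throw new Error("filterRegex must have the g flag");
  }
  const ranges: DeleteRange[] = [];
  for (const span of spans) {
    filterRegex.lastIndex = 0; // restart the scan for each span
    let m: RegExpExecArray | null;
    while ((m = filterRegex.exec(span.text)) !== null) {
      if (m[0].length === 0) {
        filterRegex.lastIndex++; // step past empty matches to avoid looping forever
        continue;
      }
      const from = span.fromB + m.index;
      ranges.push({ from, to: from + m[0].length });
    }
  }
  return ranges;
}
```

The returned ranges are all expressed in the new document, so they could be passed together as the changes of a single sequential spec, the same way as the full-document version above.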