Modifying toLowerCase function to support diacritic insensitive search causes infinite loop

MoamenAbdelsattar · October 31, 2024, 2:05pm

I’m trying to make the editor find and replace Arabic words in a diacritic-insensitive way. Instead of modifying the CodeMirror core, I decided to override String.prototype.toLowerCase like this:

    String.prototype.toLowerCase = function(){
        let x = this.replace(/\p{M}/gu, ""); // remove diacritical marks
        return x.replace(/[A-Z]/g, function(match){
            let code = match.charCodeAt(0);
            return String.fromCharCode(code + 32);
        })
    }

Adding this code leads to an infinite loop whenever CodeMirror searches a document that contains Arabic words. The infinite loop doesn’t happen when searching English documents. I need to add this feature without modifying the CodeMirror core, since I’m using the prebuilt, browserified version of CodeMirror from https://codemirror.net/codemirror.js. Any suggestions?

marijn · October 31, 2024, 2:33pm

SearchCursor provides an option specifically for this purpose. There should be no need to mess with String.toLowerCase.
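As a rough illustration of that suggestion, the sketch below defines a standalone normalizer instead of overriding String.prototype.toLowerCase. The name foldDiacritics is hypothetical, and the commented usage assumes the SearchCursor constructor from @codemirror/search accepts a normalize function as its fifth argument; check the reference manual for the version you have loaded.

```javascript
// Hypothetical helper (not part of CodeMirror): drop combining marks and
// fold case, without touching String.prototype.
function foldDiacritics(str) {
  return str.replace(/\p{M}/gu, "").toLowerCase();
}

// Assumed usage, where `doc` is the editor's document (view.state.doc):
//   let cursor = new SearchCursor(doc, query, 0, doc.length, foldDiacritics);
//   while (!cursor.next().done) { /* use cursor.value.from / cursor.value.to */ }

// An Arabic word written with diacritics, and the same word without them.
const vocalized = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627";
const bare = "\u0645\u0631\u062D\u0628\u0627";

console.log(foldDiacritics(vocalized) === bare);
console.log(foldDiacritics("\u064E").length); // a lone fatha has nothing left after folding
```

Note that a lone combining mark folds to an empty string, so whatever consumes the normalizer has to cope with zero-length output.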

MoamenAbdelsattar · October 31, 2024, 6:43pm

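I’ll give it a try, but I suspect it will produce the same result. This nextOverlapping() method doesn’t take into account that normalize can remove characters from the string, leaving norm with length 0. In that case norm.length - 1 is -1, so the check i == norm.length - 1 in the inner loop can never be true and the loop never exits.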

    nextOverlapping() {
      for (;;) {
        let next = this.peek()
        if (next < 0) {
          this.done = true
          return this
        }
        let str = fromCodePoint(next), start = this.bufferStart + this.bufferPos
        this.bufferPos += codePointSize(next)
        let norm = this.normalize(str)
        for (let i = 0, pos = start;; i++) {
          let code = norm.charCodeAt(i)
          let match = this.match(code, pos, this.bufferPos + this.bufferStart)
          if (i == norm.length - 1) {
            if (match) {
              this.value = match
              return this
            }
            break
          }
          if (pos == start && i < str.length && str.charCodeAt(i) == code) pos++
        }
      }
    }

I can’t yet be sure that this is the source of the infinite loop, since I haven’t read all of the source code; I’m just guessing.
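The suspected failure mode can be checked in isolation. The following is a simplified model of the inner loop from the excerpt above, not the CodeMirror code itself; innerLoopExits and its maxSteps guard are hypothetical and exist only so the demonstration can stop.

```javascript
// Simplified model of the inner `for (let i = 0;; i++)` loop.
// `maxSteps` is a hypothetical guard; the original loop has no such limit.
function innerLoopExits(norm, maxSteps = 1000) {
  for (let i = 0; i < maxSteps; i++) {
    // In the excerpt, this condition guards both the `return` and the `break`,
    // so it is the only way out of the inner loop.
    if (i == norm.length - 1) return true;
  }
  return false; // guard tripped: the unguarded loop would still be running
}

console.log(innerLoopExits("a"));  // one normalized code unit
console.log(innerLoopExits("ab")); // two normalized code units
console.log(innerLoopExits(""));   // normalize removed everything
```

With an empty norm the exit condition compares i against -1, which a counter starting at 0 never reaches. This matches the guess above for any normalize function that can return an empty string.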

marijn · November 1, 2024, 4:24am

This patch should help with that.