How Does CodeMirror Calculate Line Counts So Quickly?

Offroaders123 · February 19, 2021, 1:09am

Hello,

I have been working on a simple project that adds line numbers beside a standard HTML textarea when the user changes it’s value (Feel free to see the source code on GitHub here if you have any questions on how it is implemented). I have tried multiple ways to calculate how many new line characters are present within the textarea’s value, most being from this post on StackOverflow. However, they don’t have the best performance when the textarea has a value with a few thousand lines present. CodeMirror manages to update and calculate the new line numbers almost instantly, no matter how many lines are pasted or how many are present, and I can’t figure out how they managed to do this. I tried looking through the source to find where it might be calculated, but I’m not sure where to look for such a feature, as there are a lot of other functions to look through that aren’t related to what I am looking for.

If you know how CodeMirror implements counting the new line characters within the textarea value without much of a dip in performance, or where I can look to find it, that would be a great help!

Also, to see the issue I am having with my project, feel free to see it in use in my PWA, Smart Text Editor. Pasting 1000+ lines into the editor at once usually causes the lag to occur, especially when the editor already has more than a few thousand lines in it before changing the value.

*Edit: Make sure to change the text editor’s view type to Code or Split in order to enable line numbers. The view can be changed in the View dropdown menu in the site header.

Thanks!
Brandon Bennett

marijn · February 19, 2021, 8:09am

CodeMirror keeps its document in a data structure that has the lines already split into separate strings, and forms a tree, with each node storing its line count. When you paste a large amount of lines, it still has to initially split the new content into lines and create the tree structure for them, but that tends to be fast. Any further queries that need to count lines or access a specific line number can go through the balanced tree, and thus have O(log(docSize)) complexity.

Offroaders123 · February 19, 2021, 10:17am

Ok, that makes sense. I’ll see if I can make something similar to that, as it definitely performs much better than the current method I am using.

Thanks for the explanation and inspiration

cben · February 23, 2021, 9:36pm

See CodeMirror: B-Tree visualization demo for visualization of the data structure.

Offroaders123 · February 24, 2021, 7:59pm

Ok, that helps a lot! So the data structure used to get the formatting information about the text are the visible HTML elements the text was already broken up into by the CodeMirror parser?