StringStream.match, token is not the matched string

fy0 · March 24, 2022, 12:07pm

I have text like this:

AAAAAA [image, http://xxx.com/a.png] BBBB

I want to get a parse result like this:

AAAAAA [image, http://xxx.com/a.png] BBBB
↑ text ↑ image                       ↑ text

I wrote a stream parser like this:

const language = StreamLanguage.define<{}>({
  startState() {
    return {}
  },

  token(stream, state) {
    let m = stream.match(/\[image,[^\]]+\]/) as RegExpMatchArray
    if (m) {
      return `image`
    }

    stream.next()
    return `text`
  }
})

Then I got this:

AAAAAA [image, http://xxx.com/a.png] BBBB
↑ image                 ↑ text

// image token is "AAAAAA [image, http://xxx.co"
// stream.string is "AAAAAA [image, http://xxx.com/a.png]"
// m[0] (the matched string) is "[image, http://xxx.com/a.png]"

I found that when stream.match matches, I get a token that starts at position 0 and is as long as the matched string. Why?
I have used several parsing systems, and this is the first time I have seen one work this way.

marijn · March 24, 2022, 12:33pm

Put a ^ at the start of your regexp to make sure it only matches the token at the start of the stream.

fy0 · March 24, 2022, 12:48pm

Thanks, but the token is not at the start of the line.
I’m working on a polyfill for the stream and am close to getting it working.

I still don’t know why the stream is so different from others, including that you can’t read a ‘\n’ from it.

marijn · March 24, 2022, 1:00pm

That doesn’t matter, the docs for StringStream.match tell you to always anchor it with a ^.

fy0 · March 24, 2022, 1:03pm

The right solution:

Put a ^ at the start of the regexp.
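
For the record, here is a minimal sketch of why the anchor matters. It does not use CodeMirror at all: tokenizeLine is a hypothetical stand-in that models the behavior described above, where the token is as long as the match but is counted from the current position, wherever in the rest of the line the regexp happened to match. The Token type and the variable names are assumptions made for illustration.

```typescript
// Hypothetical stand-in for a line tokenizer; this is NOT the CodeMirror API.
type Token = { type: "image" | "text"; value: string };

function tokenizeLine(line: string, imagePattern: RegExp): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < line.length) {
    // Match against the rest of the line (the pattern must not use the g flag).
    const m = imagePattern.exec(line.slice(pos));
    let type: Token["type"];
    let end: number;
    if (m) {
      // Assumption modelled on the thread: the token length is m[0].length,
      // counted from pos, even if the match was found further along.
      type = "image";
      end = pos + m[0].length;
    } else {
      // No match: consume a single character as text.
      type = "text";
      end = pos + 1;
    }
    const value = line.slice(pos, end);
    const last = tokens[tokens.length - 1];
    // Merge adjacent text characters into one token for readability.
    if (type === "text" && last && last.type === "text") last.value += value;
    else tokens.push({ type, value });
    pos = end;
  }
  return tokens;
}

const line = "AAAAAA [image, http://xxx.com/a.png] BBBB";
const unanchored = tokenizeLine(line, /\[image,[^\]]+\]/);
const anchored = tokenizeLine(line, /^\[image,[^\]]+\]/);
console.log(unanchored);
console.log(anchored);
```

With the unanchored pattern, the first token is tagged image but starts at the beginning of the line. With ^, the pattern can only match at the current position, so the tokens come out as text, image, text.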

My old solution (do not use it, it is here just for the record):

Store the text in the state, and clear state.text at the end of the line:

    state.text += stream.next()

    if (stream.eol()) {
      state.text = ''
    }

Replace the stream.match call with regexp.exec:

    // let m = stream.match(/\[image,[^\]]+\]/g) as RegExpMatchArray
    let m = /\[image,[^\]]+\]/g.exec(state.text) as RegExpMatchArray
    if (m) {
      state.nextN = [`image-${state.name}`]
      stream.start -= m[0].length
      state.text = ''
      return `image`
    }

Now it works.

fy0 · March 24, 2022, 1:05pm

OK, I will try it.

fy0 · March 24, 2022, 1:07pm

I’m sorry, you are right, it’s a better solution.
The last few days with Lezer have not been very pleasant, and that made me a little grumpy. I hope you don’t mind.