@skip does not work on the first line

Hi,

I found a bug in my current grammar.

There seems to be a problem with skipping comments, but only when the input starts with a comment (or any number of blank lines followed by a comment). In that case the comment (and its newline) is parsed, but none of the following input is.

Example:

  • works just fine:

[mcu] #ok  
serial: test
#serial:


  • error:

#works
#error
[mcu] #ok  
serial: test
#serial:


Strangely, the same thing also seems to happen in my editor with Python. At least this example input cannot be parsed when it starts with a comment (or a blank line):

# error
def name(params):
  print("test")


If I try this code with Python, it seems to parse fine. Could it be there’s something else about your setup that’s causing this?

I thought so too, but I couldn’t find anything (in the CodeMirror component). Even if I comment out all extensions (except basicSetup), the problem persists.

I also tried out other languages like CSS, and there I had no problems with comments on the first line. Only my grammar and Python seem to cause problems. Since both have indentation tracking in common, I commented out the indentation handling in my grammar and also removed it from the tokenizer:

@precedence { str @cut }

@top Program { (Import | ConfigBlock)+ }

@skip { Comment newline? | AutoGenerated newline | space | blankLine } 

//valueBlock<content> { indent (content newline | valueBlock<content>)+ (dedent | eof) }
sep<content, separator> { content (separator content)+ }

Import { "[" ImportKeyword FilePath "]" newline }
ConfigBlock { "[" BlockType Identifier? "]" newline Body? }

Body { Option+ }
Option { Parameter ":" Value | GcodeKeyword ":" Jinja2 }

Value { value (newline | eof) } // | valueBlock<value> }
Jinja2 { jinja2 (newline | eof) } // | valueBlock<jinja2> }

value { Pin | pins | VirtualPin | Cords | Number | String | Boolean | Path | FilePath }
pins { sep<Pin, ","> }
VirtualPin { string ":" string }
Cords { sep<number, ","> }
Number { number }
String { string }
Path { ("/" | "~/")? !str string ("/" string)* }
FilePath { Path "." string }

//@context trackIndent from "./tokens.js"
//@external tokens indentation from "./tokens.js" { indent, dedent }
@external tokens newlines from "./tokens.js" { newline, blankLine, eof }

@tokens {
  ImportKeyword { "include" }
  GcodeKeyword { $[a-zA-Z0-9_.\-]* "gcode" }
  BlockType { $[a-zA-Z0-9_]+ }
  Identifier { $[a-zA-Z0-9_.\-]+ }
  Parameter { $[a-zA-Z0-9_]+ }
  
  string { ($[ a-zA-Z0-9_\-!:]+ | "//") }
  jinja2 { $[ \ta-zA-Z0-9_.\-"'{}%=]+ }
  number { "-"? $[0-9]+ ("." $[0-9]*)? }
  Boolean { "True" | "False" } 
  Pin { ("^" | "~")? "!"? "P" $[A-Z] $[0-9]+ } 

  AutoGenerated { "#*#" ![\n\r]* }
  Comment { "#" ![\n\r]* }

  space { $[ \t\f]+ }

  @precedence { space, jinja2, string }
  @precedence { AutoGenerated, Comment }
  @precedence { number, Pin, Boolean, ImportKeyword, string, "/" }
  @precedence { ImportKeyword, BlockType }
  @precedence { GcodeKeyword, Parameter }
}

@external propSource klipperConfigHighlighting from "./highlight"

@detectDelim

Now, all of a sudden, comments on the first line are skipped correctly!
Any idea how the tokenizer interferes with that?

/* ref: https://github.com/lezer-parser/python/blob/main/src/tokens.js */
import { ExternalTokenizer, ContextTracker } from '@lezer/lr'

import { newline as newlineToken, eof, blankLine } from '../parser/klipperConfigParser.terms.js' // ,indent, dedent

const newline = 10,
    carriageReturn = 13,
    space = 32,
    tab = 9

function isLineBreak(ch) {
    return ch == newline || ch == carriageReturn
}

export const newlines = new ExternalTokenizer(
    (input, stack) => {
        let prev
        if (input.next < 0) {
            // End of input
            input.acceptToken(eof)
        } else if ((prev = input.peek(-1)) < 0 || isLineBreak(prev)) {
            // At the start of the input or right after a line break: if the
            // rest of the line is only whitespace, emit a blankLine token
            while (input.next == space || input.next == tab) {
                input.advance()
            }
            if (isLineBreak(input.next)) input.acceptToken(blankLine, 1)
        } else if (isLineBreak(input.next)) {
            // Line break after actual content
            input.acceptToken(newlineToken, 1)
        }
    },
    { contextual: true }
)

/* export const indentation = new ExternalTokenizer((input, stack) => {
    let cDepth = stack.context.depth
    if (cDepth < 0) return
    let prev = input.peek(-1)
    if (prev == newline || prev == carriageReturn) {
        let depth = 0,
            chars = 0
        for (;;) {
            if (input.next == space) depth++
            else if (input.next == tab) depth += 8 - (depth % 8)
            else break
            input.advance()
            chars++
        }
        if (depth != cDepth && input.next != newline && input.next != carriageReturn) {
            if (depth < cDepth) input.acceptToken(dedent, -chars)
            else input.acceptToken(indent)
        }
    }
})

function IndentLevel(parent, depth) {
    this.parent = parent
    // -1 means this is not an actual indent level but a set of brackets
    this.depth = depth
    this.hash = (parent ? (parent.hash + parent.hash) << 8 : 0) + depth + (depth << 4)
}

const topIndent = new IndentLevel(null, 0)

function countIndent(space) {
    let depth = 0
    for (let i = 0; i < space.length; i++) depth += space.charCodeAt(i) == tab ? 8 - (depth % 8) : 1
    return depth
}

export const trackIndent = new ContextTracker({
    start: topIndent,
    reduce(context) {
        return context.depth < 0 ? context.parent : context
    },
    shift(context, term, stack, input) {
        if (term == indent) return new IndentLevel(context, countIndent(input.read(input.pos, stack.pos)))
        if (term == dedent) return context.parent
        return context
    },
    hash(context) {
        return context.hash
    },
}) */

I still have not been able to reproduce such an issue in the Python grammar.

Is there any other option to test grammars with their tokenizer apart from codemirror/try?
In my grammar the problem goes away when I disable indentation tracking, so there must be some error in my tokens.js, but I can’t find it.

Just a script that imports the python parser and logs a parse tree would work. Make sure you are using the latest version of @lezer/python.
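
For example, something like this (a minimal sketch, assuming Node with ES modules and @lezer/python installed):

import { parser } from "@lezer/python"

const source = `# error
def name(params):
  print("test")
`

// A clean parse prints a tree without ⚠ error nodes
console.log(parser.parse(source).toString())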

I followed your advice and tried a different test environment. I finally decided on this one because you can switch between languages quickly and easily.
Locally I added my language, python, as well as pythonEdit. pythonEdit is basically python, just built from the files from here and here instead of importing the package directly.
It is now noticeable that the standard python has no problems, but the python built from those files cannot handle a comment in the first line.
(edited test environment)

If the ‘standard’ python is the @lezer/python package … that package is built from exactly those files, and has had no new patches since the 1.1.7 release. So I don’t really see how that could make a difference.

Exactly, that’s the strange thing. But you can try it if you want; I can’t find the difference.

Is anybody able to reproduce my problem?

I have found the reason for the incorrect comment recognition. I used npx @lezer/generator to create the parse tables for python as well as for my grammar. But when I use the old generator via npx lezer-generator, the comments are skipped correctly.
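
For reference, the two invocations differ only in which package is run (the file names here are just examples from my setup):

npx lezer-generator klipperConfig.grammar -o klipperConfigParser.js    # old, deprecated package
npx @lezer/generator klipperConfig.grammar -o klipperConfigParser.js   # current package

Both should generate an equivalent parse table from the same grammar, which is why the different results surprised me.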

Have I made a mistake or is there a fault in the lezer generator?

@lezer/generator is the current package, and the one you should be using. I don’t know what’s going on here, and until you can provide a minimal reproduction (i.e. without Vue, CodeMirror, and other non-Lezer stuff), I don’t intend to look into this further.
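
For what it’s worth, a Lezer-only reproduction can be a single script that builds the grammar at runtime with buildParser, so no pre-generated parse table is involved. This is only a sketch (the grammar file name, the inlined tokenizer, and the stubbed propSource are assumptions based on the files posted above):

import { readFileSync } from "node:fs"
import { buildParser } from "@lezer/generator"
import { ExternalTokenizer } from "@lezer/lr"

const grammar = readFileSync("./klipperConfig.grammar", "utf8")

const nl = 10, cr = 13, sp = 32, tab = 9

const parser = buildParser(grammar, {
    // Stub out the @external propSource so no highlighting module is needed
    externalPropSource: () => () => null,
    // buildParser passes the term ids to this callback, so the tokenizer can
    // be defined inline instead of importing ids from a generated .terms.js
    externalTokenizer: (name, terms) =>
        new ExternalTokenizer((input) => {
            let prev
            if (input.next < 0) {
                input.acceptToken(terms.eof)
            } else if ((prev = input.peek(-1)) < 0 || prev == nl || prev == cr) {
                while (input.next == sp || input.next == tab) input.advance()
                if (input.next == nl || input.next == cr) input.acceptToken(terms.blankLine, 1)
            } else if (input.next == nl || input.next == cr) {
                input.acceptToken(terms.newline, 1)
            }
        }, { contextual: true }),
})

// A failing parse will show ⚠ error nodes in the printed tree
console.log(parser.parse("#first line comment\n[mcu] #ok\nserial: test\n").toString())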