How does the Java grammar do things that are invalid for me?

krux02 · October 20, 2023, 2:57pm

I am trying to implement a grammar for a custom language, so I am trying to understand the .grammar format.

Specifically, I want keyword tokens, so I read the relevant section of the Lezer System Guide. But I don’t really understand what that section is saying. It conflates keyword tokens with other things, and when I tried to define just a keyword token, I got the following error:

  Identifier { letter identifierChar* }
  Operator { $[-.+*/=!?<>&|:%$~^]+ | @specialize<Identifier , "and"> | @specialize<Identifier, "or">  }

Unrecognized expression type in token. I don’t know what that means.

So I thought that copying a solution from another language with similar features might be a good idea, and I copied things from here: https://github.com/codemirror/google-modes/blob/master/src/java.grammar

But when I put kw(value)="keyword" { value !identifierChar } in my grammar, I get Unexpected token '('.

At this point I feel stuck and I don’t know what to do. Are you able to help me here?

marijn · October 20, 2023, 3:39pm

You cannot use @specialize in a token rule. You have to use it from a nonterminal rule.
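
A minimal sketch of that split, assuming the Identifier token and the operator character set from the question (the token name opSymbol is made up for illustration): the character-set part stays in @tokens, while the @specialize alternatives move into a nonterminal rule.

```
@top File { (Operator | Identifier)+ }

// Nonterminal rule: @specialize is allowed here.
Operator {
  opSymbol |
  @specialize<Identifier, "and"> |
  @specialize<Identifier, "or">
}

@skip { space }

@tokens {
  space { @whitespace+ }
  letter { $[a-zA-Z_] }
  digit { $[0-9] }
  identifierChar { letter | digit }
  Identifier { letter identifierChar* }
  // Token rule: plain token expressions only, no @specialize.
  opSymbol { $[-.+*/=!?<>&|:%$~^]+ }
}
```

With this layout the tokenizer still reads "and" as an Identifier first; the specialization then swaps it for the keyword token.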

krux02 · October 20, 2023, 5:34pm

But then, how do I get keywords as tokens?

marijn · October 20, 2023, 5:59pm

Look at how the other grammars are doing it (for example the Java grammar).

krux02 · October 23, 2023, 7:41am

That is exactly what I did. The Java grammar defines kw(value)="keyword" { value !identifierChar } and then uses it to parse keywords. But when I paste the Java solution verbatim into my grammar, I get the error message Unexpected token '(' (src/my.grammar 64:2), where 64 is that very line. The Java grammar apparently does things that I am not allowed to do.

krux02 · October 23, 2023, 7:42am

Here is my full grammar in case you want to reproduce the error:

@external propSource highlighting from "./highlight.js"

@top File { (QuotedIdent | CustomFuncIdent | DateTime | Boolean | Number | StringLit | Identifier |  Operator )+ }

@skip { space }

@tokens {

  space { @whitespace+ }

  quotedIdentInner {
    $[a-zA-Z0-9. _/+*()@$€£¥₽"'-]+
  }

  QuotedIdent {
    "{" quotedIdentInner "}"
  }

  @precedence { Boolean, Identifier }

  letter { $[a-zA-Z_] }
  digit { $[0-9] }
  identifierChar { letter | digit }


  Boolean { "true" | "false" }

  CustomFuncIdent { "\\" Identifier }

  Identifier { letter identifierChar* }
  //Operator { $[-.+*/=!?<>&|:%$~^]+ | @specialize<Identifier , "and"> | @specialize<Identifier, "or">  }

  Operator { $[-.+*/=!?<>&|:%$~^]+ | kw("and") | kw("or")  }
  StringLit {
    ("\"" (![\\"\n] | "\\" (![\n] | "\n"))* "\"") |
    ("\'" (![\\'\n] | "\\" (![\n] | "\n"))* "\'")
  }

  digits {
    @digit+
  }

  exponent {
    $[eE] ("-" | "+")? digits
  }

  fractional {
     "." digits
  }

  Number {
    digits fractional? exponent?
  }



  dig2 { @digit @digit }

  DateTime {
    "@" $[1-9] $[0-9] $[0-9] $[0-9] "-" dig2 "-" dig2 ("T" dig2 ":" dig2 ":" dig2 "Z")?
  }
}

kw(value)="keyword" { value !identifierChar }

marijn · October 23, 2023, 9:31am

That is clearly not a thing—the Java grammar is using the same tool that you’re using.

What I’m seeing in the Java grammar is

kw<term> { @specialize[@name={term}]<identifier, term> }

which is very different from the thing you’re doing.
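
As a hedged sketch of how that parameterized rule could be used for the "and" and "or" operators from the question: the Java grammar specializes a lowercase identifier token, while this example assumes the capitalized Identifier token from the grammar posted above, and operatorSymbol is a hypothetical token name.

```
@top File { (Operator | Identifier)+ }

// Parameterized nonterminal: turns an Identifier whose text equals
// `term` into a keyword node named after that text.
kw<term> { @specialize[@name={term}]<Identifier, term> }

Operator { operatorSymbol | kw<"and"> | kw<"or"> }

@skip { space }

@tokens {
  space { @whitespace+ }
  letter { $[a-zA-Z_] }
  digit { $[0-9] }
  identifierChar { letter | digit }
  Identifier { letter identifierChar* }
  operatorSymbol { $[-.+*/=!?<>&|:%$~^]+ }
}
```

Note the angle brackets in kw<"and">: Lezer passes rule parameters with < >, not with parentheses as in kw("and").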

krux02 · October 23, 2023, 9:35am

Thank you for your support. For now I have found a solution for the keywords. But I can’t find what you say the Java grammar is doing anywhere in the Java grammar. What I see is this code: kw(value)="keyword" { value !identifierChar }, and that code does not work in my workspace. I even copied the Java grammar as is into my workspace to see whether I can use it without modification. It doesn’t work either. Are we perhaps talking about different versions of the Java grammar? Did you look at the link to the java.grammar that I posted in the original message?

marijn · October 23, 2023, 9:52am

Could you tell me what Java grammar you are looking at? I am looking at this one.

krux02 · October 23, 2023, 10:08am

I have posted a link to the file I looked at in the original question, but I can post it again.

https://github.com/codemirror/google-modes/blob/master/src/java.grammar

include "./javadoc.grammar" as doccomment

start top {
  (whitespace Statement)+
}

skip whitespace {
  context Statement {
    for "(" Statement Expr? ";" Expr? ")" Statement |
    while CondExpr Statement |
    try Block CatchFinally |
    do Statement while CondExpr ";" |
    Conditional |
    switch CondExpr Block |
    breakCont identifier ? ";" |
    assert Expr (":" Expr)? ";" |
    return Expr? ";" |
    throw Expr ";" |
    (default | case Expr | labelIdentifier) ":" |
    import kw("static")? identifierDot* ("*" | identifier) ";" |

(The file preview is truncated here.)

And yes, you are right, those are different grammars.

marijn · October 23, 2023, 12:59pm

Oh, duh. That is not a Lezer grammar. Don’t take any inspiration from it.