Need help with grammar

Hello,

I’m trying to learn Lezer by writing a custom language in which I match expressions inside parentheses, “(EXPRESSIONS)”, and, outside parentheses, only “#VARIABLE#”.

Below is my attempt, but for some reason I can’t make it ignore expressions outside the parentheses.

Any help would be greatly appreciated.

My test string:

("str1" & "str2" min(1.2, 1, 5)) #variable# another string outside 123456

Output:

[image: screenshot of the parser output]

Grammar file:

@top Program { (Identifier | Statement) }

Expression {
  Identifier
  | String
  | FunctionCall
  | Statement
}

Statement {
  OpeningBracket Expression+ ClosingBracket
}

FunctionCall {
  FunctionName OpeningBracket Expression* ClosingBracket
}

@tokens {
    Identifier { "#" $[a-zA-Z0-9\.]+ "#" }
    FunctionName { $[a-zA-Z]+ }
    String { '"' (!["\\] | "\\" _)* '"' }
    Space { @whitespace+ }
    OpeningBracket[closedBy=ClosingBracket] { "(" }
    ClosingBracket[openedBy=OpeningBracket] { ")" }
}

@skip { Space }

@precedence {
  FunctionName, Identifier
}

This states that a program is a single identifier or statement, which doesn’t seem to match your examples.

But even if you add a + after the closing parenthesis, the grammar says that the only thing allowed outside of bracketed statements is #...#-style identifiers, which your examples also contradict. The parser will not automatically ignore input that doesn’t match its grammar. It will try to parse as much as possible in a way that conforms to the grammar. So in this case it’ll probably parse words outside of brackets as FunctionName, assuming the brackets are missing from the input.

I updated my grammar to use + at the end of Program.

Yes, you are right. Outside the parentheses, words are matched as FunctionName.

Is there any way to highlight only the expressions inside () and the ## identifiers outside them, even if they appear multiple times, and ignore the rest?
("str1" & "str2" min(1.2, 1, 5)) #variable# another string outside 123456 #variable#
( MATCH ) #MATCH# IGNORE IGNORE #MATCH#

Yes, but you’ll have to set up your grammar to match that content somehow, probably using some specific neutral token type.
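
For illustration, here is a rough, untested sketch of what such a neutral token could look like. The names Other, Number, and Operator are made up for this example and are not in the original grammar. Number and Operator are only there on the assumption that the "&", "," and numeric literals in the test string should be accepted inside parentheses. The sketch relies on Lezer's contextual tokenization, so that Other is only tried at the top level, where FunctionName and String are not valid. If the generator still reports overlapping tokens, the token definitions or precedences will need adjusting.

```
@top Program { (Identifier | Statement | Other)* }

Expression {
  Identifier |
  String |
  Number |
  Operator |
  FunctionCall |
  Statement
}

Statement { OpeningBracket Expression+ ClosingBracket }

FunctionCall { FunctionName OpeningBracket Expression* ClosingBracket }

@skip { Space }

@tokens {
  Identifier { "#" $[a-zA-Z0-9\.]+ "#" }
  FunctionName { $[a-zA-Z]+ }
  String { '"' (!["\\] | "\\" _)* '"' }
  // Assumed additions so that `1.2`, `&` and `,` are accepted inside parentheses
  Number { @digit+ ("." @digit+)? }
  Operator { "&" | "," }
  Space { @whitespace+ }
  OpeningBracket[closedBy=ClosingBracket] { "(" }
  ClosingBracket[openedBy=OpeningBracket] { ")" }
  // Hypothetical neutral token: any run of characters that cannot start
  // an Identifier ("#") or a Statement ("(") and is not ASCII whitespace
  Other { ![#( \t\n\r]+ }
  // Let Space win for whitespace characters that Other would also match
  @precedence { Space, Other }
}
```

With something like this, each word outside the parentheses ("another", "string", "outside", "123456") should come out as an Other node. If you don't assign a highlighting tag to Other, that text should stay unstyled, which gives the "ignore" effect. A stray "#" that doesn't close an identifier is still not covered and would go through error recovery.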