Hi CodeMirror and Lezer users,
I’m trying to create a programming language that uses simple annotations to mark elements of cooking recipes. I’ve got most of it working, but I’m struggling to get mixed numbers parsed correctly. I’ve tried two separate approaches, and each runs into a different issue.
These are the different types of exact_values I’m trying to parse:
1 // Natural number
22 // (can be multiple digits)
3/4 // Fractions
8 / 15 // (can be multiple digits, can have whitespace)
2 3 / 4 // Mixed numbers (natural followed by a fraction)
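To make the target precise, here is a minimal Python sketch (not Lezer) of the three forms listed above. The names `NATURAL`, `FRACTION`, `MIXED` and `classify` are only for illustration, and I’m assuming that whitespace around the slash is optional while whitespace between the whole number and the fraction is required:

```python
import re

# Assumed shapes of the three value forms (illustrative, not part of the grammar).
NATURAL = r"[1-9][0-9]*"                         # no leading zero
FRACTION = rf"{NATURAL}[ \t]*/[ \t]*{NATURAL}"   # optional whitespace around "/"
MIXED = rf"{NATURAL}[ \t]+{FRACTION}"            # whitespace required before the fraction


def classify(text):
    """Return the name of the form that matches the whole text, or None."""
    # Try the longest form first, mirroring the intended precedence.
    for name, pattern in (
        ("Mixed", MIXED),
        ("Fraction", FRACTION),
        ("Natural_number", NATURAL),
    ):
        if re.fullmatch(pattern, text):
            return name
    return None


if __name__ == "__main__":
    for sample in ["1", "22", "3/4", "8 / 15", "2 3 / 4"]:
        print(repr(sample), classify(sample))
```

The order in the loop reflects what I want from the grammar as well: the longest form that matches should win.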
Approach 1: No structure within tokens
In my language’s current grammar I parse these as tokens. This assigns the correct top-level type to each number, but it loses the subtree structure of fractions and mixed numbers:
@precedence { mixed, fraction, natural }
Exact_value {
!mixed Mixed
| !fraction Fraction
| !natural Natural_number
}
@tokens {
// ...
Mixed { Natural_number Hwhitespace? Fraction }
Fraction { Natural_number "/" Natural_number }
Natural_number { $[1-9]$[0-9]* }
@precedence {Fraction, Mixed, Natural_number, Quantity_unit}
Hwhitespace { $[ \t]+ }
}
However, with this in place, I lose the ‘subtree’ structure of mixed numbers and fractions, e.g.
[3]
[1/2]
[2 1/2]
parses to
My first question therefore is: is it possible to retain a subtree of tokens?
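To illustrate the structure I’d like to keep, here is a small Python sketch (again not Lezer) that builds the nested shape I’m after as plain tuples. The function name `desired_tree` and the tuple layout are hypothetical, and the sketch only looks at the shape of the input without validating the digits:

```python
def desired_tree(text):
    """Build the nested structure I would like the parser to produce."""
    # Make sure "/" is its own part even when written without spaces.
    parts = text.replace("/", " / ").split()

    def natural(digits):
        return ("Natural_number", digits)

    if len(parts) == 1:
        return natural(parts[0])
    if len(parts) == 3 and parts[1] == "/":
        return ("Fraction", natural(parts[0]), natural(parts[2]))
    if len(parts) == 4 and parts[2] == "/":
        fraction = ("Fraction", natural(parts[1]), natural(parts[3]))
        return ("Mixed", natural(parts[0]), fraction)
    raise ValueError(f"not an exact value: {text!r}")


if __name__ == "__main__":
    for sample in ["3", "1/2", "2 1/2"]:
        print(sample, "->", desired_tree(sample))
```

With Approach 1, each value comes out as a single leaf, whereas I’d like `Mixed` to contain a `Natural_number` and a `Fraction`, and `Fraction` to contain two `Natural_number` nodes.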
Approach 2: Precedence (?) trouble
The other approach is to avoid ‘nested’ tokens for matching the exact_values, and instead define terms and rules that match the structure I’m trying to capture:
@top recipe { (Exact_value "\n")+ }
@precedence { mixed, fraction, natural }
Exact_value {
!mixed Mixed
| !fraction Fraction
| !natural Natural_number
}
Mixed { Natural_number Hwhitespace Fraction }
Fraction { Natural_number Hwhitespace? "/" Hwhitespace? Natural_number }
@tokens {
Natural_number { $[1-9]$[0-9]* }
Hwhitespace { $[ \t]+ }
"/"
}
In an isolated environment this works and correctly matches all the cases shown above. However, when incorporated into the other parts of my language, the Natural_number option is matched and the fraction is labelled as an error, even though the Mixed rule is given precedence. I have to admit that I’m quite new to Lezer, so I might’ve made some rudimentary error elsewhere.
My second question is: how do I give precedence to matching the ‘longer’ option Mixed over matching the Natural_number rule early?
For the second question, this is the grammar and the test recipe I used to debug this in the Lezer playground (https://lezer-playground.vercel.app/). Note: I’ve already removed the optional whitespace in the Fraction rule here.
Recipe:
# recipe
- [33] apples
- [1/2] apples
- [2 1/2] apples
Grammar:
@top recipe { block+ }
block { Paragraph | "\n" }
Paragraph {
(Inline newline_or_eof)+ newline_or_eof
}
Inline {
( Quantity
| Non_delimiter_text
)+
}
Quantity { "[" Exact_value? Hwhitespace? Quantity_unit? "]" }
@precedence { mixed, fraction, natural }
Exact_value {
!mixed Mixed
| !fraction Fraction
| !natural Natural_number
}
Mixed { Natural_number Hwhitespace Fraction }
Fraction { Natural_number "/" Natural_number }
@tokens {
Non_delimiter_text { ![\n\[\]\{\}\@\|<>]+ }
Quantity_unit { ![0-9\n\[\]\{\}\@\|<>/ \t]![0-9\n\[\]\{\}\@\|<>]* }
// Mixed { Natural_number Hwhitespace? Fraction }
// Fraction { Natural_number "/" Natural_number }
Natural_number { $[1-9]$[0-9]* }
// @precedence {Fraction, Mixed, Natural_number, Quantity_unit}
Hwhitespace { $[ \t]+ }
// Delimiting tokens to render in tree
"/"
newline_or_eof { "\n" | @eof}
}
I hope I have provided enough context and information for my questions, but I’ll gladly provide any missing information!
Kind regards,
Auke