python - How to parse ~{expr} inside string with lark ebnf

Question

I am trying to write a Lark grammar for a DSL, but I am having trouble with this string interpolation syntax:

" abc " <- normal string
" xyz~{expression}abc " <- string with interpolation

So ~{ switches from string to expression, and } terminates that expression. I think this is close:

string : "\"" (string_interp|not_string_interp)* "\""
string_interp: "~{" expression "}"
not_string_interp: /([^~][^{])+/

But the regex consumes characters two at a time, so it only matches an even number of characters, and if the ~{ straddles a pair boundary, it is missed.
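Both effects can be checked outside Lark with Python's re module. This is only a quick sketch; the name PAIRS is made up for the illustration, and the pattern is the one from not_string_interp above.

```python
import re

# The pattern from not_string_interp: it always consumes two characters at a time.
PAIRS = re.compile(r"([^~][^{])+")

# Two characters form one complete pair, so the whole input matches.
print(PAIRS.fullmatch("ab") is not None)

# Three characters leave the last one without a partner, so the full match fails.
print(PAIRS.fullmatch("abc") is not None)

# Here ~ is the second half of one pair and { is the first half of the next,
# so the interpolation marker is swallowed as ordinary text.
print(PAIRS.match("a~{1}").group(0))
```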

not_string_interp: /(.?|([^~][^{])+)/

This is about as far as I could get, but still seems wrong. Can I use lookaheads? I also want to keep %ignore WS on, as it keeps the noise down massively, so a solution accounting for that would be great!

Thanks

Test cases:

""
"a"
"~{1}"
" ~{1} "
"a bc~{1}c d"
"a b~{1}c d"

score 2 · Accepted Answer

I think this does it. Sadly, any ~ that is not followed by { splits the string into separate pieces, but I can reconstruct them later. I was getting fooled by the equal precedence of rules and the greediness of regexes.

/[^"~]+/ : anything that is not ~ or " (regular string text)

"~{" expression "}" : the interpolated expression

/~(?!{)/ : a ~ that is not followed by {. It uses the negative lookahead (?!...) because the next character must not be consumed (it could be " or another ~).

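from lark import Lark

# string_thing has three alternatives: plain text, a ~{...} interpolation,
# and a lone ~ that does not start an interpolation.
# expression is only a placeholder here: anything up to the closing }.
# ambiguity="explicit" makes Lark return every derivation (wrapped in
# _ambig nodes) instead of silently picking one.
parser = Lark(r"""
    string: "\"" string_thing* "\""
    string_thing: /[^"~]+/
        | "~{" expression "}"
        | /~(?!{)/
    expression: /[^}]+/
""", start='string', ambiguity="explicit")

# Other inputs worth trying: '"a"', '"~abc~"', '"~{1}~~{1}~~~{1}"'
print(parser.parse('"a~b{}c}d~{1}g"').pretty())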
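The "reconstruct them later" step can be sketched without Lark, using only the re module. This is an illustration under assumptions, not the Lark solution itself: the names PIECE and split_interpolated are hypothetical, the three branches are copied from string_thing, and the expression is assumed to contain no } as in the placeholder rule. Adjacent literal pieces are glued back together so that a lone ~ no longer splits the text.

```python
import re

# One branch per alternative of string_thing, in the same order as the grammar.
PIECE = re.compile(r'''
    (?P<text>[^"~]+)          # plain text: anything except " and ~
  | ~\{(?P<expr>[^}]+)\}      # interpolation: ~{ expression }
  | (?P<tilde>~(?!\{))        # lone ~ that does not open an interpolation
''', re.VERBOSE)


def split_interpolated(body):
    """Split the text between the quotes into ('text', s) and ('expr', s) pieces."""
    pieces = []
    pos = 0
    while pos < len(body):
        m = PIECE.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse at offset {pos}: {body[pos:]!r}")
        pos = m.end()
        if m.group("expr") is not None:
            pieces.append(("expr", m.group("expr")))
            continue
        literal = m.group("text") or m.group("tilde")
        # Merge with a preceding literal so a lone ~ does not split the text.
        if pieces and pieces[-1][0] == "text":
            pieces[-1] = ("text", pieces[-1][1] + literal)
        else:
            pieces.append(("text", literal))
    return pieces


print(split_interpolated("a~b{}c}d~{1}g"))
print(split_interpolated("~{1}~~{1}~~~{1}"))
```

The same merge could presumably be done on the Lark tree afterwards by joining neighbouring text children of string; the sketch only shows the idea on plain tuples.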

score 1 · Accepted Answer

Here is a solution to your problem using a positive lookbehind.

(?<=~{)[^}]+

The lookbehind requires the match to start right after ~{, and [^}]+ then matches everything up to (but not including) the closing brace }.
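As a rough check, the pattern can be run over the test cases from the question with Python's re module. This sketch only extracts the expression text; it is not a Lark grammar, the name EXPR is made up for the example, and the { is escaped here for readability. Note that the pattern does not verify that a closing } is actually present.

```python
import re

# Lookbehind pattern from the answer, with the brace escaped for clarity.
EXPR = re.compile(r"(?<=~\{)[^}]+")

# Test cases taken from the question.
cases = ['""', '"a"', '"~{1}"', '" ~{1} "', '"a bc~{1}c d"', '"a b~{1}c d"']

for case in cases:
    # findall returns the text of every interpolated expression in the string.
    print(case, EXPR.findall(case))
```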
