1

How would you implement a grammar that can import a file and still parse it using Lark?

E.g.:

@import file.txt
.....
4

3 Answers

1

I found a GitHub repository that looks relevant. Is this what you are looking for? https://github.com/lark-parser/lark

from lark import Lark
with open('file_to_read.txt', 'r') as file:
    data = file.read().replace('\n', '') #assumes you want to remove \n
l = Lark('''start: WORD "," WORD "!"
            %import common.WORD   // imports from terminal library
            %ignore " "           // Disregard spaces in text
         ''')

print( l.parse("Hello, World!") )
print( l.parse(data) )

If you want to open the file and use its contents as the Lark grammar:

from lark import Lark
with open('file_to_read.txt', 'r') as file:
    data = file.read()  # keep the newlines: Lark grammars separate rules by line
l = Lark(data)

print( l.parse("Hello, World!") )
print( l.parse("your string to parse") )
answered 2019-11-09T22:39:24.213
1

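The code at this link implements include/import in Lark. I did not write it; I am just passing it along.

It still needs some tweaking for error handling, but it is a good starting point.

Below is my slight modification of it, which actually reads from a file.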

import sys

import lark

class RecursiveLexerThread:
    def __init__(self, lexer: lark.lexer.Lexer, text: str):
        self.lexer = lexer
        self.state_stack = [lark.lexer.LexerState(text)]

    def lex(self, parser_state):
        while self.state_stack:
            lexer_state = self.state_stack[-1]
            lex = self.lexer.lex(lexer_state, parser_state)
            try:
                token = next(lex)
            except StopIteration:
                self.state_stack.pop()  # We are done with this file
            except lark.exceptions.UnexpectedCharacters as err:
                sys.exit(err)
            except lark.exceptions.UnexpectedToken as err:
                sys.exit(err)
            else:
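                if token.type == "_INCLUDE":
                    name = token.value.split()[-1]  # last field is the quoted file name
                    name = name[1:-1]  # strip the surrounding double quotes
                    # Push the included file; it is lexed before the current file resumes
                    with open(name) as included:
                        self.state_stack.append(lark.lexer.LexerState(included.read()))
                yield token  # The parser still expects this token either way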

grammar = r"""
start: ((statement _EOL)|(_INCLUDE))+

statement: NAME "=" value -> assignment

!?value: (value ("+"|"-"))? mul
!?mul: (mul ("*"|"/"))? atom

?atom: NAME -> variable
     | NUMBER -> number
     | "(" value ")"

_INCLUDE.1: "INCLUDE" /\s+/ STRING _EOL
     
_EOL : /\n+/

%import common.CNAME -> NAME
%import common.SIGNED_INT -> NUMBER
%import common.ESCAPED_STRING -> STRING
%ignore /[ \t]+/
"""

def main():

    parser = lark.Lark(grammar,
                       _plugins={
                           "LexerThread": RecursiveLexerThread
                       }, 
                       parser="lalr",
                       lexer_callbacks=None
                       )

    with open('top.expr') as f:
        top = f.read()

    try:
        tree = parser.parse(top)
    except lark.exceptions.UnexpectedCharacters as err:
        sys.exit(err)
    except lark.exceptions.UnexpectedToken as err:
        sys.exit(err)

    print(tree.pretty())

main()
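
To try the code above you need a `top.expr` and at least one file for it to include. The following is a minimal sketch that writes two sample inputs matching the grammar as posted. The helper name `write_sample_inputs` and the file name `defs.expr` are made up for illustration; only `top.expr` comes from the code above. Note that the lexer opens the included name relative to the current working directory, so run the parser from the directory that holds both files.

```python
from pathlib import Path


def write_sample_inputs(directory="."):
    """Write a hypothetical top.expr plus the file it includes; return the path to top.expr."""
    base = Path(directory)
    # Included file: one assignment, terminated by a newline (_EOL in the grammar).
    (base / "defs.expr").write_text("z = 3\n")
    # Top-level file: the INCLUDE line carries its own newline, as _INCLUDE requires.
    (base / "top.expr").write_text('x = 1\nINCLUDE "defs.expr"\ny = x * 2\n')
    return base / "top.expr"


if __name__ == "__main__":
    print(write_sample_inputs())
```

With these files the lexer should emit the tokens of `defs.expr` right after the `INCLUDE` line and before `y = x * 2`, assuming the plugin hook behaves as in the answer.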
answered 2022-02-09T03:48:03.037
0

I just discovered that I can use the C/C++ preprocessor to generate a file that I can then parse :)

It is not integrated, but it can be made to work.

cpp -P included.inc > output.file
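
The same preprocess-then-parse idea can be sketched in plain Python, which avoids the external tool and lets you keep the `@import file.txt` syntax from the question (`cpp` itself expects `#include`). This is only a sketch under assumptions: the function name `expand_imports` is made up, the directive is assumed to sit alone on its line, and paths are resolved relative to the importing file.

```python
import os
import re

# Assumed directive syntax, taken from the question: "@import <path>" alone on a line.
IMPORT_RE = re.compile(r"^\s*@import\s+(\S+)\s*$")


def expand_imports(path, _active=()):
    """Return the text of `path` with each @import line replaced by the imported file's text."""
    real = os.path.realpath(path)
    if real in _active:
        # A file that (directly or indirectly) imports itself would recurse forever.
        raise ValueError("circular @import: " + " -> ".join(_active + (real,)))
    pieces = []
    with open(real, encoding="utf-8") as handle:
        for line in handle:
            match = IMPORT_RE.match(line)
            if match is None:
                pieces.append(line)
                continue
            # Resolve the imported path relative to the file containing the directive.
            target = os.path.join(os.path.dirname(real), match.group(1))
            text = expand_imports(target, _active + (real,))
            if text and not text.endswith("\n"):
                text += "\n"  # keep the following line from being glued to the import
            pieces.append(text)
    return "".join(pieces)
```

The expanded string can then be handed to an existing Lark parser's `parse` method. The trade-off is the same as with `cpp`: line numbers in parse errors refer to the expanded text, not to the original files.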
answered 2019-11-10T18:32:56.373