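I am trying to parse some pseudocode I'm writing, and I'm having trouble working out a grammar for expressions that are supplied as strings.
I got it working with a regex-and-tokenizer approach, but there will be more of these strings in the future. Rather than passing everything through regexes or adding more code to the tokenizer, I would like to use Lark.
I am a beginner with Lark, so I'm struggling to parse the following strings.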
'filter:property-insensitive:name:%Name'
'filter:property-insensitive:gender:M:F:U'
'filter:age:on:${today}:Equal:26:years'
'filter:date:Equal:${today}'
'filter:property:Type:regular:temp'
'filter:property:Status:active:unknown'
Attempt 1:
from lark import Lark
parser = Lark(r"""
start: greet greet greet
greet: "filter"
| ":" -> seperator
| "property-insensitive" -> key_case
| ":" -> seperator
""", parser="lalr")
sample_conf = """filter:property-insensitive:Surname:%Name"""
print(parser.parse(sample_conf).pretty())
Attempt 2:
import lark
grammar = r'''start: instruction
?instruction: filters_function
FUNCNAME: (LETTER+) (LETTER+|DIGIT+|"_")* // no parentheses allowed in the func name
PROPERTY: (LETTER+) (LETTER+|DIGIT+|"-")*
FIELD: (LETTER+) (LETTER+|DIGIT+|"-"|"_")*
VALUE: (LETTER+) (LETTER+|DIGIT+|"-"|"%")*
ARGSEP: ":" // argument separator
WORDSEP: ":" // word separator
CONDSEP: ":" // condition separator
STAR: "*"
filters_function: FUNCNAME PROPERTY FIELD VALUE*
%import common.LETTER
%import common.WORD
%import common.DIGIT
%ignore ARGSEP
%ignore WORDSEP
'''
parser = lark.Lark(grammar, parser='earley') # lalr
print(parser.parse("filter:property-insensitive:Surname_name:Name"))
Attempt 3:
import lark
grammar = r'''start: instruction
?instruction: filters_function
property:"property-insensitive" -> property_insensitive
| "property" -> property
| "age" -> age_filter
| "date" -> date_filter
FUNCNAME: (LETTER+) (LETTER+|DIGIT+|"_")* // no parentheses allowed in the func name
FIELD: (LETTER+) (LETTER+|DIGIT+|"-"|"_")*
VALUE: (LETTER+) (LETTER+|DIGIT+|"-")*
ARGSEP: ":" // argument separator
WORDSEP: ":" // word separator
CONDSEP: ":" // condition separator
STAR: "*"
filters_function: FUNCNAME property FIELD VALUE *
%import common.LETTER
%import common.WORD
%import common.DIGIT
%ignore ARGSEP
%ignore WORDSEP
%ignore " "
%ignore "$"
%ignore "{"
%ignore "}"
%ignore "}"
%ignore "0".."9"
'''
parser = lark.Lark(grammar, parser='earley') # lalr
print(parser.parse("filter:property-insensitive:Surname:Name"))
print(parser.parse("filter:property-insensitive:Sex:M:F:U"))
print(parser.parse("filter:age:on:${today}:Equal:26:years"))
print(parser.parse("filter:date:Equal:${today}"))
print(parser.parse("filter:property:Registration_Type:regular:temp"))
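Attempt 3 - problems:
If the property is age | date, I can't express an OR over FIELD and VALUE so that all the remaining items are treated as VALUE.
When a property is matched, I want its string to appear in the tree output; currently the node's children are []: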
Tree(start, [Tree(filters_function, [Token(FUNCNAME, 'filter'), Tree(property_insensitive, []), Token(FIELD, 'Surname'), Token(VALUE, 'Name')])])
As you can see, I am now ignoring several actual values that are meaningful for my expression, and I need a way to get them into the tree.
Attempt 4:
import lark
grammar = r'''start: instruction
?instruction: filters_function
property: "property-insensitive" -> property_insensitive
| "property" -> property
property_date: "age" -> age_filter
| "date" -> date_filter
// To run over special characters
TEXT: (LETTER+) (LETTER+|DIGIT+|"-"|"_")*
FILTER: TEXT
SPECIAL_VALUE: "${" (LETTER+) (LETTER+|DIGIT+|"-"|"_") "}" *
WILD_CARD: "%" TEXT | TEXT | NUMBER
VALUE: WILD_CARD|SPECIAL_VALUE
ARGSEP: ":" // argument separator
STAR: "*"
filters_function: FILTER (property | property_date) VALUE *
// find the whitespace so we can ignore
WHITESPACE: (" " | "\n")+
%ignore WHITESPACE
%import common.LETTER
%import common.WORD
%import common.DIGIT
%import common.INT -> NUMBER
%ignore ARGSEP
%ignore " "
'''
parser = lark.Lark(grammar, parser='earley') # lalr
print(parser.parse("filter:property-insensitive:Surname:%Name"))
print(parser.parse("filter:property-insensitive:Sex:M:F:U"))
print(parser.parse("filter:age:on:${today}:Equal:26:years"))
print(parser.parse("filter:date:Equal:${today}"))
print(parser.parse("filter:property:Registration_Type:regular:temp"))
Output:
Tree(start, [Tree(filters_function, [Token(FILTER, 'filter'), Tree(property_insensitive, []), Token(VALUE, 'Surname'), Token(VALUE, '%Name')])])
Tree(start, [Tree(filters_function, [Token(FILTER, 'filter'), Tree(property_insensitive, []), Token(VALUE, 'Sex'), Token(VALUE, 'M'), Token(VALUE, 'F'), Token(VALUE, 'U')])])
Tree(start, [Tree(filters_function, [Token(FILTER, 'filter'), Tree(age_filter, []), Token(VALUE, 'on'), Token(VALUE, '${today}'), Token(VALUE, 'Equal'), Token(VALUE, '26'), Token(VALUE, 'years')])])
Tree(start, [Tree(filters_function, [Token(FILTER, 'filter'), Tree(date_filter, []), Token(VALUE, 'Equal'), Token(VALUE, '${today}')])])
Tree(start, [Tree(filters_function, [Token(FILTER, 'filter'), Tree(property, []), Token(VALUE, 'Registration_Type'), Token(VALUE, 'regular'), Token(VALUE, 'temp')])])
Problems:
- I can't find a way to get the matched value into the tree; the rule node comes back with no children:
Tree(date_filter, [])
It would also be great if someone could point me to a good Lark tutorial for beginners.