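I'm writing a string parser to do something similar to Patsy.
I've got the operators working (:, +, -, /, etc.), but I can't seem to get functions to work. I'm only copy-pasting the directly relevant functionality: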
from ply import yacc, lex

em_data = {'a': ['a1', 'a1', 'a2', 'a2', 'a1', 'a1', 'a2', 'a2'],
           'b': ['b1', 'b2', 'b1', 'b2', 'b1', 'b2', 'b1', 'b2'],
           'x1': [1.76405235, 0.40015721, 0.97873798, 2.2408932, 1.86755799,
                  -0.97727788, 0.95008842, -0.15135721],
           'x2': [-0.10321885, 0.4105985, 0.14404357, 1.45427351, 0.76103773,
                  0.12167502, 0.44386323, 0.33367433],
           'y': [1.49407907, -0.20515826, 0.3130677, -0.85409574, -2.55298982,
                 0.6536186, 0.8644362, -0.74216502],
           'z': [2.26975462, -1.45436567, 0.04575852, -0.18718385, 1.53277921,
                 1.46935877, 0.15494743, 0.37816252]}
########################################
# define all the tokens we will need
########################################
tokens = (
    # Atomics
    "NAME",  # Feature names
    "NUMBER",  # Numeric numbers
    # Binary Ops
    "RELATIONSHIP",  # y ~ x : end result is a tuple of (y, x)
    # Symbols
    "LPAREN",
    "RPAREN",
    # Functions
    "C"  # Expands vector elements into 1-hot
)
########################################
# Building the regexps
########################################
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

t_ignore = ' '

# Atomics
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = str(t.value)
    return t

t_LPAREN = r'\('
t_RPAREN = r'\)'
t_C = r'C'
########################################
# Build the lexer
########################################
lex.lex(debug=True)
And then my parser:
precedence = (
    ("left", "RELATIONSHIP"),
    ("left", "C")
)

def p_expression_group(p):
    "expression : LPAREN expression RPAREN"
    p[0] = p[2]

def p_expression_number(p):
    "expression : NUMBER"
    p[0] = p[1]

def p_expression_name(p):
    "expression : NAME"
    if p[1] not in em_data:
        raise RuntimeError(f"Term: {p[1]} not found in dataset lookup")
    p[0] = p[1]

def p_error(p):
    print(f"Syntax error at {p.value!r}")

def p_C(p):
    "statement : C LPAREN expression RPAREN"
    print("got in here!")

if __name__ == '__main__':
    s = "C(a)"
    yacc.yacc()
    yacc.parse(s)
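A few questions:

Does it matter whether I put `statement` or `expression` in the p_X regex (the rule docstring)? From what I've read, an expression is something that can be reduced to a single value, which IMO implies that a statement cannot be reduced to a single value. For example, `x = 5` cannot be reduced? In that case, is a list such as `["x", "x:z", ...]` a statement or an expression? My gut says it's an expression, but I want to be sure.

When I run all of the code above, I get `Syntax error at '('`. I'm not sure why that happens. I can't see any reason why it should.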
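
One way to investigate the syntax error might be to look at the token types the lexer produces for `C(a)`. PLY's documentation says that tokens defined as strings are tried in order of decreasing regular-expression length, which would put `t_NAME` ahead of `t_C`. The sketch below does not call PLY. It only imitates that ordering with the standard `re` module, so `patterns`, `tokenize`, and `reserved` are hypothetical names introduced here for illustration.

```python
import re

# Patterns copied from the lexer in the question.
patterns = {
    "NAME": r"[a-zA-Z_][a-zA-Z0-9_]*",
    "LPAREN": r"\(",
    "RPAREN": r"\)",
    "C": r"C",
}

# Assumption: imitate PLY's documented rule for string-defined tokens
# (longest regular expression first), so NAME is tried before C.
ordered = sorted(patterns.items(), key=lambda kv: len(kv[1]), reverse=True)
master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in ordered))


def tokenize(text, reserved=None):
    """Return (type, value) pairs; `reserved` maps keyword text to a token type."""
    reserved = reserved or {}
    out = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":  # same effect as t_ignore = ' '
            pos += 1
            continue
        m = master.match(text, pos)
        if m is None:
            raise ValueError(f"Illegal character {text[pos]!r}")
        kind, value = m.lastgroup, m.group()
        if kind == "NAME":
            # Keyword lookup: a name that is reserved gets its own token type.
            kind = reserved.get(value, "NAME")
        out.append((kind, value))
        pos = m.end()
    return out


naive_tokens = tokenize("C(a)")                       # no keyword lookup
fixed_tokens = tokenize("C(a)", reserved={"C": "C"})  # with keyword lookup
print(naive_tokens)
print(fixed_tokens)
```

If the real lexer behaves the same way, `C` reaches the parser as a `NAME`, which would be consistent with the error being reported at `(`. The keyword lookup in `tokenize` mirrors the reserved-word pattern described in the PLY documentation, where the `NAME` rule is written as a function that reassigns `t.type`. It may also be worth checking which rule PLY treats as the start symbol, since by default it is the first grammar rule defined, which here is an `expression` rule rather than `statement`.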