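After exploring this with a small PLY program, I think your problem comes from the difference between how raw and non-raw strings are handled in your data, not from PLY parsing or lexical matching itself. (As an aside, Python 2 and Python 3 differ subtly in this area of string handling. I have restricted my code to Python 2.)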
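The raw versus non-raw difference can be seen without PLY at all. The following is a minimal sketch that assumes the plain re module is a fair stand-in for the lexer rule; the variable names are illustrative only, and it is written to run on either Python version.

```python
import re

# Same pattern as the t_NORMSTRING rule in the example lexer.
PATTERN = re.compile(r'"([^"\n]|(\\"))*"$')

raw_data = r'"I do not know what \"A\" is"'    # raw: backslashes are kept
plain_data = '"I do not know what \"A\" is"'   # non-raw: \" collapses to "

# The non-raw literal has already lost its backslashes before any
# lexer or regular expression gets to see the text.
raw_has_backslash = '\\' in raw_data
plain_has_backslash = '\\' in plain_data

raw_match = PATTERN.match(raw_data)
plain_match = PATTERN.match(plain_data)

print(raw_has_backslash, plain_has_backslash)
print(raw_match is not None, plain_match is not None)
```

The non-raw literal is two characters shorter than the raw one, and its inner quotes are no longer escaped, so the string rule cannot match it from the first quote.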
You get the error if you use a non-raw string, or if you use input rather than raw_input. My example code and the results below show this:

Commands:
$ python --version
Python 2.7.5
$ python string.py
Lexer definition:
import sys
if ".." not in sys.path: sys.path.insert(0, "..")
import ply.lex as lex

tokens = (
    'NORMSTRING',
    'VAR'
)

# Neither rule returns t, so PLY discards the matched token;
# only the print in t_NORMSTRING shows that a string was matched.
def t_NORMSTRING(t):
    r'"([^"\n]|(\\"))*"$'
    print "String: '%s'" % t.value

def t_VAR(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'

t_ignore = ' \t\r\n'

def t_error(t):
    print "Illegal character '%s'" % t.value[0]
    t.lexer.skip(1)

lexer = lex.lex()

With a raw string:
data = r'"I do not know what \"A\" is"'
print "Data: '%s'" % data
lexer.input(data)
while True:
    tok = lexer.token()
    if not tok: break
    print tok
Output:
Data: '"I do not know what \"A\" is"'
String: '"I do not know what \"A\" is"'

With a non-raw string:
data = '"I do not know what \"A\" is"'
print "Data: '%s'" % data
lexer.input(data)
while True:
    tok = lexer.token()
    if not tok: break
    print tok
Output:
Data: '"I do not know what "A" is"'
Illegal character '"'
Illegal character '"'
String: '" is"'

With raw_input:
lexer.input(raw_input("Please type your line: "))
while True:
    tok = lexer.token()
    if not tok: break
    print tok
Output:
Please type your line: "I do not know what \"A\" is"
String: '"I do not know what \"A\" is"'

With input:
lexer.input(input("Please type your line: "))
while True:
    tok = lexer.token()
    if not tok: break
    print tok
Output:
Please type your line: "I do not know what \"A\" is"
Illegal character '"'
Illegal character '"'
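
The input result follows from the fact that, in Python 2, input evaluates the typed text as a Python expression, whereas raw_input returns it unchanged. The sketch below imitates that evaluation with ast.literal_eval, which is an assumption made here so the example is safe and runs on either Python version; it is not what the interpreter literally calls.

```python
import ast

# The characters exactly as typed at the prompt.
typed = r'"I do not know what \"A\" is"'

# raw_input() (Python 2) hands the typed characters back unchanged.
as_raw_input = typed

# input() (Python 2) evaluates the typed text as an expression.
# ast.literal_eval is a safe stand-in for that evaluation here.
as_input = ast.literal_eval(typed)

print(as_raw_input)
print(as_input)
```

After evaluation the outer quotes and the backslashes are gone, so the lexer sees only two stray quote characters around A, which matches the two "Illegal character" lines above.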

As a final point, you may not need the $ string anchor in your regular expression.
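
One caveat if the anchor is dropped: with the alternation as written, the plain-character class also accepts a backslash, so the match can end at the first escaped quote. The sketch below compares that with one possible alternative pattern that excludes the backslash from the plain-character class. The alternative is my assumption, not part of the original lexer, and it is checked here only with the re module, not inside PLY.

```python
import re

# The original rule with the trailing $ removed.
ORIGINAL_NO_ANCHOR = re.compile(r'"([^"\n]|(\\"))*"')

# Hypothetical alternative: plain characters exclude the backslash,
# and a backslash always consumes the character that follows it.
ALTERNATIVE = re.compile(r'"([^"\\\n]|\\.)*"')

# A string literal followed by more text on the same line.
text = r'"I do not know what \"A\" is" trailing'

original_hit = ORIGINAL_NO_ANCHOR.match(text).group(0)
alternative_hit = ALTERNATIVE.match(text).group(0)

print(original_hit)
print(alternative_hit)
```

The unanchored original stops right after the first backslash-quote pair, while the alternative consumes the whole quoted string and leaves the trailing text for later rules.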