I am trying to detect valid Java annotations in a text. Here is my test program (for simplicity I am ignoring all whitespace for now; I will add that later):
txts = [
    '@SomeName2',  # match
    '@SomeName2(',  # no match
    '@SomeName2)',  # no match
    '@SomeName2()',  # match
    '@SomeName2()()',  # no match
    '@SomeName2(value)',  # no match
    '@SomeName2(=)',  # no match
    '@SomeName2("")',  # match
    '@SomeName2(".")',  # no match
    '@SomeName2(",")',  # match
    '@SomeName2(value=)',  # no match
    '@SomeName2(value=")',  # no match
    '@SomeName2(=3)',  # no match
    '@SomeName2(="")',  # no match
    '@SomeName2(value=3)',  # match
    '@SomeName2(value=3L)',  # match
    '@SomeName2(value="")',  # match
    '@SomeName2(value=true)',  # match
    '@SomeName2(value=false)',  # match
    '@SomeName2(value=".")',  # no match
    '@SomeName2(value=",")',  # match
    '@SomeName2(x="o_nbr ASC, a")',  # match
    # multiple params:
    '@SomeName2(,value="ord_nbr ASC, name")',  # no match
    '@SomeName2(value="ord_nbr ASC, name",)',  # no match
    '@SomeName2(value="ord_nbr ASC, name"insertable=false)',  # no match
    '@SomeName2(value="ord_nbr ASC, name",insertable=false)',  # match
    '@SomeName2(value="ord_nbr ASC, name",insertable=false,length=10L)',  # match
    '@SomeName2 ( "ord_nbr ASC, name", insertable = false, length = 10L )',  # match
]
#regex = '((?:@[a-z][a-z0-9_]*))(\((((?:[a-z][a-z0-9_]*))(=)(\d+l?|"(?:[a-z0-9_, ]*)"|true|false))?\))?$'
#regex = '((?:@[a-z][a-z0-9_]*))(\((((?:[a-z][a-z0-9_]*))(=)(\d+l?|"(?:[a-z0-9_, ]*)"|true|false))?(,((?:[a-z][a-z0-9_]*))(=)(\d+l?|"(?:[a-z0-9_, ]*)"|true|false))*\))?$'
regex = r"""
    (?:@[a-z]\w*)                   # @ + identifier (class name)
    (
        \(                          # opening parenthesis
        (
            (?:[a-z]\w*)            # identifier (var name)
            =                       # assignment operator
            (\d+l?|"(?:[a-z0-9_, ]*)"|true|false)  # a number | a quoted string of alphanumerics, _, comma, space | true | false
        )?                          # optional assignment group
        \)                          # closing parenthesis
    )?$                             # optional parentheses group (zero or one)
"""
import re

rg = re.compile(regex, re.VERBOSE | re.IGNORECASE)
for txt in txts:
    m = rg.search(txt)
    #m = rg.match(txt)
    if m:
        # show the first two capture groups of the match
        output = ''
        for i in range(2):
            output = output + '[' + str(m.group(i+1)) + ']'
        print("MATCH: " + output)
    else:
        print("NO MATCH: " + txt)
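So basically what I have seems to work for zero or one parameter. Now I am trying to extend the syntax to zero or more parameters, as in the last examples.
I then copied the part of the regex that represents the assignment and prefixed it with a comma for the 2nd through nth assignments (that group now uses * instead of ?):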
regex = '((?:@[a-z][a-z0-9_]*))(\((((?:[a-z][a-z0-9_]*))(=)(\d+l?|"(?:[a-z0-9_, ]*)"|true|false))?(,((?:[a-z][a-z0-9_]*))(=)(\d+l?|"(?:[a-z0-9_, ]*)"|true|false))*\))?$'
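However, that does not work. The problem seems to be how to handle the first element: because it has to be optional, a string like the first of the extended examples, '@SomeName2(,value="ord_nbr ASC, name")',
is accepted, which is wrong. I do not know how to make the 2nd through nth assignments depend on the presence of the first (optional) element.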
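To make the failure concrete, here is a minimal reproduction, assuming Python 3 and the extended pattern copied unchanged (only the `r` prefix is added). The names `rg` and `bad` are just for this illustration. The first assignment group is skipped, the comma-prefixed group then matches on its own, and the string is accepted:

```python
import re

# The extended pattern from above, copied as-is (written as a raw string).
regex = r'((?:@[a-z][a-z0-9_]*))(\((((?:[a-z][a-z0-9_]*))(=)(\d+l?|"(?:[a-z0-9_, ]*)"|true|false))?(,((?:[a-z][a-z0-9_]*))(=)(\d+l?|"(?:[a-z0-9_, ]*)"|true|false))*\))?$'
rg = re.compile(regex, re.IGNORECASE)

# Leading comma with no first assignment: should be rejected, but is not.
bad = '@SomeName2(,value="ord_nbr ASC, name")'
print(bool(rg.search(bad)))  # prints True
```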
Can it be done? Is this the right way to do it? How would you best solve this?
Thanks
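One direction that might work, sketched here as an assumption rather than a confirmed solution: nest the comma-prefixed repetition inside the optional first assignment, so that a comma can only appear after an assignment has already matched. The names `ASSIGN` and `pattern` are hypothetical, Python 3 is assumed, and the sketch still ignores whitespace and the bare-value form such as `@SomeName2("")`:

```python
import re

# One "name=value" assignment, using the same value alternatives as above.
ASSIGN = r'[a-z]\w*=(?:\d+l?|"[a-z0-9_, ]*"|true|false)'

# Shape: @name ( "(" ( ASSIGN ( "," ASSIGN )* )? ")" )? end-of-string
# The ",ASSIGN" repetition lives inside the optional first-assignment group,
# so it cannot match unless a first assignment is present.
pattern = re.compile(
    r'@[a-z]\w*(?:\((?:' + ASSIGN + r'(?:,' + ASSIGN + r')*)?\))?$',
    re.IGNORECASE,
)

for txt in ['@SomeName2(,value="ord_nbr ASC, name")',
            '@SomeName2(value="ord_nbr ASC, name",insertable=false,length=10L)']:
    print(txt, bool(pattern.match(txt)))
```

With this shape, the leading-comma and trailing-comma strings from the test list are rejected, while the comma-separated lists of assignments are accepted.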