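Could you explain, please, how I can write a regex that will match (arg1), (arg1, arg2), (arg1, arg2, xarg, zarg), etc.? Every name is an ASCII string that always starts with a character from [A-Za-z]. Here is what I've tried: "("[A-Za-z][a-z0-9]*(,)?([A-Za-z][a-z0-9]*)?")". Thanks!
Note: the regex must work in flex.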
Something like this?
>>> import re
>>> s = '''Could you explain, please, how can I make regex that will match (arg1), (arg1, arg2), (arg1, arg2, xarg, zarg), etc. Every name is an ASCII string which always starts with symbol [A-Za-z]. Here is what I've tried: "("[A-Za-z][a-z0-9]*(,)?([A-Za-z][a-z0-9]*)?")". Thanks!'''
>>> re.findall(r'\([A-Za-z]?arg[0-9]?(?:, [A-Za-z]?arg[0-9]?)*\)', s)
['(arg1)', '(arg1, arg2)', '(arg1, arg2, xarg, zarg)']
I'm not sure flex is the right tool, since you would normally use it to split input like this into separate tokens. However, it's certainly possible:
"("[[:alpha:]][[:alnum:]]*(,[[:alpha:]][[:alnum:]]*)*")"
That will match (arg1) and (arg1,arg2), but not ( arg1 ) or (arg1, arg2). If you want to ignore whitespace everywhere, it gets a bit more verbose.
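If you want to experiment with that pattern outside flex, here is a rough Python equivalent. Python's re module does not support POSIX bracket classes such as [[:alpha:]], so this sketch spells them out as ASCII ranges, which assumes the C locale. The names ID, STRICT_LIST and is_strict_list are made up for illustration.

```python
import re

# ASCII spelling of [[:alpha:]][[:alnum:]]* (assumes the C locale).
ID = r"[A-Za-z][A-Za-z0-9]*"

# "(" ID ("," ID)* ")" with no whitespace allowed anywhere.
STRICT_LIST = re.compile(r"\(" + ID + r"(?:," + ID + r")*\)")


def is_strict_list(text):
    """Return True if the whole string is a parenthesized id list."""
    return STRICT_LIST.fullmatch(text) is not None


for sample in ["(arg1)", "(arg1,arg2)", "( arg1 )", "(arg1, arg2)", "()"]:
    print(sample, is_strict_list(sample))
```

Only the first two samples are accepted; the ones containing spaces and the empty list are rejected, as described above.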
This sort of thing is more readable if you use lex definitions:
ID [[:alpha:]][[:alnum:]]*
%%
"("{ID}(","{ID})*")"
Or, with whitespace matching:
/* Make sure you're in the C locale when you compile. Or adjust
* the definition accordingly. Perhaps you wanted to allow other
* characters in IDs.
*/
ID [[:alpha:]][[:alnum:]]*
/* OWS = Optional White Space.*/
/* Flex defines blank as "space or tab" */
OWS [[:blank:]]*
COMMA {OWS}","{OWS}
OPEN "("{OWS}
CLOSE {OWS}")"
%%
{OPEN}{ID}({COMMA}{ID})*{CLOSE} { /* Got a parenthesized list of ids */ }
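A final note: this does not match () either; there must be at least one id. If you want to include that as well, you can make the part between the parentheses optional: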
{OPEN}({ID}({COMMA}{ID})*)?{CLOSE} { /* Got a parenthesized */
/* possibly empty list of ids */ }
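To check the whitespace-tolerant patterns quickly without building a scanner, the same definitions can be mirrored in Python. This is only a sketch under a few assumptions: [[:blank:]] is written as space or tab, the ID class is ASCII-only, and list_ids is a hypothetical helper. It also uses fullmatch to test whole strings, whereas flex would match the pattern wherever it occurs in the input stream.

```python
import re

ID = r"[A-Za-z][A-Za-z0-9]*"   # [[:alpha:]][[:alnum:]]* in ASCII
OWS = r"[ \t]*"                # [[:blank:]]*: optional spaces or tabs
COMMA = OWS + "," + OWS
OPEN = r"\(" + OWS
CLOSE = OWS + r"\)"

# At least one id between the parentheses.
NONEMPTY_LIST = re.compile(OPEN + ID + "(?:" + COMMA + ID + ")*" + CLOSE)

# The part between the parentheses is optional, so "()" is accepted too.
MAYBE_EMPTY_LIST = re.compile(
    OPEN + "(?:" + ID + "(?:" + COMMA + ID + ")*)?" + CLOSE
)


def list_ids(text):
    """Return the ids in a parenthesized list, or None if it is malformed."""
    if MAYBE_EMPTY_LIST.fullmatch(text) is None:
        return None
    return re.findall(ID, text)


for sample in ["( arg1 )", "(arg1, arg2, xarg, zarg)", "()", "(arg1,)"]:
    print(repr(sample), list_ids(sample))
```

Building the pattern from named pieces mirrors the lex definitions, which keeps each piece short enough to check by eye.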