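I am trying to compile a regular expression that collects a sequence of hashtags (r'#\w+') from a tweet. I want to compile two regexes, one that does this from the start of the tweet and one that does it from the end. I am using Python 2.7.2, and my code looks like this.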

HASHTAG_SEQ_REGEX_PATTERN           = r"""
(                                       #Outermost grouping to match overall regex
#\w+                                    #The hashtag matching. It's a valid combination of \w+
([:\s,]*#\w+)*                          #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                                       #Closing parenthesis of outermost grouping to match overall regex
"""

LEFT_HASHTAG_REGEX_SEQ      = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN , re.VERBOSE | re.IGNORECASE)

When the line that compiles the regex is executed, I get the following error:

sre_constants.error: unbalanced parenthesis

I don't understand why this happens, because I can't see any unbalanced parentheses in my regex pattern.

4 Answers

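With re.VERBOSE, everything on this line after the FIRST # is treated as a comment, so the closing parenthesis is never seen: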

        v----comment starts here
([:\s,]*#\w+)*  ...

Escape it:

([:\s,]*\#\w+)*  

The same goes for this line, although here it doesn't cause an unbalanced parenthesis :)

v----escape me
#\w+                                    #The hashtag matching ... 

 

HASHTAG_SEQ_REGEX_PATTERN           = r"""
(                 # Outermost grouping to match overall regex
\#\w+             # The hashtag matching. It's a valid combination of \w+
([:\s,]*\#\w+)*   # This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                 # Closing parenthesis of outermost grouping to match overall regex
"""
answered 2013-03-07T22:17:26.310

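You have some unescaped hashes that you mean to use as literal characters, but VERBOSE treats them as comment markers and trips you up: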

\#\w+
([:\s,]*\#\w+)*   #reported issue caused by this hash
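To see why the error mentions parentheses, it may help to look at what is left once verbose mode strips the comments: roughly `(([:\s,]*)`, with two opening parentheses and one closing. The sketch below (the name `BROKEN_PATTERN` is made up here) reproduces the failure. The exact error message differs between Python versions, so it only checks that `re.error` is raised.

```python
import re

# The question's pattern with the unescaped hashes left in place.
BROKEN_PATTERN = r"""
(
#\w+
([:\s,]*#\w+)*
)
"""

try:
    re.compile(BROKEN_PATTERN, re.VERBOSE)
    compiled_ok = True
except re.error as exc:
    # Everything from each unescaped # to the end of its line was dropped,
    # taking the ")*" on the third pattern line with it.
    compiled_ok = False
    print("compile failed:", exc)
```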
answered 2013-03-07T22:19:26.520

If you write the pattern as follows (and compile it without re.VERBOSE), you won't run into this problem:

HASHTAG_SEQ_REGEX_PATTERN = (
    r'('               # Outermost grouping to match overall regex
    r'#\w+'            # The hashtag matching. It's a valid combination of \w+
    r'([:\s,]*#\w+)*'  # This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
    r')'               # Closing parenthesis of outermost grouping to match overall regex
)

Personally, I never use re.VERBOSE; I can never remember its rules about whitespace and the rest.
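For reference, the whitespace rule is short: in verbose mode, whitespace in the pattern is ignored unless it is escaped with a backslash or sits inside a character class, and the same two exceptions apply to #. A small sketch, assuming Python 3.4+ for `fullmatch` (the variable names are illustrative):

```python
import re

plain = re.compile(r"a b", re.VERBOSE)       # the space is dropped: effectively "ab"
escaped = re.compile(r"a\ b", re.VERBOSE)    # a backslash keeps the space
in_class = re.compile(r"a[ ]b", re.VERBOSE)  # so does a character class

checks = {
    "plain vs 'ab'": bool(plain.fullmatch("ab")),
    "plain vs 'a b'": bool(plain.fullmatch("a b")),
    "escaped vs 'a b'": bool(escaped.fullmatch("a b")),
    "in_class vs 'a b'": bool(in_class.fullmatch("a b")),
}
for label, result in checks.items():
    print(label, "->", result)
```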

answered 2013-03-07T22:29:54.667

Alternatively, use [#] to put a # into the regex that is not meant to start a comment:

HASHTAG_SEQ_REGEX_PATTERN           = r"""
(                   #Outermost grouping to match overall regex
[#]\w+                #The hashtag matching. It's a valid combination of \w+
([:\s,]*[#]\w+)*      #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                   #Closing parenthesis of outermost grouping to match overall regex
"""

I find this more readable.
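The question also asks for a regex that works from the end of the tweet. One possible sketch, assuming Python 3: append `$` to the same pattern and use `search` instead of `match`. The name `RIGHT_HASHTAG_REGEX_SEQ` and the sample tweet are made up for illustration and do not come from the original post.

```python
import re

HASHTAG_SEQ_REGEX_PATTERN = r"""
(                     # Outermost grouping to match overall regex
[#]\w+                # A single hashtag; [#] is a literal # even in verbose mode
([:\s,]*[#]\w+)*      # Zero or more further hashtags separated by [\s,:]*
)                     # Closing parenthesis of the outermost grouping
"""

# Hypothetical counterpart to LEFT_HASHTAG_REGEX_SEQ, anchored at the end.
RIGHT_HASHTAG_REGEX_SEQ = re.compile(HASHTAG_SEQ_REGEX_PATTERN + '$',
                                     re.VERBOSE | re.IGNORECASE)

# Hypothetical sample tweet, for illustration only
tweet = "Verbose mode bit me again #python, #regex"

match = RIGHT_HASHTAG_REGEX_SEQ.search(tweet)
trailing_tags = match.group(1) if match else ""
print(trailing_tags)
```

This version requires the tweet to end directly on a hashtag. If trailing whitespace or punctuation should be tolerated, the anchor would need to allow for it.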

answered 2013-03-07T22:25:17.267