4

I need help with the following regular expression. I have this text:

"[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."

Using a regex, I would like to get:

[Hello|Hi]
[inviting | calling]
[[junior| mid junior]|senior]

The following regex (\[[^\[$\]\]]*\])

gives me [Hello|Hi] [inviting | calling] [junior| mid junior]

So how should I fix it to get the correct output?

3 Answers

3

Let's define your string and import re:

>>> s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
>>> import re

Now, try:

>>> re.findall(r'\[ (?:[^][]* \[ [^][]* \])* [^][]*  \]', s, re.X)
['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']

In more detail

Consider this script:

$ cat script.py
import re
s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."

matches = re.findall(r'''\[       # Opening bracket
        (?:[^][]* \[ [^][]* \])*  # Zero or more non-bracket characters followed by a [, followed by zero or more non-bracket characters, followed by a ]
        [^][]*                    # Zero or more non-bracket characters
        \]                        # Closing bracket
        ''',
        s,
        re.X)
print('\n'.join(matches))

This produces the output:

$ python script.py
[Hello|Hi]
[inviting | calling]
[[junior| mid junior]|senior]
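
Note that the pattern above allows only one level of nesting inside the outer brackets, which is enough for the text in the question. The following is a minimal sketch for checking that behavior; the helper name `find_groups` and the deeper test string `[a[b[c]]]` are illustrative assumptions, not part of the original question.

```python
import re

# Same pattern as above, written without re.X: an outer [...] that may
# contain bracket-free [...] groups, i.e. at most one level of nesting.
ONE_LEVEL = re.compile(r'\[(?:[^][]*\[[^][]*\])*[^][]*\]')

def find_groups(text):
    """Hypothetical helper: return bracket groups with at most one nested level."""
    return ONE_LEVEL.findall(text)

s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
print(find_groups(s))
# ['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']

# With three levels, the outermost group is not matched; only the inner
# two-level group is found.
print(find_groups("[a[b[c]]]"))
# ['[b[c]]']
```

If the input can nest more deeply than that, a counting loop or a regex engine with recursion support is likely a better fit than Python's built-in re.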
answered 2016-10-28T06:47:11.433
2

You can use a simple stack-based approach instead of a recursive regex:

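x = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer.[sd[sd[sd][sd]]]"
groups = []   # completed top-level [...] groups
stack = []    # one entry per currently open '['
start = None  # index where the current top-level group began
for i, ch in enumerate(x):
    if ch == '[':
        if not stack:          # opening a new top-level group
            start = i
        stack.append(ch)
    elif ch == ']':
        stack.pop()            # assumes brackets are balanced
        if not stack:          # the top-level group just closed
            groups.append(x[start:i+1])
print(groups)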

Output: ['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]', '[sd[sd[sd][sd]]]']
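
Since the stack only ever holds opening brackets, the same idea can be sketched with a plain depth counter wrapped in a function. The name `top_level_groups` is hypothetical, and silently skipping an unmatched closing bracket is an assumption made here so that unbalanced input does not raise an error.

```python
def top_level_groups(text):
    """Hypothetical helper: return each outermost [...] group found in text."""
    groups = []
    depth = 0      # number of currently open brackets
    start = None   # index of the '[' that opened the current top-level group
    for i, ch in enumerate(text):
        if ch == '[':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ']' and depth > 0:   # assumption: a stray ']' is ignored
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups

x = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer.[sd[sd[sd][sd]]]"
print(top_level_groups(x))
# ['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]', '[sd[sd[sd][sd]]]']
```

Unlike a fixed-depth regex, this works for any nesting depth, and a group that is opened but never closed is simply not reported.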

answered 2016-10-28T07:09:11.803
1

You can use the following code with the PyPI regex module and a PCRE-like regex, r'\[(?:[^][]++|(?R))*]':

>>> import regex
>>> s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
>>> r = regex.compile(r'\[(?:[^][]++|(?R))*]')
>>> print(r.findall(s))
['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']
>>> 

See the regex demo.

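\[(?:[^][]++|(?R))*] matches a [, then zero or more occurrences of either a sequence of 1+ characters other than ] and [, or a whole bracketed expression [...] (the (?R) construct recurses into the entire pattern), and then a closing ].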

answered 2016-10-28T06:57:04.957