python - Regex for nested parentheses in Python

Question

I have something like this:

Othername California (2000) (T) (S) (ok) {state (#2.1)}

Is there a regular expression that would get me:

Othername California ok 2.1

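That is, I want to keep the number inside the parentheses that are themselves inside {}, and keep the text "ok" inside (). I specifically need the string "ok" printed if it is present in my line, but I want to drop the other parenthesized text, e.g. (V), (S), or (2002).

I know regex may not be the most efficient way to handle this kind of problem.

Any help would be appreciated.

EDIT:

The string can vary, because information that is not available is simply left out of the line. The text itself is also variable (for example, not every line has "state"). So there can be, for example: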

Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}

score 8 · Accepted Answer

Regex

(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}

Text used for testing

Name1 Name2 Name3 (2000) {education (#3.2)}
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
Othername California (2000) (T) (S) (ok) {state (#2.1)}

Test

>>> import re
>>> regex = re.compile(r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x54e2105f36c16a48>
>>> regex.match(string)
<_sre.SRE_Match object at 0x54e2105f36c169e8>

# Run findall
>>> regex.findall(string)
[
   (u'Name1 Name2 Name3' , u'' , u'3.2'),
   (u'Name1 Name2 Name3' , u'ok', u'1.1'),
   (u'Name1 Name2' , u'' , u'1.1'),
   (u'Name1 Name2 Name3' , u'' , u'4.12'),
   (u'Othername California', u'ok', u'2.1')
]
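
To get from those tuples to the single-line output asked for in the question, the captured groups can be joined while skipping the empty one. The following is a minimal sketch, assuming Python 3 and one record per line; `summarize` is a hypothetical helper name, not part of the answer.

```python
import re

# Pattern from the answer above, written as a raw string.
PATTERN = re.compile(
    r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}"
)


def summarize(line):
    """Hypothetical helper: join the captured parts, skipping a missing flag."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Group 2 is None when no two-or-more-character "(...)" group
    # sits directly before the "{".
    return " ".join(part for part in m.groups() if part)


lines = [
    "Othername California (2000) (T) (S) (ok) {state (#2.1)}",
    "Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}",
    "Name1 Name2 (2002) {edu (#1.1)}",
    "Name1 Name2 Name3 (2000) (V) {variation (#4.12)}",
]
for line in lines:
    print(summarize(line))
```

Note that the second group accepts any parenthesized text of two or more characters directly before the `{`, not only the literal "ok", so single-letter flags such as (V) are dropped but a longer flag would be kept.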

score 2 · Accepted Answer

Try this:

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'

regex = r'''
    ([^(]*)             # match anything but a (
    \                   # a space
    (?:                 # non capturing parentheses
        \([^(]*\)       # parentheses
        \               # a space
    ){3}                # three times
    \(([^(]*)\)         # capture fourth parentheses contents
    \                   # a space
    {                   # opening {
        [^}]*           # anything but }
        \(\#            # opening ( followed by #
            ([^)]*)     # match anything but )
        \)              # closing )
    }                   # closing }
'''

match = re.match(regex, thestr, re.X)

print(match.groups())

Output:

('Othername California', 'ok', '2.1')

Here is the compressed version:

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
regex = r'([^(]*) (?:\([^(]*\) ){3}\(([^(]*)\) {[^}]*\(\#([^)]*)\)}'
match = re.match(regex, thestr)

print(match.groups())
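
The pattern above expects exactly three parenthesized groups before the captured one (`{3}`), while the edit in the question says some groups may be missing. One possible variation, offered only as a sketch, skips any number of groups lazily and treats the "(ok)" group as optional. The names `FLEXIBLE` and `parse_line` are hypothetical, and matching the literal "ok" is an assumption taken from the question.

```python
import re

FLEXIBLE = re.compile(r'''
    ([^(]*?) \s*                    # name, without the trailing space
    (?: \( [^()]* \) \s* )*?        # skip any number of (...) groups, lazily
    (?: \( (ok) \) \s* )?           # capture "ok" if such a group comes next
    \{ [^}]* \( \# ([^)]*) \) \}    # number inside the braces, after "(#"
''', re.X)


def parse_line(line):
    """Hypothetical helper: return (name, 'ok' or None, number), or None."""
    m = FLEXIBLE.match(line)
    return m.groups() if m else None


print(parse_line('Othername California (2000) (T) (S) (ok) {state (#2.1)}'))
print(parse_line('Name1 Name2 (2002) {edu (#1.1)}'))
print(parse_line('Name1 Name2 Name3 (2000) (V) {variation (#4.12)}'))
```

Because the skipping group is lazy, the optional "(ok)" group gets the first chance to match at each position, so "(ok)" is captured rather than skipped whenever it directly precedes the braces.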

score 1 · Accepted Answer

Despite what I said in the comments, I found a solution:

(?(?=\([^()\w]*[\w.]+[^()\w]*\))\([^()\w]*([\w.]+)[^()\w]*\)|.)(?=[^{]*\})|(?<!\()(\b\w+\b)(?!\()|ok

Explanation:

(?                                  # If
(?=\([^()\w]*[\w.]+[^()\w]*\))      # There is (anything except [()\w] zero or more times, followed by [\w.] one or more times, followed by anything except [()\w] zero or more times)
\([^()\w]*([\w.]+)[^()\w]*\)        # Then match it, and put [\w.] in a group
|                                   # else
.                                   # advance with one character
)                                   # End if
(?=[^{]*\})                         # Look ahead if there is anything except { zero or more times followed by }

|                                   # Or
(?<!\()(\b\w+\b)(?!\()              # Match a word not enclosed between parentheses
|                                   # Or
ok                                  # Match ok

Online demo
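
Python's built-in `re` module documents the conditional form only as `(?(id/name)yes|no)`, with a group number or name as the condition, so the lookahead condition in the pattern above would likely need adapting before it runs there. The following is a rough adaptation using plain alternation, not the author's pattern. It assumes "ok" always appears in parentheses, and it excludes words inside the braces with a negative lookahead instead of consuming them one character at a time. `TOKEN` and `extract` are hypothetical names.

```python
import re

TOKEN = re.compile(r'''
    \( [^()\w]* ([\w.]+) [^()\w]* \)    # "(...)" holding one word or number
    (?=[^{]*\})                         # only if a "}" follows before any "{"
  | (?<!\()\b(\w+)\b(?!\()              # a word not directly after "("
    (?![^{]*\})                         # and not located inside the braces
  | \( (ok) \)                          # the literal "(ok)" group
''', re.X)


def extract(line):
    """Hypothetical helper: collect the kept tokens in order of appearance."""
    # Exactly one capturing group participates in each match,
    # so lastindex identifies it.
    return " ".join(m.group(m.lastindex) for m in TOKEN.finditer(line))


print(extract('Othername California (2000) (T) (S) (ok) {state (#2.1)}'))
print(extract('Name1 Name2 (2002) {edu (#1.1)}'))
```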

score 0 · Accepted Answer

Another option is:

^(\w+\s?\w+)\s?\(\d{1,}\)\s?\(\w+\)\s?\(\w+\)\s?\((\w+)\)\s?.*#(\d+\.\d+)

Answered 2013-06-18T09:27:03.720
