python - Python regular expression that returns an array

Question

Sample string:

ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]

What is a suitable regular expression to first detect ATT in the string and then split out the three codes that follow it,

ATT:A01AD05 B01AC06 N02BA01

and return them as an array/list? I would then also like to extract just the ABCDX PPP part.

score 1 · Accepted Answer

For the first part:

import re

myString = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'
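# Non-greedy, so the match stops at the first ']' even if more bracketed groups follow
pattern = r'ATT:.+?\]'

match = re.search(pattern, myString)
# Drop the trailing ']' and split on spaces
matchList = match.group().rstrip(']').split(' ')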
print(matchList)

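For the second part, everything is mostly the same, except that the expression to use is r'\w+\s\w+\s\[' (the bracket must be escaped, since an unescaped [ opens a character class and makes the pattern invalid) and you will need to change rstrip(']') to rstrip(' ['). Drop the split(' ') if you want 'ABCDX PPP' as a single string.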
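
A minimal sketch of that second part, assuming the prefix is always two whitespace-separated words followed by a space and an opening bracket, as in the sample. The variable names are illustrative and not from the question:

```python
import re

my_string = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'
# The opening bracket is escaped so that it is matched literally.
prefix_pattern = r'\w+\s\w+\s\['

prefix_match = re.search(prefix_pattern, my_string)
# re.search returns None when nothing matches, so guard before using the result.
# rstrip(' [') removes any trailing spaces and '[' characters.
prefix = prefix_match.group().rstrip(' [') if prefix_match else None
print(prefix)
```

For the sample string this prints ABCDX PPP.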

Hope this helps

score 0 · Accepted Answer

Maybe try this?

import re

text = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'
# The last group excludes ']' so the closing bracket is not captured
matched = re.search(r'([\S\s]+?)\s\[.*?(ATT:\S+)\s*(\S+)\s*([^\s\]]+)', text)
if matched:
    tokens = matched.groups()
    print(tokens)

Edit: based on the new constraints:

import re

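text = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'
# Group 1 is the leading text; group 2 is everything from 'ATT:' up to the closing ']'
matched = re.search(r'([\S\s]+?)\s\[.*?(ATT:[^\]]+)', text)
if matched:
    first = matched.group(1)
    result = matched.group(2).split(' ')
    # Put the leading text at the front of the list
    result.insert(0, first)
    print(result)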

score 0 · Accepted Answer

Here is one solution:

import re

sample = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'
pattern = r'''
        ^(\S+\s\S+)\s # Matches "ABCDX PPP"
        \[ATT:        # Matches [ATT:
        (\S+)\s+      # Matches A01AD05
        (\S+)\s+      # Matches B01AC06
        (\S+)\]       # Matches N02BA01
        '''
matched = re.search(pattern, sample, re.VERBOSE)
if matched:
    tokens = matched.groups()
    print(tokens)

Output:

('ABCDX PPP', 'A01AD05', 'B01AC06', 'N02BA01')

Discussion

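I used a feature of the re module, the re.VERBOSE flag, to write a self-documenting regular expression. This flag allows arbitrary whitespace and comments inside the pattern, which improves readability.
Square brackets have special meaning in regular expressions, which is why I escaped them as \[ and \].
At the end of the code, tokens holds a tuple of four strings; see the output.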
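
If the number of codes after ATT: is not always three, the fixed-group pattern above will not match. One possible variation, sketched under the assumption that the codes are whitespace-separated and that the prefix is everything before [ATT:. The helper name parse_att is hypothetical:

```python
import re

def parse_att(text):
    """Return (prefix, codes) for a string like 'ABCDX PPP [ATT:A B C] ...'.

    Hypothetical helper: assumes the prefix is everything before '[ATT:'
    and that the codes inside the brackets are separated by whitespace.
    """
    matched = re.search(r'^(.*?)\s*\[ATT:([^\]]*)\]', text)
    if matched is None:
        return None
    prefix, codes = matched.groups()
    # split() with no argument splits on any run of whitespace
    return prefix, codes.split()

sample = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'
print(parse_att(sample))
```

This returns the codes as a list rather than a tuple, which is closer to what the question asks for, and returns None when the string has no [ATT:...] block.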
