
Sample string:

ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]

What would be a suitable regular expression to first detect ATT in the string and then split out the three codes in

ATT:A01AD05 B01AC06 N02BA01 

and return them as an array/list? I would also like to extract just the ABCDX PPP part.


3 Answers


For the first part:

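import re

myString = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'
# Match "ATT:" and everything up to the first closing bracket.
# [^\]]* (instead of the greedy .+) keeps the match from running on into [KP:...].
pattern = r'ATT:[^\]]*\]'

match = re.search(pattern, myString)
if match:
    # Drop the trailing ']' and split on spaces.
    matchList = match.group().rstrip(']').split(' ')
    print(matchList)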

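For the second part, everything is mostly the same, except that the expression to use is r'\w+\s\w+\s\[' (the '[' must be escaped, otherwise it opens a character class and the pattern fails to compile) and you will need to change the rstrip to rstrip(' [').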
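Written out in full, the second part might look like the sketch below. It follows the answer's recipe (escaped bracket, rstrip(' [')); the variable names are illustrative, and it assumes the leading part is always exactly two words.

```python
import re

my_string = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'

# Two words separated by whitespace, then whitespace and a literal '['.
# The '[' is escaped; unescaped it would start a character class.
prefix_pattern = r'\w+\s\w+\s\['

match = re.search(prefix_pattern, my_string)
if match:
    # Strip the trailing ' [' so only the leading words remain.
    prefix = match.group().rstrip(' [')
    print(prefix)
```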

Hope this helps.

answered 2013-08-09T19:54:08.957

Maybe try this?

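import re

text = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'
# Group 1: everything before " ["; groups 2-4: the three ATT codes.
# [^\s\]]+ keeps the closing ']' out of the last code.
matched = re.search(r'([\S\s]+?)\s\[.*?(ATT:\S+)\s*(\S+)\s*([^\s\]]+)', text)
if matched:
    tokens = matched.groups()
    print(tokens)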

Edit: based on the new constraints:

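import re

text = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'
# Group 1: everything before " ["; group 2: "ATT:" and everything up to ']'.
matched = re.search(r'([\S\s]+?)\s\[.*?(ATT:[^\]]+)', text)
if matched:
    first = matched.group(1)
    result = matched.group(2).split(' ')
    result[0:0] = [first]  # insert the leading words at the front of the list
    print(result)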
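A small variation, assuming the "ATT:" label itself is not wanted in the list: move it outside the capture group and run the pattern against the full sample from the question, including the [KP:...] block. This is a sketch, not the answerer's code; the name `sample` is illustrative.

```python
import re

sample = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'

# "ATT:" now sits outside the capture group, so group 2 holds only the codes.
# [^\]]+ stops at the first ']', so the [KP:...] block is never reached.
matched = re.search(r'([\S\s]+?)\s\[.*?ATT:([^\]]+)', sample)
if matched:
    first = matched.group(1)
    # split() with no argument also tolerates runs of several spaces
    result = [first] + matched.group(2).split()
    print(result)
```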
answered 2013-08-09T21:07:32.967

Here is a solution:

import re

sample = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'
# Raw string so that \S, \s, \[ and \] reach the regex engine unchanged.
pattern = r'''
        ^(\S+\s\S+)\s # Matches "ABCDX PPP"
        \[ATT:        # Matches [ATT:
        (\S+)\s+      # Matches A01AD05
        (\S+)\s+      # Matches B01AC06
        (\S+)\]       # Matches N02BA01
        '''
matched = re.search(pattern, sample, re.VERBOSE)
if matched:
    tokens = matched.groups()
    print(tokens)

Output:

('ABCDX PPP', 'A01AD05', 'B01AC06', 'N02BA01')

Discussion

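  • I used a feature of the re module, the re.VERBOSE flag, to create a self-documenting regular expression. This flag allows arbitrary whitespace and comments inside the expression, which makes it easier to read.
  • Square brackets have a special meaning in regular expressions, which is why I escaped them as \[ and \].
  • At the end of the code, tokens holds a tuple of four strings; see the output.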
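To illustrate the re.VERBOSE point, the sketch below compares the commented pattern with the same expression written on one line without whitespace or comments; both should yield identical groups. The names `verbose_pattern` and `compact_pattern` are illustrative.

```python
import re

sample = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'

# Under re.VERBOSE, unescaped whitespace is ignored and '#' starts a comment.
verbose_pattern = r'''
    ^(\S+\s\S+)\s   # leading words, e.g. "ABCDX PPP"
    \[ATT:          # literal "[ATT:"
    (\S+)\s+        # first code
    (\S+)\s+        # second code
    (\S+)\]         # third code, then a literal "]"
'''

# The same expression with the whitespace and comments removed.
compact_pattern = r'^(\S+\s\S+)\s\[ATT:(\S+)\s+(\S+)\s+(\S+)\]'

verbose_groups = re.search(verbose_pattern, sample, re.VERBOSE).groups()
compact_groups = re.search(compact_pattern, sample).groups()
print(verbose_groups == compact_groups)
```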
answered 2013-08-09T20:14:38.240