Sample string:
ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]
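What is a suitable regular expression to first detect ATT in the string, then split the three codes in ATT:A01AD05 B01AC06 N02BA01 and return them as an array/list? I would then also like to extract only the ABCDX PPP part.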
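For the first part:
import re
myString = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'
# The lazy .+? stops at the first closing bracket, so a later group such as [KP:...] is not swallowed
pattern = r'ATT:.+?\]'
match = re.search(pattern, myString)
# Drop the trailing ']' and split on spaces
matchList = match.group().rstrip(']').split(' ')
print(matchList)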
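For the second part, everything is mostly the same, except that the expression to use is r'\w+\s\w+\s\[' (the bracket has to be escaped, otherwise the pattern does not compile) and you will need to change the rstrip to rstrip(' ['). Drop the split as well, since it would break 'ABCDX PPP' into two items.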
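A minimal sketch of the second part, put together from the description above. The variable name prefix is only illustrative, and the pattern assumes the leading text is exactly two words followed by a space and an opening bracket.

```python
import re

myString = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'

# Two words, a space, then a literal '[' (escaped, since a bare '[' opens a character class)
pattern = r'\w+\s\w+\s\['
match = re.search(pattern, myString)

# rstrip(' [') removes any trailing spaces and '[' characters from the match
prefix = match.group().rstrip(' [') if match else None
print(prefix)
```

If the leading text can have more or fewer than two words, this pattern would need to be loosened.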
Hope this helps
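Maybe try this?
import re
text = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'
# Group 1 is the leading text; groups 2-4 are the three codes (the last one stops before the closing bracket)
matched = re.search(r'([\S\s]+?)\s\[.*?(ATT:\S+)\s*(\S+)\s*([^\s\]]+)', text)
if matched:
    tokens = matched.groups()
    print(tokens)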
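Edit: based on the new constraints:
import re
text = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01]'
# Group 1 is the leading text; group 2 is everything from ATT: up to the closing bracket
matched = re.search(r'([\S\s]+?)\s\[.*?(ATT:[^\]]+)', text)
if matched:
    first = matched.group(1)
    result = matched.group(2).split(' ')
    result[0:0] = [first]  # put the leading text at the front of the list
    print(result)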
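As a quick check, the same pattern can be run against the full sample string from the question, which also carries a [KP:...] group. Because [^\]]+ cannot cross a closing bracket, the KP part should stay out of the result. The variable names here are only for illustration.

```python
import re

sample = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'

# Same expression as in the edit above, written as a raw string
matched = re.search(r'([\S\s]+?)\s\[.*?(ATT:[^\]]+)', sample)

result = []
if matched:
    # Leading text first, then the space-separated items of the ATT group
    result = [matched.group(1)] + matched.group(2).split(' ')
print(result)
```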
Here is a solution:
import re
sample = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'
pattern = r'''
    ^(\S+\s\S+)\s   # Matches "ABCDX PPP"
    \[ATT:          # Matches [ATT:
    (\S+)\s+        # Matches A01AD05
    (\S+)\s+        # Matches B01AC06
    (\S+)\]         # Matches N02BA01
'''
matched = re.search(pattern, sample, re.VERBOSE)
if matched:
    tokens = matched.groups()
    print(tokens)
Output:
('ABCDX PPP', 'A01AD05', 'B01AC06', 'N02BA01')
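The solution uses a feature of the re module to create a self-documenting regular expression, namely the re.VERBOSE flag. This flag allows arbitrary whitespace and comments inside the expression, which improves readability. The literal square brackets are escaped as \[ and \]. tokens is a tuple of four strings; see the output.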
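The pattern above matches exactly three codes. If the number of codes can vary, one possible variation is to capture the whole bracket contents and split it afterwards. This is only a sketch: the function name parse_att and the group names prefix and codes are made up for illustration, and it assumes the ATT group is the first bracketed group in the string.

```python
import re

sample = 'ABCDX PPP [ATT:A01AD05 B01AC06 N02BA01] [KP:CCC LLL DDD]'

ATT_PATTERN = re.compile(r'''
    ^(?P<prefix>[^\[]+?)\s*   # text before the first bracket, e.g. "ABCDX PPP"
    \[ATT:                    # literal [ATT:
    (?P<codes>[^\]]*)         # everything up to the closing bracket
    \]                        # literal closing bracket
''', re.VERBOSE)

def parse_att(text):
    """Return (prefix, list_of_codes), or None when no ATT group is found."""
    matched = ATT_PATTERN.search(text)
    if matched is None:
        return None
    # split() with no argument splits on any run of whitespace
    return matched.group('prefix'), matched.group('codes').split()

print(parse_att(sample))
```

Returning None for a non-matching string keeps the caller from having to catch an AttributeError on a failed search.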