Try the following:
re.findall(r'^(\w+):(.*?)(?=^\w+:|\Z)', text, flags=re.DOTALL | re.MULTILINE)
Example:
>>> import re
>>> text = '''rootvg:
... hd5 boot 1 1 1 closed/syncd N/A
... hd4 jfs 38 38 1 open/syncd /
... datavg:
... data01lv jfs 7 7 1 open/syncd /data1
... data02lv jfs 7 7 1 open/syncd /data2'''
>>> re.findall(r'^(\w+):(.*?)(?=^\w+:|\Z)', text, flags=re.DOTALL | re.MULTILINE)
[('rootvg', '\nhd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n'), ('datavg', '\ndata01lv jfs 7 7 1 open/syncd /data1\ndata02lv jfs 7 7 1 open/syncd /data2')]
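The re.DOTALL flag makes . match newlines as well, and the re.MULTILINE flag makes ^ and $ match at the start and end of each line respectively, rather than only at the start and end of the string.
Explanation: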
^ # match at the start of a line
(\w+) # match one or more word characters (letters, digits, or underscore) and capture in group 1
: # match a literal ':'
(.*?) # match zero or more characters, as few as possible, and capture in group 2
(?= # start lookahead (only match if following regex can match)
^\w+: # start of line followed by word characters then ':'
| # OR
\Z # end of the string
) # end lookahead
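The commented breakdown above can also be kept inside the pattern itself by adding the re.VERBOSE flag, which ignores whitespace in the pattern and treats # as the start of a comment. This is only a sketch: the name SECTION_RE and the shortened sample text are assumptions made for illustration, not part of the original answer.

```python
import re

# Same regex as above, spread over several lines with re.VERBOSE.
SECTION_RE = re.compile(r"""
    ^(\w+):         # group 1: word characters at the start of a line, then a colon
    (.*?)           # group 2: as few characters as possible (newlines included)
    (?=^\w+:|\Z)    # stop just before the next header line or at the end of the string
""", flags=re.DOTALL | re.MULTILINE | re.VERBOSE)

# Shortened sample input (illustrative only).
text = (
    "rootvg:\n"
    "hd5 boot 1 1 1 closed/syncd N/A\n"
    "hd4 jfs 38 38 1 open/syncd /\n"
    "datavg:\n"
    "data01lv jfs 7 7 1 open/syncd /data1"
)

sections = SECTION_RE.findall(text)
names = [name for name, _ in sections]
```

With re.VERBOSE, a literal space or # inside the pattern would need to be escaped; this pattern contains neither, so it behaves the same as the one-line version.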
Alternatively, you could use re.split() with a simpler regex to get similar output; converting it to the format you need should not be too hard:
>>> re.split(r'^(\w+):', text, flags=re.MULTILINE)
['', 'rootvg', '\nhd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n', 'datavg', '\ndata01lv jfs 7 7 1 open/syncd /data1\ndata02lv jfs 7 7 1 open/syncd /data2']
Here is how you could convert it to the desired format:
>>> matches = re.split(r'^(\w+):', text, flags=re.MULTILINE)
>>> [(v, matches[i+1]) for i, v in enumerate(matches) if i % 2]
[('rootvg', '\nhd5 boot 1 1 1 closed/syncd N/A\nhd4 jfs 38 38 1 open/syncd /\n'), ('datavg', '\ndata01lv jfs 7 7 1 open/syncd /data1\ndata02lv jfs 7 7 1 open/syncd /data2')]
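If a mapping from each group name to its list of lines is more convenient than tuples with embedded newlines, the re.split() result can be paired up with slicing and zip(). This is a possible follow-up step, not something the question necessarily requires; the name volume_groups is made up for this sketch.

```python
import re

text = (
    "rootvg:\n"
    "hd5 boot 1 1 1 closed/syncd N/A\n"
    "hd4 jfs 38 38 1 open/syncd /\n"
    "datavg:\n"
    "data01lv jfs 7 7 1 open/syncd /data1\n"
    "data02lv jfs 7 7 1 open/syncd /data2"
)

parts = re.split(r'^(\w+):', text, flags=re.MULTILINE)

# parts[0] is the text before the first header (empty here);
# after that, header names and section bodies alternate.
volume_groups = {
    name: body.strip().splitlines()
    for name, body in zip(parts[1::2], parts[2::2])
}
```

Note that a dict keeps only the last body if the same header name appears twice; stay with the list of tuples if duplicates are possible.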