The following code sample does what you want (improved over the previous version, following @PaulMcGuire's suggestions):
from __future__ import print_function
from pyparsing import CharsNotIn, Group, LineEnd, OneOrMore, Word, ZeroOrMore
from pyparsing import delimitedList, nums
SPACE_CHARS = ' \t'
word = CharsNotIn(SPACE_CHARS)
space = Word(SPACE_CHARS, exact=1)
label = delimitedList(word, delim=space, combine=True)
# an alternative construction for 'label' (requires importing Combine) could be:
# label = Combine(word + ZeroOrMore(space + word))
value = Word(nums)
line = label('label') + Group(OneOrMore(value))('values') + LineEnd().suppress()
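# The label must be separated from the numeric columns by at least two
# spaces (or tabs); a single space is parsed as part of the label.
text = """
string                     0 1 10
string with white space    0 10 30
string9 with number 9      10 20 50
string_ with underline     10 50 1
(string with parentese)    50 20 100
""".strip()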
print('input text:\n', text, '\nparsed text:\n', sep='\n')
for line_tokens, start_location, end_location in line.scanString(text):
    print(line_tokens.dump())
This gives the following output:
input text:
string                     0 1 10
string with white space    0 10 30
string9 with number 9      10 20 50
string_ with underline     10 50 1
(string with parentese)    50 20 100
parsed text:
['string', ['0', '1', '10']]
- label: string
- values: ['0', '1', '10']
['string with white space', ['0', '10', '30']]
- label: string with white space
- values: ['0', '10', '30']
['string9 with number 9', ['10', '20', '50']]
- label: string9 with number 9
- values: ['10', '20', '50']
['string_ with underline', ['10', '50', '1']]
- label: string_ with underline
- values: ['10', '50', '1']
['(string with parentese)', ['50', '20', '100']]
- label: (string with parentese)
- values: ['50', '20', '100']
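As a cross-check of the rule the grammar relies on, here is a minimal sketch of the same split written with the standard-library re module instead of pyparsing. It assumes that words inside a label are joined by a single space or tab and that two or more spaces or tabs separate the label from the numbers. The names LINE_RE, parse_lines and sample are hypothetical and exist only for this illustration.

```python
import re

# Assumption: a label is a run of words joined by single spaces or tabs, and
# two or more spaces/tabs separate it from the whitespace-separated integers.
LINE_RE = re.compile(
    r"^(?P<label>\S+(?:[ \t]\S+)*)"    # words joined by exactly one space/tab
    r"[ \t]{2,}"                       # separator between label and values
    r"(?P<values>\d+(?:[ \t]+\d+)*)"   # one or more integers
    r"[ \t]*$",                        # optional trailing blanks
    re.MULTILINE,
)


def parse_lines(sample):
    """Return a list of (label, values) pairs, one per matching line."""
    return [
        (m.group("label"), m.group("values").split())
        for m in LINE_RE.finditer(sample)
    ]


sample = "string9 with number 9   10 20 50\n(string with parentese)  50 20 100"
for label, values in parse_lines(sample):
    print(label, values)
```

A line whose label and numbers are separated by only one space is not matched by this sketch, because the single space is read as part of the label.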
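The parsed values can be obtained as a dictionary, with the first column (named label in the example above) as the key and the list of the remaining columns (named values above) as the value, using the following dict comprehension: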
{label: values.asList() for label, values in line.searchString(text)}
where line and text are the variables defined in the pyparsing example above. It produces the following result:
{'(string with parentese)': ['50', '20', '100'],
'string': ['0', '1', '10'],
'string with white space': ['0', '10', '30'],
'string9 with number 9': ['10', '20', '50'],
'string_ with underline': ['10', '50', '1']}
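Note that the values in this dictionary are still strings. If integers are needed, one option is to convert them afterwards. The helper below is a minimal sketch; values_as_ints is a hypothetical name, not part of pyparsing, and it assumes every value is a plain digit string as matched by Word(nums).

```python
def values_as_ints(parsed):
    """Convert every digit string in a {label: [str, ...]} mapping to int."""
    return {label: [int(v) for v in values] for label, values in parsed.items()}


# Two entries copied from the result shown above.
parsed = {
    'string': ['0', '1', '10'],
    'string with white space': ['0', '10', '30'],
}
print(values_as_ints(parsed))
```

Attaching a parse action to value in the grammar would be another route; the post-processing step is shown here only because it leaves the grammar above unchanged.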