python - Split a string on newlines, tabs, and some number of spaces

Question

I'm trying to perform a string split on a set of somewhat irregular data that looks something like this:

\n\tName: John Smith
\n\t  Home: Anytown USA
\n\t    Phone: 555-555-555
\n\t  Other Home: Somewhere Else
\n\t Notes: Other data
\n\tName: Jane Smith
\n\t  Misc: Data with spaces

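I'd like to convert this into a tuple/dict, where I will later split on the colon (:), but first I need to get rid of all the extra whitespace. I'm guessing a regex is the best way, but I can't seem to get one that works. Below is my attempt.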

data_string.split('\n\t *')

score 84 · Accepted Answer

Just use .strip(): it removes all the surrounding whitespace for you, including tabs and newlines. The splitting itself can then be done with data_string.splitlines():

[s.strip() for s in data_string.splitlines()]

Output:

>>> [s.strip() for s in data_string.splitlines()]
['Name: John Smith', 'Home: Anytown USA', 'Phone: 555-555-555', 'Other Home: Somewhere Else', 'Notes: Other data', 'Name: Jane Smith', 'Misc: Data with spaces']

You can now even do the split on ': ' inline:

>>> [s.strip().split(': ') for s in data_string.splitlines()]
[['Name', 'John Smith'], ['Home', 'Anytown USA'], ['Phone', '555-555-555'], ['Other Home', 'Somewhere Else'], ['Notes', 'Other data'], ['Name', 'Jane Smith'], ['Misc', 'Data with spaces']]
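The same strip-then-split idea can be wrapped in a small helper that yields key/value tuples. This is only a minimal sketch, not part of the original answer: the name `parse_fields` and the shortened sample `data_string` are assumptions for illustration. It splits on the first ': ' only, so a value that itself contains a colon stays intact.

```python
# Hypothetical sample mirroring the question's data (real newlines and tabs).
data_string = (
    "\n\tName: John Smith"
    "\n\t  Home: Anytown USA"
    "\n\t    Phone: 555-555-555"
)


def parse_fields(text):
    """Return (key, value) tuples from 'Key: value' lines, ignoring blank lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()  # drop leading tabs/spaces and trailing whitespace
        if not line:
            continue  # skip blank lines
        key, value = line.split(": ", 1)  # split on the first ': ' only
        pairs.append((key, value))
    return pairs


pairs = parse_fields(data_string)
print(pairs)
```

A line without ': ' would raise a ValueError at the unpacking step, which may or may not be the behavior you want for malformed input.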

score 7 · Accepted Answer

>>> ary = []
>>> for line in s.splitlines():
...     line = line.strip()
...     if not line: continue
...     ary.append(line.split(":"))
...
>>> ary
[['Name', ' John Smith'], ['Home', ' Anytown USA'], ['Misc', ' Data with spaces']]
>>> dict(ary)
{'Home': ' Anytown USA', 'Misc': ' Data with spaces', 'Name': ' John Smith'}
>>>
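Two details of the session above are easy to miss: split(":") leaves the leading space in each value, and dict() keeps only the last value for a repeated key such as Name. The following is a rough sketch of both points, using a made-up `lines` list rather than the asker's real data:

```python
# Hypothetical, already-stripped lines with a repeated "Name" key.
lines = ["Name: John Smith", "Home: Anytown USA", "Name: Jane Smith"]

pairs = []
for line in lines:
    key, value = line.split(":", 1)  # split at the first colon only
    pairs.append((key.strip(), value.strip()))  # trim the space after the colon

as_dict = dict(pairs)  # a later duplicate key overwrites the earlier one

print(pairs)    # [('Name', 'John Smith'), ('Home', 'Anytown USA'), ('Name', 'Jane Smith')]
print(as_dict)  # {'Name': 'Jane Smith', 'Home': 'Anytown USA'}
```

If every record matters, keeping the list of tuples (or grouping them yourself) is probably safer than a single dict.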

score 5 · Accepted Answer

If you look at the documentation for str.split:

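If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].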

In other words, if you're trying to figure out what to pass to split to get from '\n\tName: Jane Smith' to ['Name:', 'Jane', 'Smith'], just pass nothing (or None).
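A quick way to see the documented behavior for yourself (the variable names here are only illustrative):

```python
line = "\n\tName: Jane Smith"

no_sep = line.split()            # whitespace runs collapse, no empty strings
none_sep = line.split(None)      # identical to passing nothing
blank = "  \n\t ".split()        # whitespace-only input
explicit = line.split(" ")       # explicit separator: leading "\n\t" is kept

print(no_sep)    # ['Name:', 'Jane', 'Smith']
print(none_sep)  # ['Name:', 'Jane', 'Smith']
print(blank)     # []
print(explicit)  # ['\n\tName:', 'Jane', 'Smith']
```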

That almost solves your whole problem. There are two parts left.

First, you've only got two fields, and the second one can contain spaces. So you want just one split, not as many as possible:

s.split(None, 1)

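Next, you've still got those pesky colons. But you don't need to split on them. At least given the data you've shown us, the colon always appears at the end of the first field, with no space before it and always a space after it, so you can just remove it: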

key, value = s.split(None, 1)
key = key[:-1]
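One caveat: splitting on the first run of whitespace assumes the key itself contains no spaces, which does not hold for a key like Other Home in the question's data. A possible alternative, sketched here with a hypothetical helper named `split_field`, is to split at the first colon with str.partition and strip both halves:

```python
def split_field(line):
    """Split 'Key: value' at the first colon, trimming whitespace on both sides."""
    key, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"no colon in line: {line!r}")
    return key.strip(), value.strip()


sample = "\n\t  Other Home: Somewhere Else"

by_whitespace = sample.split(None, 1)  # breaks the two-word key apart
by_colon = split_field(sample)         # keeps the key whole

print(by_whitespace)  # ['Other', 'Home: Somewhere Else']
print(by_colon)       # ('Other Home', 'Somewhere Else')
```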

Of course, there are a million other ways to do this. This is just the one that seems closest to what you were already trying.

score 5 · Accepted Answer

You can kill two birds with one regex stone:

>>> r = """
... \n\tName: John Smith
... \n\t  Home: Anytown USA
... \n\t    Phone: 555-555-555
... \n\t  Other Home: Somewhere Else
... \n\t Notes: Other data
... \n\tName: Jane Smith
... \n\t  Misc: Data with spaces
... """
>>> import re
>>> print(re.findall(r'(\S[^:]+):\s*(.*\S)', r))
[('Name', 'John Smith'), ('Home', 'Anytown USA'), ('Phone', '555-555-555'), ('Other Home', 'Somewhere Else'), ('Notes', 'Other data'), ('Name', 'Jane Smith'), ('Misc', 'Data with spaces')]
>>>
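If what you want is the regex split the question originally attempted, note that str.split treats its argument as a literal string, so '\n\t *' is never interpreted as a pattern; re.split is the regex counterpart. A minimal sketch, assuming data shaped like the question's (the sample `data` and the exact pattern are illustrative choices, not the only option):

```python
import re

# Hypothetical sample with real newlines and tabs, as in the question.
data = "\n\tName: John Smith\n\t  Home: Anytown USA\n\t    Phone: 555-555-555\n"

# A newline, plus any whitespace around it, acts as the separator.
# Separators at the very start or end produce empty strings, which are filtered out.
lines = [part for part in re.split(r"\s*\n\s*", data) if part]

print(lines)  # ['Name: John Smith', 'Home: Anytown USA', 'Phone: 555-555-555']
```

Spaces inside a value are left alone because the pattern only matches whitespace that touches a newline.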

score 0 · Accepted Answer

Regex isn't really the best tool for the job here. As others have said, a combination of str.strip() and str.split() is the way to go. Here's a one-liner that does it:

>>> data = '''\n\tName: John Smith
... \n\t  Home: Anytown USA
... \n\t    Phone: 555-555-555
... \n\t  Other Home: Somewhere Else
... \n\t Notes: Other data
... \n\tName: Jane Smith
... \n\t  Misc: Data with spaces'''
>>> {line.strip().split(': ')[0]:line.split(': ')[1] for line in data.splitlines() if line.strip() != ''}
{'Name': 'Jane Smith', 'Other Home': 'Somewhere Else', 'Notes': 'Other data', 'Misc': 'Data with spaces', 'Phone': '555-555-555', 'Home': 'Anytown USA'}

score 0 · Accepted Answer

You can use this:

string.strip().split(":")

answered 2012-09-21T15:59:37.067

score 0 · Accepted Answer

I had to split a string on newlines (\n) and tabs (\t). What I did was first replace every \n with \t, and then split on \t:

example_arr = example_string.replace("\n", "\t").split("\t")
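For comparison, here is a small sketch of the replace-then-split trick next to a single re.split on a character class. The sample strings are made up for illustration. Both approaches give the same result, and both produce empty strings where two delimiters are adjacent.

```python
import re

example_string = "alpha\nbeta\tgamma"

via_replace = example_string.replace("\n", "\t").split("\t")
via_regex = re.split(r"[\n\t]", example_string)

print(via_replace)  # ['alpha', 'beta', 'gamma']
print(via_regex)    # ['alpha', 'beta', 'gamma']

# Adjacent delimiters (newline followed by tab) leave an empty string behind.
adjacent = "alpha\n\tbeta".replace("\n", "\t").split("\t")
print(adjacent)     # ['alpha', '', 'beta']
```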
