python - Python: how do I get the correct list when splitting?

Question

In test.txt, I have 2 lines of sentences.

The heart was made to be broken.
There is no surprise more magical than the surprise of being loved.

Code:

import re
file = open('/test.txt','r')#specify file to open
data = file.readlines()
file.close()
for line in data:
    line_split = re.split(r'[ \t\n\r, ]+',line)
    print line_split

Output of the code:

['The', 'heart', 'was', 'made', 'to', 'be', 'broken.', '']
['There', 'is', 'no', 'surprise', 'more', 'magical', 'than', 'the', 'surprise', 'of', 'being', 'loved.']

How can I print only the words, without the trailing empty string? (See the first sentence.) Expected result:

['The', 'heart', 'was', 'made', 'to', 'be', 'broken.']
['There', 'is', 'no', 'surprise', 'more', 'magical', 'than', 'the', 'surprise', 'of', 'being', 'loved.']

Any suggestions?

score 3 · Accepted Answer

Instead of using split to match the delimiters, you can use findall with a negated character class to match the parts you want to keep:

line_split = re.findall(r'[^ \t\n\r., ]+',line)

See it working online: ideone
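
To compare the two approaches side by side, here is a minimal sketch that uses the first line of test.txt as an inline string (the variable names and the inline sample are only for illustration). Note that the character class in this answer also excludes `.`, so the trailing period is dropped, which differs slightly from the output the question asks for; the last call shows one possible variant that keeps it.

```python
import re

# First line of test.txt, including its trailing newline.
line = "The heart was made to be broken.\n"

# Splitting on delimiters leaves an empty string after the final newline.
split_result = re.split(r'[ \t\n\r, ]+', line)

# Matching the non-delimiter runs never produces empty strings.
# '.' is part of the negated class here, so the period is stripped as well.
findall_result = re.findall(r'[^ \t\n\r., ]+', line)

# Variant (not from the answer): without '.' in the class the period is kept.
keep_period = re.findall(r'[^ \t\n\r,]+', line)

print(split_result)
print(findall_result)
print(keep_period)
```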

score 1 · Accepted Answer

To fix this, along with a few other changes that are explained below:

import re

with open("test.txt", "r") as file:
    for line in file:
        line_split = list(filter(bool, re.split(r'[ \t\n\r, ]+', line)))
        print(line_split)

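Here we use filter() to remove any empty strings from the result.

Note that I use the with statement to open the file. This is more readable and handles closing the file for you, even when an exception occurs.

We also iterate over the file directly. This is a better idea because it doesn't load the whole file into memory at once, which isn't needed and could cause problems with large files.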
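
One thing to watch for: on Python 3, filter() returns a lazy iterator rather than a list, so printing it directly shows a filter object instead of the words. A list comprehension is one way around that. The sketch below assumes Python 3 and uses io.StringIO as a stand-in for the real file so it can run anywhere; `fake_file` and `rows` are names made up for this example.

```python
import io
import re

# Stand-in for open("test.txt"); io.StringIO is iterated line by line like a file.
fake_file = io.StringIO(
    "The heart was made to be broken.\n"
    "There is no surprise more magical than the surprise of being loved."
)

rows = []
with fake_file as file:
    for line in file:
        parts = re.split(r'[ \t\n\r, ]+', line)
        # Keep only the non-empty strings; same effect as filter(bool, parts).
        words = [part for part in parts if part]
        rows.append(words)
        print(words)
```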

score 1 · Accepted Answer

words = re.compile(r"[\w']+").findall(yourString)

Demo

>>> yourString = "Mary's lamb was white as snow."
>>> re.compile(r"[\w']+").findall(yourString)
["Mary's", 'lamb', 'was', 'white', 'as', 'snow']

If you really want the periods, you can add them to the character class: [\w'\.]
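
A small sketch of the difference between the two character classes, using the same sample sentence (the variable names are only for illustration). Inside a character class the period does not need escaping, so `[\w'.]` and `[\w'\.]` behave the same.

```python
import re

your_string = "Mary's lamb was white as snow."

# Word characters and apostrophes only: the trailing period is dropped.
without_periods = re.findall(r"[\w']+", your_string)

# Adding '.' to the class keeps a period attached to the word before it.
with_periods = re.findall(r"[\w'.]+", your_string)

print(without_periods)
print(with_periods)
```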

score 0 · Accepted Answer

In [2]: with open('test.txt','r') as f:
   ...:     lines = f.readlines()
   ...:

In [3]: words = [l.split() for l in lines]

In [4]: words
Out[4]:
[['The', 'heart', 'was', 'made', 'to', 'be', 'broken.'],
 ['There',
  'is',
  'no',
  'surprise',
  'more',
  'magical',
  'than',
  'the',
  'surprise',
  'of',
  'being',
  'loved.']]
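
This works without any filtering because str.split() with no argument treats runs of whitespace as a single separator and ignores leading and trailing whitespace, so it never yields empty strings. Unlike the regex in the question, it does not split on commas. A short sketch of the difference from splitting on an explicit space, using a made-up sample string:

```python
# Made-up sample with leading, doubled, and trailing whitespace.
padded = "  two  spaces \n"

# No argument: whitespace runs collapse and the ends are trimmed.
no_arg = padded.split()

# Explicit separator: every single space splits, leaving empty strings
# behind, and the newline survives as its own item.
explicit = padded.split(" ")

print(no_arg)
print(explicit)
```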
