I have a piece of text:

text = '''
    Wales greatest moment. Lille is so close to the Belgian 
    border, 
    this was essentially a home game for one of the tournament favourites. Their 
    confident supporters mingled with their new Welsh fans on the streets, 
    buying into the carnival spirit - perhaps more relaxed than some might have 
    been before a quarter-final because they thought this was their time.
    In the driving rain, Wales produced the best performance in their history to 
    carry the nation into uncharted territory. Nobody could quite believe it.'''

I have this code:

words = text.replace('.',' ').replace(',',' ').replace('\n',' ').split(' ')
print(words)

and this output:

['Wales', 'greatest', 'moment', '', 'Lille', 'is', 'so', 'close', 'to', 'the', 'Belgian', 'border', '', '', 'this', 'was', 'essentially', 'a', 'home', 'game', 'for', 'one', 'of', 'the', 'tournament', 'favourites', '', 'Their', '', 'confident', 'supporters', 'mingled', 'with', 'their', 'new', 'Welsh', 'fans', 'on', 'the', 'streets', '', '', 'buying', 'into', 'the', 'carnival', 'spirit', '-', 'perhaps', 'more', 'relaxed', 'than', 'some', 'might', 'have', '', 'been', 'before', 'a', 'quarter-final', 'because', 'they', 'thought', 'this', 'was', 'their', 'time', '', 'In', 'the', 'driving', 'rain', '', 'Wales', 'produced', 'the', 'best', 'performance', 'in', 'their', 'history', 'to', '', 'carry', 'the', 'nation', 'into', 'uncharted', 'territory', '', 'Nobody', 'could', 'quite', 'believe', 'it', '']

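As you can see, the list contains empty strings. I have already removed '\n', ',' and '.'.

But now I don't know how to remove these empty strings.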

3 Answers

If you don't want them, you can filter them out:

no_empties = list(filter(None, words))

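From the documentation of filter: if function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.

This works because empty strings are considered falsy.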
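
To make the behaviour concrete, here is a minimal sketch on a shortened stand-in for the `words` list from the question (the sample values are an assumption chosen for brevity). It also shows an equivalent list comprehension, which is not part of the answer above but relies on the same truthiness rule.

```python
# A shortened stand-in for the `words` list from the question.
words = ['Wales', 'greatest', 'moment', '', 'Lille', '', '', 'this']

# filter(None, ...) keeps only truthy items; '' is falsy, so it is dropped.
no_empties = list(filter(None, words))

# Equivalent list comprehension using the same truthiness test.
no_empties_lc = [w for w in words if w]

print(no_empties)     # ['Wales', 'greatest', 'moment', 'Lille', 'this']
print(no_empties_lc)  # same result
```

Either form leaves the original `words` list unchanged and builds a new list.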

answered 2021-06-30T17:22:27.887

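Edit:

As mentioned in the comments, the original answer does not produce the same output as the question's code because of the hyphen: \w+ splits 'quarter-final' into two words. To avoid that: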

import re
words = re.findall(r'[\w-]+', text)
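
The difference between the two patterns can be seen on a short sample. This is only a sketch: the `sample` string is a hypothetical excerpt made up for illustration, not the full text from the question.

```python
import re

# Hypothetical short sample containing a hyphenated word and a lone dash.
sample = "before a quarter-final - perhaps, they thought."

# \w+ matches runs of word characters only, so the hyphen splits the word.
plain = re.findall(r'\w+', sample)

# [\w-]+ also accepts '-', so 'quarter-final' stays whole.
# A standalone '-' is matched as its own item too.
with_hyphen = re.findall(r'[\w-]+', sample)

print(plain)        # ['before', 'a', 'quarter', 'final', 'perhaps', 'they', 'thought']
print(with_hyphen)  # ['before', 'a', 'quarter-final', '-', 'perhaps', 'they', 'thought']
```

Note that the lone '-' is kept by the second pattern, which matches the question's own output, where '-' also appears as an item.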

Original answer

You can get what you want directly with the re module:

import re
words = re.findall(r'\w+', text)


['Wales',
 'greatest',
 'moment',
 'Lille',
 'is',
 'so',
 'close',
 'to',
 'the',
 'Belgian',
 'border',
 'this',
 'was',
 'essentially',
 'a',
 'home',
 'game',
 'for',
 'one',
 'of',
 'the',
 'tournament',
 'favourites',
 'Their',
 'confident',
 'supporters',
 'mingled',
 'with',
 'their',
 'new',
 'Welsh',
 'fans',
 'on',
 'the',
 'streets',
 'buying',
 'into',
 'the',
 'carnival',
 'spirit',
 'perhaps',
 'more',
 'relaxed',
 'than',
 'some',
 'might',
 'have',
 'been',
 'before',
 'a',
 'quarter',
 'final',
 'because',
 'they',
 'thought',
 'this',
 'was',
 'their',
 'time',
 'In',
 'the',
 'driving',
 'rain',
 'Wales',
 'produced',
 'the',
 'best',
 'performance',
 'in',
 'their',
 'history',
 'to',
 'carry',
 'the',
 'nation',
 'into',
 'uncharted',
 'territory',
 'Nobody',
 'could',
 'quite',
 'believe',
 'it']
answered 2021-06-30T17:29:27.890

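You are running into this because each line of your text value is indented by 4 spaces, not because your code is flawed. If you really meant each line to start with 4 single spaces, you can add .replace('    ','') (four spaces) to your "words" logic to deal with that, or you can refer to Thomas Weller's solution, which solves the problem no matter how many consecutive single spaces you leave.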
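
To illustrate the point about consecutive spaces, here is a minimal sketch on a shortened, hypothetical stand-in for the question's text. It also shows str.split() called with no argument, which is not taken from any of the answers here: with no separator, split() treats any run of whitespace (spaces and newlines alike) as one delimiter and never returns empty strings.

```python
# Hypothetical shortened text: indented lines, a comma followed by a
# trailing space, and a final full stop, as in the question.
text = "\n    border, \n    this was."

# The question's approach: every single space is a separator, so each
# extra space in a run produces an empty string.
with_empties = (
    text.replace('.', ' ').replace(',', ' ').replace('\n', ' ').split(' ')
)

# split() with no argument collapses runs of whitespace, so the newline
# replacement is no longer needed and no empty strings appear.
cleaned = text.replace('.', ' ').replace(',', ' ').split()

print(with_empties.count(''))  # more than zero
print(cleaned)                 # ['border', 'this', 'was']
```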

于 2021-06-30T17:29:06.097 回答