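Use a set. The problem with your code is that all of ['his', 'is', 's'] are actually substrings of 'this', so the condition is always False. (On strings, in checks for a substring.)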
>>> 'his' in 'this'
True
>>> 'is' in 'this'
True
>>> 's' in 'this'
True
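By contrast, in on a set or list compares whole elements, which is why the set-based approach works. A minimal sketch of the difference follows; the names text and seen_words are illustrative only and do not come from the question.

```python
text = 'this'
seen_words = {'this'}

# On a str, `in` is a substring test: 'his' occurs inside 'this'.
substring_hit = 'his' in text

# On a set, `in` is a membership test against whole elements:
# the set contains 'this' but not 'his'.
member_hit = 'his' in seen_words

print((substring_hit, member_hit))
```

Here substring_hit is True while member_hit is False, so only the membership test distinguishes 'his' from 'this'.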
Solution 1:
>>> seen = set()  # keeps track of the words seen so far
>>> words = ['this', 'his', 'is', 's']
>>> output = []
>>> for word in words:
...     if word not in seen:
...         output.append(word)
...         seen.add(word)
...
>>> print(" ".join(output))  # join is better than repeated string concatenation
this his is s
A shorter version of the above code using a list comprehension:
>>> seen = set()
>>> " ".join([x for x in words if x not in seen and not seen.add(x)])
'this his is s'
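The comprehension relies on set.add returning None, so not seen.add(x) is always true and only serves to record x as a side effect. If that feels too clever, the same idea can be wrapped in a small helper. This is a minimal sketch; unique_in_order is a hypothetical name, and the sample input with repeated words is made up for illustration.

```python
def unique_in_order(words):
    """Return the words with duplicates dropped, keeping first occurrences in order."""
    seen = set()
    result = []
    for word in words:
        if word not in seen:   # whole-element membership test, not a substring test
            seen.add(word)
            result.append(word)
    return result

# Hypothetical input containing repeated words.
joined = " ".join(unique_in_order(['this', 'his', 'is', 's', 'this', 'is']))
print(joined)
```

With the sample input above, joined is 'this his is s'.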
Solution 2:
Another approach (for learning purposes only) is to use a regex with word boundaries:
>>> import re
>>> ss = ''
>>> for word in words:
...     # this regex looks for an exact whole-word match, not just a substring
...     if not re.search(r'\b{}\b'.format(re.escape(word)), ss):
...         ss += word + ' '
...
>>> ss
'this his is s '
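To see what the word boundary contributes, the sketch below runs the same pattern against a string where 'is' appears only inside longer words, and then against one where it stands alone. The two sample strings are made up for illustration.

```python
import re

# Same pattern construction as in the loop, for the word 'is'.
pattern = r'\b{}\b'.format(re.escape('is'))

# 'is' occurs only inside 'this' and 'his', so there is no whole-word match.
inside_only = re.search(pattern, 'this his ')

# Here 'is' also appears as a separate word, so the search succeeds.
standalone = re.search(pattern, 'this his is ')

found_inside = inside_only is not None
found_standalone = standalone is not None
print((found_inside, found_standalone))
```

Note that the ss built in the loop ends with a trailing space; ss.rstrip() removes it if that matters.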