
I have a file that contains special characters, so I read it using file operations.

# Read the whole file; the with-block closes it automatically
with open('st.txt', 'r') as f:
    text = f.read()

The sample string is:

"Free Quote!\n          \n          Protecting your family is the best investment you\'ll eve=\nr \n" 

Now I want to remove all the special characters and keep only the words from the string, so that my string becomes:

"Free Quote Protecting your family is the best investment you'll ever"

2 Answers


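Probably the simplest approach is a plain loop that tests each character against string.ascii_letters plus a specific subset of extra characters (for example, ' and -, along with the space):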

>>> import string
>>> text = "Free Quote!\n \n Protecting your family is the best investment you\'ll eve=\nr \n"
>>> ''.join([x for x in text if x in string.ascii_letters + '\'- '])
"Free Quote  Protecting your family is the best investment you'll ever "

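As the text you are editing gets longer and more complex, excluding specific punctuation marks becomes less sustainable, and you will need more sophisticated regular expressions (for example, when is ' an apostrophe and when is it a quotation mark?), but for the scope of the problem above this should be enough.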
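The filtered result above still contains a doubled space and a trailing space. A minimal sketch of one way to tidy that up, assuming you want runs of spaces collapsed into one; the helper name keep_words and the constant ALLOWED are made up for illustration:

```python
import string

# Characters to keep: ASCII letters plus apostrophe, hyphen, and space
ALLOWED = set(string.ascii_letters + "'- ")

def keep_words(text):
    # Drop every character that is not in the allowed set
    filtered = "".join(ch for ch in text if ch in ALLOWED)
    # split() with no arguments discards runs of whitespace, so
    # rejoining with a single space normalizes the spacing
    return " ".join(filtered.split())

text = "Free Quote!\n \n Protecting your family is the best investment you'll eve=\nr \n"
print(keep_words(text))
```

Because newlines are simply dropped, two words separated only by a line break would be glued together (that is also why "eve=\nr" comes out as "ever" here), so this is only safe when a space accompanies each real word boundary.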

Answered 2013-04-16T06:26:56.283

I found three solutions; all of them come close, but none is exactly what you want.

import re
in_string = "Free Quote!\n \n Protecting your family is the best investment you\'ll eve=\nr \n"

#variant 1
#Free Quote Protecting your family is the best investment youll eve r 
out_string = ""
# Split on whitespace, then strip non-word characters from each token
for word in in_string.split():
    out_string += re.sub(r'\W', '', word) + " "
print(out_string)

#variant 2
#Free Quote Protecting your family is the best investment you ll eve r
print(" ".join(re.findall("[a-zA-Z]+", in_string)))

#variant 3
#FreeQuoteProtectingyourfamilyisthebestinvestmentyoullever
print(re.sub(r'[\W]', '', in_string))
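A possible fourth variant that does reproduce the requested output for the sample. It is only a sketch and rests on one assumption: that "=" followed by a newline is a soft line break splitting a word (as in quoted-printable encoded mail), so the pair is removed before anything else. The function name clean_text is made up for illustration.

```python
import re

def clean_text(text):
    # Assumption: "=" followed by a line break is a soft break inside a word,
    # so remove the pair to rejoin "eve=\nr" into "ever"
    text = re.sub(r"=\r?\n", "", text)
    # Replace every run of characters other than letters and apostrophes
    # with a single space
    text = re.sub(r"[^A-Za-z']+", " ", text)
    # Trim the leading and trailing space left by the substitution
    return text.strip()

in_string = "Free Quote!\n \n Protecting your family is the best investment you'll eve=\nr \n"
print(clean_text(in_string))
```

Unlike variants 1 and 2, this keeps the apostrophe in "you'll", and unlike variant 3 it keeps the spaces between words. Digits and hyphens are treated as separators here; add them to the character class if they should survive.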
Answered 2013-04-15T07:33:14.117