python - How to ignore characters other than [a-z][A-Z]

Question

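How can I ignore characters other than [a-z][A-Z] in an input string in Python, and what will the string look like after the method is applied?

Do I need to use regular expressions?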

score 3 · Accepted Answer

If you need to use a regular expression, use a negated character class ([^...]):

re.sub(r'[^a-zA-Z]', '', inputtext)

A negated character class matches anything that is not named in the class.

Demo:

>>> import re
>>> inputtext = 'The quick brown fox!'
>>> re.sub(r'[^a-zA-Z]', '', inputtext)
'Thequickbrownfox'
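
To make the effect of the negated class more concrete, here is a small sketch that lists what the pattern matches before removing it. The sample string and the names NON_LETTERS, removed and kept are chosen for illustration only. Note that [^a-zA-Z] also matches non-ASCII letters such as é, so they are stripped as well.

```python
import re

# Precompiled negated character class: anything that is not a-z or A-Z
NON_LETTERS = re.compile(r'[^a-zA-Z]')

sample = 'Café no. 42!'  # illustrative input, not from the question

# Every character the pattern matches, i.e. what will be dropped
removed = NON_LETTERS.findall(sample)

# The string with those characters deleted
kept = NON_LETTERS.sub('', sample)

print(removed)
print(kept)
```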

But using str.translate() is faster:

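import string
# Code points of the ASCII letters a-z and A-Z
ascii_letters = set(map(ord, string.ascii_letters))
# Every other single-byte character, i.e. the characters to delete
non_letters = ''.join(chr(i) for i in range(256) if i not in ascii_letters)
# Python 2 only: str.translate(table, deletechars); a table of None means no remapping
inputtext.translate(None, non_letters)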
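
The two-argument translate(None, deletechars) call is the Python 2 str API. On Python 3, str.translate() takes a single mapping from code points to replacements, where None means "delete". The following is one possible sketch of an equivalent, assuming Python 3; the names DeleteNonLetters, _TABLE and letters_only are hypothetical and not part of the answer.

```python
import string

# Code points of a-z and A-Z
ASCII_LETTERS = frozenset(map(ord, string.ascii_letters))


class DeleteNonLetters(dict):
    """Translation table that maps every non-letter code point to None."""

    def __missing__(self, codepoint):
        # str.translate() deletes characters whose table entry is None
        value = codepoint if codepoint in ASCII_LETTERS else None
        self[codepoint] = value  # cache the decision for later lookups
        return value


_TABLE = DeleteNonLetters()


def letters_only(text):
    """Return text with everything except ASCII letters removed."""
    return text.translate(_TABLE)


print(letters_only('The quick brown fox!'))
```

Because the table decides lazily for each code point it sees, this handles the full Unicode range rather than only the first 256 characters.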

Using str.translate() is more than 10 times faster than the regular expression:

>>> import timeit, re
>>> from functools import partial
>>> ascii_only = partial(re.compile(r'[^a-zA-Z]').sub, '')
>>> timeit.timeit('f(t)', 'from __main__ import ascii_only as f, inputtext as t')
7.903045892715454
>>> timeit.timeit('t.translate(None, m)', 'from __main__ import inputtext as t, non_letters as m')
0.5990171432495117

Jakub's approach is slower still:

>>> timeit.timeit("''.join(c for c in t if c not in l)", 'from __main__ import inputtext as t; import string; l = set(string.letters)')
9.960685968399048
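
The timings above come from a Python 2 session. To repeat the comparison on Python 3, a script along these lines could be used. It is only a sketch: the function names are made up for illustration, the translate variant uses str.maketrans() and covers only the first 256 code points, and the numbers it prints will differ from those quoted above.

```python
import re
import string
import timeit

inputtext = 'The quick brown fox!'

_sub = re.compile(r'[^a-zA-Z]').sub
_letters = set(string.ascii_letters)
# Third argument of str.maketrans(): characters to delete
_table = str.maketrans(
    '', '', ''.join(chr(i) for i in range(256) if chr(i) not in _letters))


def with_regex(text):
    return _sub('', text)


def with_translate(text):
    return text.translate(_table)


def with_join(text):
    return ''.join(c for c in text if c in _letters)


if __name__ == '__main__':
    for func in (with_regex, with_translate, with_join):
        seconds = timeit.timeit(lambda: func(inputtext), number=100000)
        print('{}: {:.3f}s'.format(func.__name__, seconds))
```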

score 0 · Answer

You can use a regular expression:

re.compile(r'[^a-zA-Z]').sub('', your_string)

You can also manage without regular expressions (for example, if you have a regex phobia):

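import string
# Build the set once; string.ascii_letters is exactly a-z plus A-Z
letters = set(string.ascii_letters)
# Keep only the characters that are letters
new_string = ''.join(c for c in old_string if c in letters)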

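Although I would use the regular expression, this example has additional educational value: set, comprehensions, and the string library. Note that the set is not strictly needed here.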
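
On the point that the set is optional: the membership test can be made against the string of letters directly, with the same result. The sketch below assumes string.ascii_letters (available in both Python 2 and 3) rather than the Python 2-only string.letters, and uses an illustrative input.

```python
import string

old_string = 'The quick brown fox!'  # sample input, chosen for illustration

# Membership test against a set that is built once
letters = set(string.ascii_letters)
with_set = ''.join(c for c in old_string if c in letters)

# Membership test directly against the string of letters
without_set = ''.join(c for c in old_string if c in string.ascii_letters)

print(with_set == without_set)
```

With only 52 candidates the difference is small; the set mainly pays off when the collection of allowed characters is large.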
