python - How do I keep only the punctuation in a string in Python?

Question

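I want to build a catalog of all my logs, so I want to keep only the punctuation and remove every other character, including CJK and other characters.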

For example:

s = "aaa; sf = fa = bla http://wa"

The expected output is

;==://

score 16 · Accepted Answer

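You can use str.translate (the form shown here is Python 2 only, where string.letters exists and translate accepts a deletechars argument):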

>>> from string import letters, digits, whitespace, punctuation
>>> s = "aaa; sf = fa = bla http://wa"
>>> s.translate(None, letters+digits+whitespace)
';==://'
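On Python 3 the call above fails, because string.letters is gone and str.translate takes a single table argument. A minimal sketch of the same idea, assuming Python 3 (the name delete_table is only illustrative):

```python
from string import ascii_letters, digits, whitespace

s = "aaa; sf = fa = bla http://wa"

# The third argument of str.maketrans lists characters to delete.
delete_table = str.maketrans("", "", ascii_letters + digits + whitespace)

result = s.translate(delete_table)
print(result)  # ;==://
```

Note that this deletes only ASCII letters, digits and whitespace, so CJK characters would pass through unchanged.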

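Or a regex (re.escape keeps characters such as \, ] and - from being misread inside the character class):

>>> import re
>>> re.sub(r'[^{}]+'.format(re.escape(punctuation)), '', s)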
';==://'
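If the logs also contain punctuation outside ASCII (for example full-width CJK punctuation), one possible alternative is to filter by Unicode category instead. This is only a sketch: keep_punctuation is a hypothetical helper name, and treating both the P (punctuation) and S (symbol) categories as "punctuation" is an assumption, made because "=" is classified as a math symbol (Sm) rather than as punctuation.

```python
import unicodedata

def keep_punctuation(text):
    """Keep characters whose Unicode category is punctuation (P*) or symbol (S*)."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] in ("P", "S")
    )

print(keep_punctuation("aaa; sf = fa = bla http://wa"))  # ;==://
print(keep_punctuation("日志：done, 100%"))               # ：,%
```

Letters (including CJK ideographs), digits and whitespace all fall outside the P and S categories, so they are dropped.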

Timing comparison:

>>> s = "aaa; sf = fa = bla http://wa"*1000
>>> %timeit s.translate(None,letters+digits+whitespace)
10000 loops, best of 3: 171 us per loop                  #winner
>>> r1 = re.compile(r'[^{}]+'.format(punctuation))
>>> r2 = re.compile(r'[\w\s]+')
>>> %timeit r1.sub('',s)
100 loops, best of 3: 2.64 ms per loop
>>> %timeit r2.sub('',s)
100 loops, best of 3: 3.31 ms per loop
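The %timeit lines above need IPython. To repeat the comparison in a plain script, something like the following could be used. It is a rough sketch assuming Python 3 and the standard timeit module; the function names are made up for illustration, and the numbers it prints will differ from the ones quoted above.

```python
import re
import timeit
from string import ascii_letters, digits, whitespace

s = "aaa; sf = fa = bla http://wa" * 1000

# Python 3 deletion table and a precompiled pattern, built once up front.
delete_table = str.maketrans("", "", ascii_letters + digits + whitespace)
word_or_space = re.compile(r"[\w\s]+")

def with_translate():
    return s.translate(delete_table)

def with_regex():
    return word_or_space.sub("", s)

runs = 100
for func in (with_translate, with_regex):
    seconds = timeit.timeit(func, number=runs)
    print("{}: {:.1f} us per call".format(func.__name__, seconds / runs * 1e6))
```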

score 3 · Answer

Using a regular expression:

>>> re.sub(r'[\w\s]+', '', "aaa; sf = fa = bla http://wa")
';==://'
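One thing to watch with this pattern: \w also matches the underscore, so "_" is removed even though it appears in string.punctuation. A small sketch of the difference, assuming Python 3 (the sample string and variable names are made up for illustration):

```python
import re
from string import punctuation

s = "snake_case; x = 1"

# Blacklist: drop word characters and whitespace ("_" counts as a word character).
by_word_class = re.sub(r"[\w\s]+", "", s)

# Whitelist: drop everything that is not ASCII punctuation.
by_whitelist = re.sub("[^{}]+".format(re.escape(punctuation)), "", s)

print(by_word_class)  # ;=
print(by_whitelist)   # _;=
```

Which behaviour is right depends on whether underscores should count as punctuation in the log catalog.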

Compiling the pattern buys you some performance, even for a pattern this simple...

>>> %timeit re.sub(r'[\w\s]+', '', "aaa; sf = fa = bla http://wa")
100000 loops, best of 3: 6.78 us per loop

>>> e = re.compile(r'[\w\s]+')
>>> %timeit e.sub('', "aaa; sf = fa = bla http://wa")
100000 loops, best of 3: 4.91 us per loop

...but the regex is no match for Ashwini's solution using str.translate:

>>> %timeit "aaa; sf = fa = bla http://wa".translate(None,letters+digits+whitespace)
1000000 loops, best of 3: 1.31 us per loop
