python - How to replace iterative str.replace() with str.translate()? - Python

Question

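The goal of my task is to add a space before and after each punctuation character. So far I have been calling str.replace() iteratively to replace each punctuation character p with " "+p+" ". How can I get the same output from str.translate() by passing in two lists or a dictionary: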

import string

inlist = string.punctuation
outlist = [" "+p+" " for p in string.punctuation]
inoutdict = {p:" "+p+" " for p in string.punctuation}

Let's assume that all of my punctuation is in string.punctuation. Currently, I'm doing this:

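from string import punctuation as punct

def punct_tokenize(text):
  # Pad every punctuation character with spaces, one replace() per character seen.
  for ch in text:
    if ch in punct:
      text = text.replace(ch, " "+ch+" ")
  # Collapse the resulting runs of whitespace into single spaces.
  return " ".join(text.split())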

sent = "This's a foo-bar sentences with many, many punctuation."
print punct_tokenize(sent)

This iterative str.replace() also takes too long. Would str.translate() be faster?

score 1 · Accepted Answer

In Python 2, the dict form of translate only works on unicode strings:

>>> import string
>>> inoutdict = {ord(p):unicode(" "+p+" ") for p in string.punctuation}
>>> unicode("foo,,,bar!!1").translate(inoutdict)
u'foo ,  ,  , bar !  ! 1'
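For readers on Python 3, where every str is Unicode, the same idea can be sketched with str.maketrans, which accepts a dict keyed by single characters. This is a minimal sketch, not part of the original answer: the name PAD_TABLE is made up for illustration, punct_tokenize is borrowed from the question, and the split/join whitespace collapse is an assumption carried over from the question's code.

```python
import string

# Map each punctuation character to itself padded with spaces.
# str.maketrans converts the single-character keys to code points.
# PAD_TABLE is a hypothetical name used only for this sketch.
PAD_TABLE = str.maketrans({p: " " + p + " " for p in string.punctuation})

def punct_tokenize(text):
    # A single translate() call pads all punctuation at once;
    # split/join then collapses the doubled spaces.
    padded = text.translate(PAD_TABLE)
    return " ".join(padded.split())

print(punct_tokenize("foo,,,bar!!1"))  # foo , , , bar ! ! 1
```

Because the table is built once at import time, each call costs one translate() plus one split/join, rather than one replace() per character of the input.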

Another option is to use a regular expression:

>>> import re
>>> rx = '[%s]' % re.escape(string.punctuation)
>>> re.sub(rx, r" \g<0> ", "foo,,,bar!!1")
'foo ,  ,  , bar !  ! 1'
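If the substitution runs on many sentences, the pattern can be compiled once and wrapped in a function. The following is a rough sketch under that assumption; PUNCT_RX and punct_tokenize_re are illustrative names, and the final whitespace collapse mirrors the question's code rather than this answer.

```python
import re
import string

# Character class matching any single punctuation character.
# PUNCT_RX is a hypothetical name used only for this sketch.
PUNCT_RX = re.compile("[%s]" % re.escape(string.punctuation))

def punct_tokenize_re(text):
    # \g<0> refers to the whole match, so each punctuation mark
    # is re-emitted with a space on either side.
    padded = PUNCT_RX.sub(r" \g<0> ", text)
    # Collapse the runs of spaces left by adjacent punctuation.
    return " ".join(padded.split())

sent = "This's a foo-bar sentences with many, many punctuation."
print(punct_tokenize_re(sent))
# This ' s a foo - bar sentences with many , many punctuation .
```

Whether this beats the iterative replace() or a translate() table on real input is best checked with timeit on a representative sample.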

As always, show us the bigger picture to get better answers, e.g. why are you doing this? Where does the input come from? etc.
