
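The goal of my task is to add spaces before and after punctuation. So far I have been iterating with str.replace(), replacing each punctuation character p with " "+p+" ". How can I achieve the same output with str.translate(), passing in either two lists or a dict: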

import string
inlist = string.punctuation
outlist = [" "+p+" " for p in string.punctuation]
inoutdict = {p:" "+p+" " for p in string.punctuation}
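Assuming Python 3 (the question's code is Python 2), `str.maketrans(x, y)` with two arguments requires equal-length strings and maps one character to one character, so the two parallel lists cannot be passed in as they are. A minimal sketch that zips them into a dict first. `str.maketrans` accepts a single dict whose values may be strings of any length and converts the character keys to ordinals:

```python
import string

inlist = string.punctuation
outlist = [" " + p + " " for p in string.punctuation]

# str.maketrans(x, y) only supports one-to-one character mappings, so the
# parallel sequences are zipped into a {char: padded_string} dict instead.
table = str.maketrans(dict(zip(inlist, outlist)))

print("foo,,,bar!!1".translate(table))  # prints: foo ,  ,  , bar !  ! 1
```

The `inoutdict` from the question can be passed to `str.maketrans` directly in the same way.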

Let's assume all my punctuation characters are in string.punctuation. Currently, I'm doing this:

from string import punctuation as punct
def punct_tokenize(text):
  for ch in text:
    if ch in punct:
      text = text.replace(ch, " "+ch+" ")
  return " ".join(text.split())

sent = "This's a foo-bar sentences with many, many punctuation."
print(punct_tokenize(sent))
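The loop above calls `text.replace` once per punctuation *occurrence*, yet each call already pads every occurrence of that character, so repeated characters get padded several times (the final `split`/`join` hides the extra spaces). A minimal Python 3 sketch that calls `replace` at most once per *distinct* punctuation character; `punct_tokenize_once` is a hypothetical name:

```python
from string import punctuation as punct

def punct_tokenize_once(text):
    # Only the distinct punctuation characters actually present in text,
    # so str.replace runs at most once per character.
    for ch in set(text) & set(punct):
        text = text.replace(ch, " " + ch + " ")
    # Collapse runs of whitespace into single spaces and trim the ends.
    return " ".join(text.split())

sent = "This's a foo-bar sentences with many, many punctuation."
print(punct_tokenize_once(sent))
# prints: This ' s a foo - bar sentences with many , many punctuation .
```

The order in which characters are processed does not matter here, because padding only inserts spaces and a space is not in `string.punctuation`.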

This iterative str.replace() approach is also too slow. Would str.translate() be faster?


1 Answer


In Python 2, the dict form of translate only works with unicode strings (str.translate expects a 256-character table):

>>> import string
>>> inoutdict = {ord(p):unicode(" "+p+" ") for p in string.punctuation}
>>> unicode("foo,,,bar!!1").translate(inoutdict)
u'foo ,  ,  , bar !  ! 1'
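In Python 3 every `str` is Unicode, so `str.translate` accepts the ordinal-keyed dict directly and the `unicode()` calls go away. A sketch, assuming Python 3, that builds the table once and then normalizes whitespace the same way `punct_tokenize` does; `PAD_TABLE` and `punct_tokenize_translate` are hypothetical names:

```python
import string

# Map each punctuation ordinal to that character padded with spaces.
# Built once at module level and reused on every call.
PAD_TABLE = {ord(p): " " + p + " " for p in string.punctuation}

def punct_tokenize_translate(text):
    # Single pass over text: each punctuation char becomes " p ".
    padded = text.translate(PAD_TABLE)
    # Collapse the doubled spaces between adjacent punctuation characters.
    return " ".join(padded.split())

print(punct_tokenize_translate("foo,,,bar!!1"))  # prints: foo , , , bar ! ! 1
```

Whether this beats the `replace` loop depends on the input, so it is worth timing both (for example with `timeit`) on representative text.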

Another option is a regular expression:

>>> import re
>>> rx = '[%s]' % re.escape(string.punctuation)
>>> re.sub(rx, r" \g<0> ", "foo,,,bar!!1")
'foo ,  ,  , bar !  ! 1'
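If you go the regular-expression route, compiling the pattern once avoids rebuilding it on each call (the `re` module also caches recently used patterns, so the gain may be small). A Python 3 sketch that also collapses whitespace like `punct_tokenize`; `PUNCT_RE` and `punct_tokenize_re` are hypothetical names:

```python
import re
import string

# Character class matching any single ASCII punctuation character;
# re.escape protects ']', '\\', '^' and '-' inside the class.
PUNCT_RE = re.compile("[%s]" % re.escape(string.punctuation))

def punct_tokenize_re(text):
    # \g<0> is the whole match, i.e. the punctuation character itself.
    padded = PUNCT_RE.sub(r" \g<0> ", text)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return " ".join(padded.split())

print(punct_tokenize_re("This's a foo-bar sentences with many, many punctuation."))
# prints: This ' s a foo - bar sentences with many , many punctuation .
```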

As usual, show us the bigger picture to get a better answer: why do you want to do this? Where does the input come from? And so on.

answered 2013-10-17T09:38:52.747