python - How to replace punctuation with HTML entities

Question

I would like to replace punctuation characters in a string, such as ,'".<>?;: with the corresponding HTML entities (for example, &#44; for a comma). So far I've looked into using the string library with .maketrans and string.punctuation. It seems that you can convert ASCII to a string, but not the other way round (based on what I've found so far). Preferably I'm after a solution that doesn't require writing a regex (trying not to reinvent the wheel).

score 1 · Accepted Answer

A regex solution is probably the simplest, since you only need to call re.sub().

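import re
def htmlentities(s):
    # The callback receives a match object; ord() turns the matched
    # character into the code point used in the numeric entity.
    return re.sub('[,\'".<>?;:]',
                  lambda m: '&#%d;' % ord(m.group(0)),
                  s)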
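
To illustrate how re.sub() invokes a replacement function once per match, here is a small self-contained sketch (Python 3). The name entity_for and the sample string are mine, not from the answer:

```python
import re

def entity_for(match):
    # match.group(0) is the matched character as a str;
    # ord() gives the code point used in the numeric entity.
    return '&#%d;' % ord(match.group(0))

print(re.sub(r'[<>]', entity_for, '1 < 2 > 0'))
# prints: 1 &#60; 2 &#62; 0
```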

score 1 · Accepted Answer

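I think sasha's code has two drawbacks:

Each time char_htmlentities() is called on a character by map(char_htmlentities, string), it does the following: tests if c in html_symbols, computes ord(c), and computes '&#%d;' % ord(c).
Each time htmlentities() is called on a new string, the char_htmlentities() function is created again.

A better approach is to build a dictionary once and make it the default value of an argument of the replacement function, as in the following code: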

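import re

# Characters to replace with numeric HTML entities.
punct = ',\'".<>?;:'

# The lookup dict is built once, when the function is defined,
# and reused on every call.
def changing(m, d=dict((c,'&#%d;' % ord(c)) for c in punct)):
    return d[m.group()]
regx = re.compile('[%s]' % punct)

susu = 'hg! ab,sd, opo> godo; sza: popo.'
print(susu)
print(regx.sub(changing,susu))
# second line printed:
# hg! ab&#44;sd&#44; opo&#62; godo&#59; sza&#58; popo&#46;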

score 1 · Accepted Answer

You can convert each character individually yourself.

For example:

def htmlentities(string):
  def char_htmlentities(c):
    return '&#%d;' % ord(c) if c in html_symbols else c

  html_symbols =  set(',\'".<>?;:')
  return ''.join(map(char_htmlentities, string))

UPD: I rewrote the solution so that its time complexity is linear rather than quadratic.
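
Since the question mentions .maketrans: in Python 3, str.maketrans() accepts a dict that maps single characters to replacement strings of any length, so the same per-character mapping can be done without a regex. A minimal sketch, assuming Python 3; the names PUNCT, ENTITY_TABLE and htmlentities_translate are hypothetical:

```python
# Characters to encode, taken from the question.
PUNCT = ',\'".<>?;:'

# Build the translation table once; each character maps to its
# numeric HTML entity, e.g. ',' -> '&#44;'.
ENTITY_TABLE = str.maketrans({c: '&#%d;' % ord(c) for c in PUNCT})

def htmlentities_translate(s):
    """Replace each character in PUNCT with its numeric HTML entity."""
    return s.translate(ENTITY_TABLE)

print(htmlentities_translate('a<b, "c"'))
# prints: a&#60;b&#44; &#34;c&#34;
```

Note that this does not apply to the Python 2 string.maketrans(), which only maps single characters to single characters.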
