python - Replacing strings using a dictionary, complications with punctuation

Question

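I'm trying to write a function process(s, d) that uses a dictionary to replace abbreviations in a string with their full meanings, where s is the input string and d is the dictionary. For example:

>>> d = {'ASAP':'as soon as possible'}
>>> s = "I will do this ASAP.  Regards, X"
>>> process(s,d)
"I will do this as soon as possible.  Regards, X"

I tried using the split function to separate the string and compare each part with the dictionary.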

def process(s):
    return ''.join(d[ch] if ch in d else ch for ch in s)

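However, it returns the same string. I suspect the code doesn't work because of the full stop after ASAP in the original string. If so, how do I ignore the punctuation and still replace ASAP?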

score 5 · Accepted Answer

Here is one way to do it with a single regular expression:

In [23]: import re

In [24]: d = {'ASAP':'as soon as possible', 'AFAIK': 'as far as I know'}

In [25]: s = 'I will do this ASAP, AFAIK.  Regards, X'

In [26]: re.sub(r'\b(?:' + '|'.join(map(re.escape, d)) + r')\b', lambda m: d[m.group(0)], s)
Out[26]: 'I will do this as soon as possible, as far as I know.  Regards, X'

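Unlike the versions based on str.replace(), this respects word boundaries, so it won't replace abbreviations that happen to appear in the middle of other words (e.g. "etc" in "fetch").

Also, unlike most (all?) of the other solutions proposed so far, it iterates over the input string only once, regardless of how many search terms are in the dictionary.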
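To make the word-boundary point concrete, here is a minimal sketch that wraps the same idea in a function. The name process comes from the question; the empty-dictionary guard, the re.escape call, and the longest-first key ordering are my own assumptions about what a reusable version might want, not part of the answer above.

```python
import re

def process(s, d):
    """Replace whole-word occurrences of each key in d with its value."""
    if not d:
        # An empty alternation would match empty strings, so bail out early.
        return s
    # Longest keys first so a longer abbreviation wins over a shorter prefix;
    # re.escape guards against keys containing regex metacharacters.
    keys = sorted(d, key=len, reverse=True)
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, keys)) + r')\b')
    # One pass over s; each match is looked up in the dictionary.
    return pattern.sub(lambda m: d[m.group(0)], s)

d = {'ASAP': 'as soon as possible', 'etc': 'et cetera'}
print(process('Please fetch it ASAP, etc.', d))
# Please fetch it as soon as possible, et cetera.
```

The "etc" inside "fetch" is left alone because there is no word boundary on either side of it. The non-capturing group matters: without it, the \b anchors would bind only to the first and last alternatives.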

score 2 · Accepted Answer

You can do something like this:

def process(s,d):
    for key in d:
        s = s.replace(key,d[key])
    return s

score 2 · Accepted Answer

Here is a working solution: use re.split() and split on word boundaries (keeping the separator characters):

''.join( d.get( word, word ) for word in re.split( r'(\W+)', s ) )

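One notable difference between this code and Vaughn's or Sheena's answers is that this code takes advantage of the dictionary's O(1) lookup time, whereas their solutions look at every key in the dictionary. This means that when s is short and d is large, their code will take considerably longer to run. In addition, partial words will still be replaced in their solutions: with d = { "lol": "laugh out loud" } and s = "lollipop", their solutions would incorrectly produce "laugh out loudlipop".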
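A small side-by-side sketch of the partial-word issue, comparing plain substring replacement with the token lookup. The sample string and the variable names replaced and looked_up are illustrative only.

```python
import re

d = {"lol": "laugh out loud"}
s = "lol, a lollipop"

# Substring replacement: rewrites "lol" even inside "lollipop".
replaced = s
for key in d:
    replaced = replaced.replace(key, d[key])

# Token lookup: the capturing group in the pattern makes re.split keep the
# separators, so joining the pieces restores the original punctuation.
looked_up = ''.join(d.get(tok, tok) for tok in re.split(r'(\W+)', s))

print(replaced)   # laugh out loud, a laugh out loudlipop
print(looked_up)  # laugh out loud, a lollipop
```

One trade-off of the token lookup: a key that contains a non-word character (a space, an apostrophe, a hyphen) can never match, because the split breaks it into separate tokens.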

score 1 · Accepted Answer

Use a regular expression:

re.sub(pattern,replacement,s)

In your case:

import re

def process(s, d):
    ret = s
    for key in d:
        ret = re.sub(r'\b' + re.escape(key) + r'\b', d[key], ret)
    return ret

\b matches the beginning or end of a word. Thanks to Paul for the comment.

score 0 · Accepted Answer

Instead of splitting on whitespace, use:

re.split(r"\W", s)

This splits on any character that is not part of a word.
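For what it's worth, a quick sketch of what that split returns on the question's string, next to the capturing-group variant. The variable names plain and kept are just for illustration.

```python
import re

s = "I will do this ASAP.  Regards, X"

# Without a capturing group the separators are discarded, and runs of
# non-word characters leave empty strings between them.
plain = re.split(r"\W", s)
print(plain)
# ['I', 'will', 'do', 'this', 'ASAP', '', '', 'Regards', '', 'X']

# With a capturing group the separators are kept as list items.
kept = re.split(r"(\W+)", s)
print(kept)
# ['I', ' ', 'will', ' ', 'do', ' ', 'this', ' ', 'ASAP', '.  ', 'Regards', ', ', 'X']
```

The first form is enough to find the words, but the original punctuation and spacing cannot be rebuilt from it; the second form can be joined back into the full string.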

score 0 · Accepted Answer

    python 3.2

    for i, v in d.items():
        s = s.replace(i, v)  # reassign so each replacement builds on the last

answered 2012-12-11T16:14:36.277

score 0 · Accepted Answer

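This is also string replacement (+1 to @VaughnCato). It uses the reduce function to iterate over your dictionary, replacing any instance of a key in the string with its value. s here is the accumulator: on each iteration it is reduced (i.e. fed to the replace function), so all past replacements are kept. Also, per @PaulMcGuire's point above, this replaces keys starting with the longest and ending with the shortest.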

In [1]: from functools import reduce

In [2]: d = {'ASAP':'as soon as possible', 'AFAIK': 'as far as I know'}

In [3]: s = 'I will do this ASAP, AFAIK.  Regards, X'

In [4]: reduce(lambda x, y: x.replace(y, d[y]), sorted(d, key=len, reverse=True), s)
Out[4]: 'I will do this as soon as possible, as far as I know.  Regards, X'
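Here is a rough sketch of why the longest-first ordering matters when one key is a prefix of another. The dictionary contents and the helper replace_all are made up for the illustration.

```python
from functools import reduce

d = {'IM': 'instant message', 'IMO': 'in my opinion'}
s = 'IMO this works'

def replace_all(s, keys):
    # Fold str.replace over the keys in the given order.
    return reduce(lambda acc, key: acc.replace(key, d[key]), keys, s)

shortest_first = replace_all(s, sorted(d, key=len))
longest_first = replace_all(s, sorted(d, key=len, reverse=True))

print(shortest_first)  # instant messageO this works
print(longest_first)   # in my opinion this works
```

With the shorter key applied first, "IM" is rewritten inside "IMO" and the longer key never gets a chance to match.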

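As for why your function doesn't return what you expect: when you iterate over s, you are actually iterating over the characters of the string, not the words. Your version could be adapted to iterate over s.split() (which would be a list of words), but then you run into the problem of punctuation causing words not to match your dictionary. You could make them match by importing string and stripping string.punctuation from each word, but that would remove the punctuation from the final string (so if plain replacement doesn't work for you, a regex is probably the best option).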
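The three cases described above can be checked directly. This is only a sketch using the question's own data; the variable names are illustrative.

```python
import string

d = {'ASAP': 'as soon as possible'}
s = "I will do this ASAP.  Regards, X"

# 1. Iterating over a string yields single characters, none of which is a
#    key, so the join just rebuilds the original string.
by_char = ''.join(d[ch] if ch in d else ch for ch in s)
print(by_char == s)  # True

# 2. Splitting on whitespace yields words, but the period stays attached,
#    so 'ASAP.' is not found in the dictionary.
words = s.split()
print(words)  # ['I', 'will', 'do', 'this', 'ASAP.', 'Regards,', 'X']

# 3. Stripping punctuation makes the lookup succeed, at the cost of losing
#    the punctuation (and the double space) in the result.
stripped = [w.strip(string.punctuation) for w in words]
rebuilt = ' '.join(d.get(w, w) for w in stripped)
print(rebuilt)  # I will do this as soon as possible Regards X
```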
