When I run this code to edit my CSV file, some strings are only partially replaced, even though the full strings are in my dictionary.

import re

def replace_all(text, dic):
    for i, j in dic.items():  # items() works on both Python 2 and 3
        text = text.replace(i, j)
    return text

bottle = "vial jug canteen urn jug33"
transport = "car automobile airplane scooter"

mydict = {}
for word in bottle.split():
    mydict[word] = 'bottle'
for word in transport.split():
    mydict[word] = 'transport'
print(mydict) # test


with open('replacesample.csv','r') as f:
    text=f.read()
    text=replace_all(text,mydict)
    text=re.sub(r'PROD\s(?=[1-9])',r'PROD',text)

with open('file2.csv','w') as w:
    w.write(text)

For example, if my CSV contains these strings:

jug 
canteen 
urn
car
automobile
swag
airplane
jug33

my final result is:

bottle 
bottle 
bottle
transport
transport
swag
transport
bottle33

How can I fix this?

Expected:

bottle 
bottle 
bottle
transport
transport
swag
transport
bottle

1 Answer

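You are using a dictionary to enumerate the replacement patterns. Before Python 3.7, dictionaries return keys and values in arbitrary order; from 3.7 on they preserve insertion order, and here jug was inserted before jug33.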

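So the jug -> bottle replacement can run before the jug33 -> bottle replacement. Because str.replace matches substrings, it also rewrites the jug inside jug33, leaving bottle33, which no longer matches the jug33 key.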

The solution is to sort the keys by length in descending order, so that longer matches are replaced first:

def replace_all(text, dic):
    for i, j in sorted(dic.items(), key=lambda kv: len(kv[0]), reverse=True):
        text = text.replace(i, j)
    return text
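
To see what the sort changes, here is a small sketch (assuming Python 3, with the dictionary rebuilt from the question's data) that prints the order in which the keys would be tried. The name ordered_keys is only illustrative.

```python
# Rebuild the question's dictionary (same words, same targets).
mydict = {}
for word in "vial jug canteen urn jug33".split():
    mydict[word] = 'bottle'
for word in "car automobile airplane scooter".split():
    mydict[word] = 'transport'

# Longest keys first, so 'jug33' is tried before its substring 'jug'.
ordered_keys = sorted(mydict, key=len, reverse=True)
print(ordered_keys)
```

Any key that contains another key is necessarily longer, so it is always tried before the shorter one.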

Demo:

>>> def replace_all(text, dic):
...     for i, j in dic.items():
...         text = text.replace(i, j)
...     return text
... 
>>> replace_all('jug33 jug', mydict)
'bottle33 bottle'
>>> def replace_all(text, dic):
...     for i, j in sorted(dic.items(), key=lambda kv: len(kv[0]), reverse=True):
...         text = text.replace(i, j)
...     return text
... 
>>> replace_all('jug33 jug', mydict)
'bottle bottle'
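
If partial-word matches are not wanted at all, one possible alternative (not part of the approach above) is a single regular-expression pass with word boundaries. This is a minimal sketch, assuming Python 3, a non-empty dictionary, and keys that start and end with word characters; replace_whole_words is a hypothetical helper name, and the sample dictionary is a cut-down version of the one in the question.

```python
import re

def replace_whole_words(text, dic):
    # Assumes dic is non-empty and its keys begin and end with word characters.
    # Longest keys first, so the alternation prefers 'jug33' over 'jug'.
    keys = sorted(dic, key=len, reverse=True)
    # \b restricts matches to whole words; re.escape guards against
    # keys that contain regex metacharacters.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # Each match is looked up once, so already-replaced text is never re-scanned.
    return pattern.sub(lambda m: dic[m.group(0)], text)

mydict = {'jug': 'bottle', 'jug33': 'bottle', 'car': 'transport'}
result = replace_whole_words('jug33 jug carpet car', mydict)
print(result)  # bottle bottle carpet transport
```

Unlike str.replace, this leaves words such as carpet untouched. If partial-word replacement is actually desired, dropping the two \b anchors keeps the single-pass behaviour while still preferring longer keys.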
answered 2013-08-30T21:49:37.103