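Because values are concatenated into the same cell (a step I don't control), I have a CSV file with many duplicated words. The duplicates are usually within the same column. Here is an example similar to mine: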
Name,Geo Location,Default
DRE EXT EXT Pair Video,,
DRE United Kingdom EXT LON Extrane lo.EXT LON RD01,United Kingdom,
DRE United Kingdom EXT LON Extrane lo.EXT LON RD02,United Kingdom,
DRE United Kingdom LON lab dyna test LON,United Kingdom,
DRE United StatesCPT Corp Point Link_Pair Video DRE,United States,
DRE United Kingdom SDD SASD-D TRAIL01 to RD01,United Kingdom,
DRE United Kingdom SDD SASD-D TRAIL01 to RD02 SASD-D,United Kingdom,
DRE United Kingdom SDD SASD-D TRAIL02 to RD01,United Kingdom,
DRE United Kingdom SDD SASD-D TRAIL02 to RD02,United Kingdom,
DRE United Kingdom SDD SASD-D TRAIL01 to TRAIL02,United Kingdom,
DRE United Kingdom SDD SASD-D RD01 to RD02,United Kingdom,
DRE United States MDR SASD-D XC SASD-D Xplay to SASD-D,United States,
DRE Hong Kong (China) Hongkong HKOuter RD01 HKInter,"Hong Kong, Hong Kong",
DRE United Kingdom DRE LON Sq lab dynam test,United Kingdom,
DRE United States USTHA SPS Thalberg usthamd mdf01,United States,
DRE Hong Kong (China)DRE SASD-D Hong Kong Citi SASD-D EXT,Hong Kong,
SASD-D United States SASD-D USPHXCAP VRF SASD-D USPHXCAP RD02,United States,
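I need to remove the duplicated words, but only within the same cell.
I started with the code below, based on many other questions and answers here on similar topics. My code doesn't work, and I don't know how to make it work, or whether there is a better approach.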
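A minimal sketch of the per-cell operation. It assumes Python 3.7+ (where dicts preserve insertion order) and treats "words" as whitespace-separated tokens compared case-sensitively; the helper name `dedupe_words` is hypothetical. Because it splits on whitespace, glued tokens such as `StatesCPT` or `(China)DRE` count as single words and are not matched against `States` or `DRE`.

```python
def dedupe_words(cell: str) -> str:
    """Drop repeated words within one cell, keeping each word's first occurrence."""
    # dict.fromkeys keeps the first-seen order of keys, unlike set(),
    # which would scramble the word order.
    return " ".join(dict.fromkeys(cell.split()))


print(dedupe_words("DRE United Kingdom EXT LON Extrane lo.EXT LON RD01"))
# prints: DRE United Kingdom EXT LON Extrane lo.EXT RD01
```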
from csv import DictReader, DictWriter
with open('file1.csv') as fi1,\
     open('file2.csv', 'wb') as fout1:
    read1 = DictReader(fi1)
    write1 = DictWriter(fout1, fieldnames=read1.fieldnames)
    write1.writeheader()
    for line1 in read1:
        col = line1['Name']
        outline = dict(line1)
        ' '.join(set(col.split()))
        write1.writerow(outline)
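I need help getting this to work, or another approach that does. I was thinking that if there were a way to clear the set between rows, it might work.
Thanks, B0T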
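In the loop above, the result of `' '.join(set(col.split()))` is computed but never assigned back to `outline`, so each row is written unchanged. Also, `set()` does not preserve word order. Since `set(col.split())` is built fresh for every row, nothing needs to be cleared between rows. Below is a sketch of a corrected loop, assuming Python 3 and that only the `Name` column should be de-duplicated. `dedupe_words` and `dedupe_column` are hypothetical helper names, and the demo uses in-memory `io.StringIO` objects in place of the real files.

```python
import csv
import io


def dedupe_words(cell):
    # Fresh dict per cell: first occurrences kept, in original order.
    return " ".join(dict.fromkeys(cell.split()))


def dedupe_column(src, dst, column="Name"):
    """Copy CSV rows from src to dst, de-duplicating words in one column only."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Assign the cleaned value back to the row before writing it.
        row[column] = dedupe_words(row[column])
        writer.writerow(row)


# Demo with in-memory text instead of file1.csv / file2.csv.
sample = io.StringIO(
    "Name,Geo Location,Default\n"
    "DRE United Kingdom LON lab dyna test LON,United Kingdom,\n"
    'DRE Hong Kong (China) Hongkong HKOuter RD01 HKInter,"Hong Kong, Hong Kong",\n'
)
out = io.StringIO()
dedupe_column(sample, out)
print(out.getvalue())
```

With real files on Python 3, open the input with `open('file1.csv', newline='')` and the output with `open('file2.csv', 'w', newline='')` rather than `'wb'`, then pass both file objects to `dedupe_column`. Other columns, such as the quoted `Geo Location` value, pass through untouched.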