2

I have two CSV files like this:

"id","h1","h2","h3", ...
"1","blah","blahla"
"4","bleh","bleah"

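I want to merge these two files so that if an id appears in both files, the values for that row come from the second file. If the ids are different, the merged file should contain both rows.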


Some values contain commas:

"54","34,2,3","blah"
4

3 Answers

3
res = {}

a=open('a.csv')
for line in a:
    (id, rest) = line.split(',', 1)
    res[id] = rest
a.close()

b=open('b.csv')
for line in b:
    (id, rest) = line.split(',', 1)
    res[id] = rest
b.close()

c=open('c.csv', 'w')
for id, rest in res.items():
    c.write(id+","+rest)  # write to the output file opened above as c
c.close()

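Basically, you use the first column of each line as the key in the dictionary res. Because b.csv is the second file, keys that already exist from the first file (a.csv) are overwritten. Finally, you join key and rest back together when writing the output file c.csv.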

The header row will also be taken from the second file, but I assume the headers don't differ between the files.
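
The overwrite behaviour described above can be checked on in-memory data. This is only a minimal sketch, not the answer's original code: the helper name `merge_lines` and the second set of sample rows are made up for illustration, and it assumes the id is always the first field and contains no comma itself.

```python
def merge_lines(*sources):
    """Merge iterables of CSV lines; later sources win on a shared id."""
    res = {}
    for lines in sources:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, rest = line.split(",", 1)  # split on the first comma only
            res[key] = rest
    # dicts keep insertion order (Python 3.7+), so the header stays first
    return [key + "," + rest for key, rest in res.items()]


first = ['"id","h1","h2"', '"1","blah","blahla"', '"4","bleh","bleah"']
second = ['"id","h1","h2"', '"4","new","row"', '"54","34,2,3","blah"']

merged = merge_lines(first, second)
for line in merged:
    print(line)
```

Row "4" ends up with the values from the second source, while rows "1" and "54" are both kept. Because the split happens only on the first comma, the quoted value "34,2,3" passes through untouched.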

Edit: A slightly different solution that merges an arbitrary number of files and outputs the rows in sorted order:

res = {}
files_to_merge = ['a.csv', 'b.csv']
for filename in files_to_merge:
    f=open(filename)
    for line in f:
        (id, rest) = line.split(',', 1)
        if rest[-1] != '\n': #last line may be missing a newline
            rest = rest + '\n'
        res[id] = rest
    f.close()

f=open('c.csv', 'w')
f.write("\"id\","+res["\"id\""])
del res["\"id\""]
for id, rest in sorted(res.items()):  # items() works on Python 3; iteritems() was Python 2 only
    f.write(id+","+rest)
f.close()
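
One caveat with `sorted` here: the keys are strings that still include their surrounding quotes, so they sort lexicographically and `"10"` comes before `"4"`. If the ids are known to be integers (an assumption; the question does not say so), a key function can sort them numerically instead. The helper name `numeric_id` and the sample dictionary are hypothetical.

```python
def numeric_id(item):
    """Sort key for (id, rest) pairs: strip the quotes and compare as int."""
    key, _rest = item
    return int(key.strip('"'))


# Sample data shaped like the res dictionary after the header was removed.
res = {
    '"10"': '"x","y"\n',
    '"4"': '"bleh","bleah"\n',
    '"1"': '"blah","blahla"\n',
}

lexicographic = [key for key, _ in sorted(res.items())]
numeric = [key for key, _ in sorted(res.items(), key=numeric_id)]
print(lexicographic)
print(numeric)
```

This would fail with a ValueError on any id that is not an integer, so it only fits data where that assumption holds.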
answered 2013-05-15T11:06:34.960
2

To preserve key order and keep the last row for each id, you can do something like this:

import csv
from collections import OrderedDict
from itertools import chain

incsv = [csv.DictReader(open(fname)) for fname in ('/home/jon/tmp/test1.txt', '/home/jon/tmp/test2.txt')]
rows = OrderedDict((row['id'], row) for row in chain.from_iterable(incsv))
for row in rows.values(): # write out to new file or whatever here instead
    print(row)
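
To write the merged rows out instead of printing them, `csv.DictWriter` is one option. The following is a sketch under assumptions rather than part of the original answer: the function name `merge_on_id` is made up, all inputs are assumed to share the same header, and no row is assumed to have more fields than the header. In-memory `io.StringIO` objects stand in for the real files.

```python
import csv
import io


def merge_on_id(in_files, out_file):
    """Merge open CSV files on 'id'; rows from later files replace earlier ones."""
    rows = {}  # plain dicts keep insertion order in Python 3.7+
    fieldnames = None
    for f in in_files:
        reader = csv.DictReader(f)
        if fieldnames is None:
            fieldnames = reader.fieldnames  # header taken from the first file
        for row in reader:
            rows[row["id"]] = row
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows.values())


# In-memory stand-ins for the two input files and the output file.
first = io.StringIO('"id","h1","h2"\n"1","blah","blahla"\n"4","bleh","bleah"\n')
second = io.StringIO('"id","h1","h2"\n"4","new","row"\n"54","34,2,3","blah"\n')
out = io.StringIO()
merge_on_id([first, second], out)
print(out.getvalue())
```

With real files, the `csv` module documentation recommends opening them with `newline=""`, for example `open("c.csv", "w", newline="")`.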
answered 2013-05-15T11:15:24.143
1

Python3

import csv

with open("a.csv") as a:
    fields = next(a)
    D = {k: v for k,*v in csv.reader(a)}

with open("b.csv") as b:
    next(b)
    D.update({k: v for k,*v in csv.reader(b)})

with open("c.csv", "w") as c:
    c.write(fields)
    csv.writer(c, quoting=csv.QUOTE_ALL).writerows([k]+v for k,v in D.items())
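
The reason to parse with `csv.reader` rather than splitting every comma by hand is the quoted values from the question, such as "34,2,3". A small check on in-memory data (the sample row comes from the question; the variable names are only for illustration):

```python
import csv
import io

line = '"54","34,2,3","blah"\n'

naive = line.rstrip("\n").split(",")          # also splits inside the quoted value
parsed = next(csv.reader(io.StringIO(line)))  # respects the quotes

# Extended unpacking, as in the dict comprehensions above:
# the first field becomes the key, the remaining fields the value.
k, *v = parsed
print(len(naive), naive)
print(k, v)
```

Note that `csv.reader` also strips the quotes, which is why the writer is created with `quoting=csv.QUOTE_ALL` to put them back on output.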
answered 2013-05-15T11:17:48.580