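I have a master file that will be updated continually, and a new file that is created every minute. I want to compare each newly created file against the existing master file. So far I have: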

# Read the new file and split every line into its comma-separated fields
with open("jobs") as a:
    new = [line.rstrip("\n").split(",") for line in a]

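This lets me compare the first field (index [0]) of each line against my master file. This is where I start confusing myself. I'm guessing it will be something like this: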

# master: the IDs already present in the master file (not built yet)
for row in new:
    if row[0] not in master:
        with open("end", "a") as end:
            end.write(",".join(row).rstrip("\n") + "\n")
    else:
        # REPLACE LINES THAT ALREADY EXIST IN MASTER FILE WITH NEW LINE
        pass

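The IDs will not necessarily be in the same order each time a new file comes in, and the new file may at some point contain more entries than the master file.

If I've been unclear or left out any information, let me know and I'll do my best to clarify. Thanks.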

2 Answers

This sounds like a job for the csv module to me.

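Unfortunately, it's not clear from your question whether you want to modify the master file itself, write an output file, or both. This does the second: it takes a master file and an update file, both in CSV format, and writes the merged contents, unsorted, to an output file. If that's not what you want, or if your data is comma-separated but has no field names at the top, it should be easy to adapt.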

import csv

with open("master.csv") as m, open("update.csv") as u, open("out.csv", "w") as o:
    master_reader = csv.DictReader(m)
    fields = master_reader.fieldnames  # header row of master.csv
    # Index both files by ID; rows from update.csv override master rows
    master = {line['ID']: line for line in master_reader}
    update = {line['ID']: line for line in csv.DictReader(u)}
    master.update(update)
    out = csv.DictWriter(o, fields)
    out.writeheader()
    out.writerows(master.values())

With master.csv like this:

ID,Name,Foo,Bar,Baz,Description
1000001,Name here:1,1001,1,description here
1000002,Name here:2,1002,2,description here
1000003,Name here:3,1003,3,description here
1000004,Name here:4,1004,4,description here
1000005,Name here:5,1005,5,description here
1000006,Name here:6,1006,6,description here
1000007,Name here:7,1007,7,description here
1000008,Name here:8,1008,8,description here
1000009,Name here:9,1009,9,description here

and update.csv like this:

ID,Name,Foo,Bar,Baz,Description
1000003,UPDATED Name here:3,1003,3, UPDATED description here
1000010,NEW ITEM Name here:9,1009,9,NEW ITEM description here 

it writes this to out.csv:

ID,Name,Foo,Bar,Baz,Description
1000010,NEW ITEM Name here:9,1009,9,NEW ITEM description here ,
1000008,Name here:8,1008,8,description here,
1000009,Name here:9,1009,9,description here,
1000006,Name here:6,1006,6,description here,
1000007,Name here:7,1007,7,description here,
1000004,Name here:4,1004,4,description here,
1000005,Name here:5,1005,5,description here,
1000002,Name here:2,1002,2,description here,
1000003,UPDATED Name here:3,1003,3, UPDATED description here,
1000001,Name here:1,1001,1,description here,

Note that the row order is not preserved (it's not clear from the question whether that matters). But it's fast and clean.
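
If the files have no header row, the same merge can be keyed on the first column instead. The sketch below is a variation on the code above, not a drop-in replacement: the function name `merge_by_first_column` and its parameters are made up for illustration, and it assumes Python 3.7 or later, where dicts keep insertion order, so master rows stay in their original position and new IDs land at the end.

```python
import csv


def merge_by_first_column(master_path, update_path, out_path):
    """Merge two header-less CSV files keyed on their first column.

    Hypothetical helper: rows from update_path replace master rows with
    the same ID, and rows with unseen IDs are appended at the end.
    """
    merged = {}
    for path in (master_path, update_path):
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if row:  # skip blank lines
                    merged[row[0]] = row  # the later file wins on duplicate IDs
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(merged.values())
    return merged
```

Calling `merge_by_first_column("master.csv", "update.csv", "out.csv")` would mirror the example above, minus the header handling.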

answered 2012-04-25T17:11:36.743

Maybe something like this would work:

# First, build a set of all the IDs contained in the master file
master_set = set()
with open('masterfile.txt') as mf:
    for ele in mf:
        master_set.add(ele.split(',')[0])

# If an ID is not in the master file (i.e. not in the set), append its line
with open('tempfile.txt') as temp, open('masterfile.txt', 'a') as mf:
    for line in temp:
        index = line.split(',')[0]
        if index not in master_set:
            master_set.add(index)
            mf.write(line)

I haven't tested it.
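
The code above only appends lines whose ID is new; existing master lines are left untouched. If lines with a matching ID should be replaced as well, one possible approach is to rebuild the master file and swap it into place. This is only a sketch under assumptions: `update_master` is a hypothetical name, the ID is taken to be the text before the first comma, and the whole master file is assumed to fit in memory.

```python
import os
import tempfile


def update_master(master_path, new_path):
    """Replace or append lines in master_path using lines from new_path.

    Hypothetical helper: a line's ID is the text before its first comma.
    """
    lines = {}  # ID -> line; dicts keep insertion order on Python 3.7+
    for path in (master_path, new_path):
        with open(path) as f:
            for line in f:
                line = line.rstrip("\n")
                if line:  # skip blank lines
                    lines[line.split(",", 1)[0]] = line
    # Write to a temporary file in the same directory, then swap it in,
    # so a crash mid-write cannot leave a half-written master file.
    directory = os.path.dirname(os.path.abspath(master_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    with os.fdopen(fd, "w") as tmp:
        tmp.writelines(line + "\n" for line in lines.values())
    os.replace(tmp_path, master_path)
```

Since a new file arrives every minute, this would be called once per incoming file, for example `update_master('masterfile.txt', 'tempfile.txt')`.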

answered 2012-04-25T16:19:16.787