I have two CSV files in the following formats.

The first is outputTweetsDate.csv:

Here is some text;13.09.13 16:45
Here is more text;13.09.13 16:45
And yet another text;13.09.13 16:46

The second file is apiSheet.csv:

13.09.13 16:46;89.56
13.09.13 16:45;90.40

I want to compare the two files and, wherever the two date-time values match, write the text and the data value to a new file (finalOutput.csv):

|89.56|,|Here is some text|
|89.56|,|Here is more text|
|90.49|,|And yet another text|

This is my code so far:

with open("apiSheet.csv", "U") as in_file1, open("outputTweetsDate.csv", "rb") as in_file2,open("finalOutput.csv", "wb") as out_file:
   reader1 = csv.reader(in_file1,delimiter=';')
   reader2 = csv.reader(in_file2,delimiter='|')
   writer = csv.writer(out_file,delimiter='|')
   for row1 in reader1:
       for row2 in reader2:
           if row1[0] == row2[1]:
               data = [row1[1],row2[0]]
               print data
               writer.writerow(data)

I edited my code and it now runs, but it does not iterate over all of my data correctly. At the moment my output looks like this:

|89.56|,|Here is some text|
|89.56|,|Here is more text|

So it does not show me the third row, even though the date-time values match. It seems the loop does not iterate over the whole file.

Thanks!

1 Answer

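Your inner loop reaches the end of file2 (outputTweetsDate.csv) before the second row of file1 is read. A csv reader can only be consumed once, so for every later row of file1 the inner loop has nothing left to iterate over.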
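
The exhaustion can be reproduced in isolation. The following is a minimal sketch in Python 3 (the code in this thread is Python 2), using in-memory `io.StringIO` objects as stand-ins for the two files; the stand-ins and the `;` delimiter are assumptions based on the sample rows shown in the question. It counts how many rows the inner loop sees for each row of the outer loop.

```python
import csv
import io

# In-memory stand-ins for the two files (sample rows from the question).
api_rows = io.StringIO("13.09.13 16:46;89.56\n13.09.13 16:45;90.40\n")
tweet_rows = io.StringIO(
    "Here is some text;13.09.13 16:45\n"
    "And yet another text;13.09.13 16:46\n"
)

reader1 = csv.reader(api_rows, delimiter=';')
reader2 = csv.reader(tweet_rows, delimiter=';')

inner_counts = []
for row1 in reader1:
    # The inner iteration consumes reader2; it is not rewound for the next row1.
    inner_counts.append(sum(1 for _ in reader2))

print(inner_counts)  # [2, 0]: the second pass over reader2 sees no rows
```

Only the first row of file1 is ever compared against file2, which matches the truncated output in the question.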

Try this snippet:

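import csv

# Python 2 file modes, as in the question.
# Assumes both files list their timestamps in descending order.
with open("apiSheet.csv", "U") as in_file1, open("outputTweetsDate.csv", "rb") as in_file2, open("finalOutput.csv", "wb") as out_file:
    reader1 = csv.reader(in_file1, delimiter=';')
    reader2 = csv.reader(in_file2, delimiter='|')  # delimiter as in the question; it must match the real file
    writer = csv.writer(out_file, delimiter='|')
    row2 = next(reader2, None)  # None once file2 is exhausted, instead of raising StopIteration
    for row1 in reader1:
        # Advance through file2 while its timestamp compares >= row1's (string comparison).
        while row2 and row1[0] <= row2[1]:
            if row1[0] == row2[1]:
                data = [row1[1], row2[0]]
                print data
                writer.writerow(data)
            row2 = next(reader2, None)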

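Edit: Reverse order is tricky. Let's stop trying to be clever and use brute force. Since the files are much smaller than your RAM, loading one of them into memory will work fine.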

import csv

# Python 2 file modes, as in the question.
with open("apiSheet.csv", "U") as in_file1, open("outputTweetsDate.csv", "rb") as in_file2, open("finalOutput.csv", "wb") as out_file:
    reader1 = csv.reader(in_file1, delimiter=';')
    reader2 = csv.reader(in_file2, delimiter='|')  # delimiter as in the question; it must match the real file
    writer = csv.writer(out_file, delimiter='|')

    rows2 = list(reader2)  # all the content of file2 goes in RAM, so it can be scanned repeatedly
    for row1 in reader1:
        for row2 in rows2:
            if row1[0] == row2[1]:
                data = [row1[1], row2[0]]
                print data
                writer.writerow(data)
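
If the files grow, the nested scan can be replaced by a dictionary lookup. The following is a hedged sketch in Python 3, not part of the snippets above: `join_on_timestamp` is a hypothetical helper name, the `;` input delimiter is taken from the sample rows in the question, and it assumes apiSheet.csv holds at most one value per timestamp (a later duplicate would overwrite an earlier one). It writes rows in the order of the tweets file.

```python
import csv
import io


def join_on_timestamp(api_file, tweets_file, out_file):
    """Write one [value, text] row per tweet whose timestamp has a value.

    Hypothetical helper; takes open file-like objects and returns the match count.
    """
    # Map timestamp -> value (assumes one value per timestamp).
    values = {
        row[0]: row[1]
        for row in csv.reader(api_file, delimiter=';')
        if len(row) >= 2
    }
    writer = csv.writer(out_file, delimiter='|')
    matched = 0
    for row in csv.reader(tweets_file, delimiter=';'):
        # row[0] is the text, row[1] the timestamp; skip malformed rows.
        if len(row) >= 2 and row[1] in values:
            writer.writerow([values[row[1]], row[0]])
            matched += 1
    return matched


# Usage with in-memory stand-ins for the sample rows from the question.
api = io.StringIO("13.09.13 16:46;89.56\n13.09.13 16:45;90.40\n")
tweets = io.StringIO(
    "Here is some text;13.09.13 16:45\n"
    "Here is more text;13.09.13 16:45\n"
    "And yet another text;13.09.13 16:46\n"
)
out = io.StringIO()
matched = join_on_timestamp(api, tweets, out)
print(out.getvalue())
```

With real files on Python 3, open them in text mode with `newline=''`, as the `csv` module documentation recommends. Each file is read once, so the work grows with the sum of the file sizes rather than their product.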
answered 2013-09-22T17:28:57.913