
I have the following csv file:

name, sector, year, region, number

bob,,1999,AS,2

bob,hi-tech,,,3

mike,,2001,NE,2

plan,pharma,,,1

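I wrote a script that finds every instance where the "name" is the same for a row and the row below it (the csv file is already sorted by the "name" value). The output of my current script is as follows: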

name, sector, year, region, number

bob,tennis,1999,AS,2+3

bob,tennis,,,3

mike,,2001,NE,2

plan, baseball,,,1

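This is almost what I want. The good part of my current script is that it identifies every instance where the "name" value is the same, then combines all the attributes of the two rows with that name and updates the "number" column. The problem is that once the new row is created, the two rows that went into the merge should be deleted. In the example above, the second row: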

bob,tennis,,,3

should not be there. I have copied the relevant part of my actual script below, and I would greatly appreciate any clarification anyone can offer.

for next_row in reader:
        first_name = first_row['name']
        next_name = next_row['name']

        if first_name == next_name:
            if first_row['source'] == '2':
                #get relevant attributes from next_row and add them to first_row

                first_row['number'] = first_row['number'] + ' + ' + next_row['number']
            elif next_row['number'] == '2':
                #get relevant attributes from next_row and add them to first_row

                first_row['number'] = first_row['number'] + ' + ' + next_row['number']
            writer.writerow(first_row)
            first_row = next_row
        else:
            writer.writerow(first_row)

            first_row = next_row

1 Answer


As suggested in the comments, you may want to treat reader as an iterator. If reader has a next method, you can use it directly; otherwise, you can use reader = iter(reader).

First, define your first_row: you can simply do first_row = reader.next() (or first_row = next(reader), which also works in Python 3).

Then, just process one entry after another: you write your row and update first_row only when its name is no longer equal to that of next_row.

Once the iterator is fully consumed, a StopIteration is raised. At that point you only have to write the last first_row.

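try:
    while True:
        # next(reader) is the builtin equivalent of reader.next()
        # and works in both Python 2.6+ and Python 3
        next_row = next(reader)
        if first_row["name"] == next_row["name"]:
            # same name: fold next_row into first_row and do not write yet
            # (copy any other attributes from next_row here as needed)
            first_row["number"] = first_row["number"] + " + " + next_row["number"]
        else:
            # name changed: first_row is complete, so write it and move on
            writer.writerow(first_row)
            first_row = next_row
except StopIteration:
    # input exhausted: write the row that is still pending
    writer.writerow(first_row)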
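
For reference, here is a self-contained sketch of the same idea that can be run against the sample data from the question. It is only an illustration under a few assumptions that are not in the original post: the function name merge_adjacent_rows is hypothetical, the header is written without spaces after the commas so the keys are exactly "name", "number" and so on, and the merge rule (keep the first non-empty value of each field, join the "number" values with "+") is a guess at what the question intends. It uses a for loop in place of the while/try pattern, which behaves the same way.

```python
import csv
import io


def merge_adjacent_rows(reader, writer):
    """Write one row per run of consecutive rows that share a 'name'."""
    rows = iter(reader)
    try:
        first_row = next(rows)
    except StopIteration:
        return  # empty input: nothing to write
    for next_row in rows:
        if first_row["name"] == next_row["name"]:
            # Assumed merge rule: join 'number' with '+', and fill any
            # field that is still empty in first_row from next_row.
            for key, value in next_row.items():
                if key == "number":
                    first_row[key] = first_row[key] + "+" + value
                elif not first_row[key]:
                    first_row[key] = value
        else:
            # Name changed: the pending row is complete, so write it.
            writer.writerow(first_row)
            first_row = next_row
    writer.writerow(first_row)  # write the last pending row


# Sample data from the question (header spaces removed, see note above).
source = io.StringIO(
    "name,sector,year,region,number\n"
    "bob,,1999,AS,2\n"
    "bob,hi-tech,,,3\n"
    "mike,,2001,NE,2\n"
    "plan,pharma,,,1\n"
)
target = io.StringIO()
reader = csv.DictReader(source)
writer = csv.DictWriter(target, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
merge_adjacent_rows(reader, writer)
result = target.getvalue()
print(result)
```

Because a row is written only when the name changes or the input ends, the two bob rows come out as a single merged row and neither of the originals is written separately.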
answered 2012-08-28T21:02:58.883