I am reading a CSV file line by line. The CSV looks like this:

29.781646
29.781646
42.698079
43.346914
44.369203
45.006459
45.006459
39.316758

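When two numbers are exactly the same, I want to change one of them slightly.

For example, there are two values of 29.781646, and I want to change one of them to 29.781645.

If the CSV contains: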

29.781646
29.781646
29.781646

then I want to change it to:

29.781646
29.781645
29.781644

I would really appreciate guidance on how to implement this efficiently.

Note that I want the adjustments to be in multiples of 0.000001.

3 Answers

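You can iterate over the file line by line and use a set to keep track of the values you have already seen. If a value is already in the set, decrease it until it is no longer there.

Rough example: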

seen = set()
with open('test.csv') as input, open('test_out.csv', 'w') as output:
    for line in input:
        value = float(line)
        while value in seen:
            value -= 0.000001
        seen.add(value)
        output.write(str(value) + '\n')

This is efficient because a set provides O(1) lookups on average.

If you want to write the values back to the same file, you can use the fileinput module:

import fileinput

seen = set()
for line in fileinput.FileInput('test.csv', inplace=True):
    value = float(line)
    while value in seen:
        value -= 0.000001
    seen.add(value)
    print(str(value))  # with inplace=True, stdout is redirected into test.csv

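Edit

To address eumiro's comment about floating-point issues:

You could use Decimal from the decimal module, or simply multiply the values by 1000000 (and divide again when writing) so that you work with int instead of float. As I wrote, this is just a rough example :-)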
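
A minimal sketch of the Decimal suggestion, assuming the values are read as text so they never pass through float. The function name make_unique and the constant STEP are made up for illustration and do not come from the original answer.

```python
from decimal import Decimal

# Assumed step size, taken from the question (multiples of 0.000001).
STEP = Decimal("0.000001")


def make_unique(lines):
    """Return the values as Decimals, lowering repeats by STEP until unique."""
    seen = set()
    result = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        value = Decimal(text)  # parsed from text, so no binary rounding
        while value in seen:
            value -= STEP  # exact decimal subtraction
        seen.add(value)
        result.append(value)
    return result


if __name__ == "__main__":
    for v in make_unique(["29.781646", "29.781646", "29.781646"]):
        print(v)
```

Because Decimal arithmetic is exact for these values, repeated subtraction does not drift, and str(value) writes back exactly six decimal places.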

answered 2012-08-29T08:09:08.653
>>> s = """29.781646
29.781646
42.698079
43.346914
44.369203
45.006459
45.006459
39.316758
"""
>>> d = {}
>>> for nb in [float(l) for l in s.split('\n') if l]:
    # Create a dict of repetitions, to decrease by the number of times already seen
    if nb not in d:
        d[nb] = 0
        print nb
    else:
        rep = d[nb]
        d[nb] = rep + 1
        print nb - d[nb] * 0.000001


29.781646
29.781645
42.698079
43.346914
44.369203
45.006459
45.006458
39.316758
>>> 
answered 2012-08-29T08:15:52.723

Based on BigYellowCactus's answer, but addressing eumiro's comment about how errors accumulate in a float that is modified several times:

seen = set()
with open('test.csv') as input, open('test_out.csv', 'w') as output:
    for line in input:
        value = float(line)
        modifier = 0  # start at 0 so an unseen value is written unchanged

        while True:
            # Always subtract from the original value, so errors do not accumulate
            new_value = value - (modifier * 0.000001)
            modifier += 1
            if new_value not in seen:
                break

        seen.add(new_value)
        output.write(str(new_value) + '\n')
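
Another way to sidestep float accumulation is to do the bookkeeping in whole numbers. The sketch below assumes every value has at most six decimal places, so it can be scaled to an integer count of millionths. The names SCALE and make_unique_scaled are hypothetical, not part of any answer here.

```python
# Assumed scale: six decimal places, matching the 0.000001 step in the question.
SCALE = 1000000


def make_unique_scaled(lines):
    """Return six-decimal strings, lowering repeats by one millionth each."""
    seen = set()
    result = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        # Convert to an integer number of millionths; round() absorbs the
        # tiny binary error introduced by float().
        units = int(round(float(text) * SCALE))
        while units in seen:
            units -= 1  # exact integer arithmetic
        seen.add(units)
        # Format back to exactly six decimal places for output.
        result.append("%.6f" % (units / float(SCALE)))
    return result


if __name__ == "__main__":
    for s in make_unique_scaled(["45.006459", "45.006459"]):
        print(s)
```

Integers compare and hash exactly, so the membership test in the set cannot be thrown off by rounding; the only float step left is the final formatting.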
answered 2012-08-29T08:25:58.180