python - Slightly jitter data values that are equal

Question

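I am reading a CSV file line by line. The CSV looks like this:

When two numbers are exactly the same, I want to change one of them slightly.

For example, if there are two values of 29.781646, I want to change one of them to 29.781645.

If the CSV contains:

29.781646
29.781646
29.781646

then I want to change it to:

29.781646
29.781645
29.781644

I would really appreciate guidance on implementing this efficiently.

Note that I want to make these changes in multiples of 0.000001.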

score 6 · Accepted Answer

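You can iterate over the file line by line, keep track of the values you have already seen in a set, and decrease a value while it is already in the set.

Rough example: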

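seen = set()
with open('test.csv') as infile, open('test_out.csv', 'w') as outfile:
    for line in infile:
        if not line.strip():
            continue  # skip blank lines; float('') would raise ValueError
        value = float(line)
        # Step down by 0.000001 until the value has not been seen yet
        while value in seen:
            value -= 0.000001
        seen.add(value)
        outfile.write('%.6f\n' % value)  # write with six decimal places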

This is efficient because a set provides O(1) membership lookups on average.
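To make the set approach easy to try without any files, here is a minimal sketch that wraps the same loop in a function over a list. The name jitter_duplicates and its step parameter are hypothetical. Rounding each candidate to 6 decimal places is an addition to the answer's loop, assuming the data never has more than six decimals; it keeps adjusted values on the same 0.000001 grid as values parsed from the file, so they compare equal.

```python
def jitter_duplicates(values, step=0.000001):
    """Return a new list in which repeated values are nudged down by multiples of step."""
    seen = set()
    result = []
    for value in values:
        # Assumption: inputs have at most 6 decimals, so round() keeps each
        # candidate equal to the float that the same 6-decimal string parses to
        while value in seen:
            value = round(value - step, 6)
        seen.add(value)
        result.append(value)
    return result


print(jitter_duplicates([29.781646, 29.781646, 29.781646]))
```

With the question's three identical values this yields 29.781646, 29.781645 and 29.781644.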

If you want to write the values back to the same file, you can use the fileinput module:

import fileinput

seen = set()
# With inplace=True, whatever is printed replaces the file's contents
for line in fileinput.FileInput('test.csv', inplace=True):
    value = float(line)
    while value in seen:
        value -= 0.000001
    seen.add(value)
    print('%.6f' % value)
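When rewriting a file in place it can be worth keeping a copy of the original. The sketch below uses fileinput's real backup parameter, which saves the untouched file as path + '.bak', and uses fileinput.input as a context manager so standard output is restored afterwards. The function name jitter_file_in_place is hypothetical, and passing blank lines through unchanged plus rounding to 6 decimals are assumptions about the data, not part of the answer above.

```python
import fileinput


def jitter_file_in_place(path):
    """Rewrite path in place, keeping the original contents in path + '.bak'."""
    seen = set()
    # inplace=True redirects print() into the file; backup='.bak' keeps a copy
    with fileinput.input(path, inplace=True, backup='.bak') as lines:
        for line in lines:
            if not line.strip():
                print(line, end='')  # pass blank lines through unchanged
                continue
            value = float(line)
            # Assumption: at most 6 decimals, so round() keeps values on the grid
            while value in seen:
                value = round(value - 0.000001, 6)
            seen.add(value)
            print('%.6f' % value)
```

For example, jitter_file_in_place('test.csv') turns three lines of 29.781646 into 29.781646, 29.781645 and 29.781644, and leaves the original in test.csv.bak.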

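EDIT

To address eumiro's comment about floating-point issues:

You could use Decimal from the decimal module, or simply multiply the values by 1000000, work with int instead of float, and divide by 1000000 again when writing them out. As I wrote, this is just a rough example :-)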
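A minimal sketch of both alternatives from the edit, assuming each line holds one number with at most six decimal places. The function names jitter_decimal and jitter_int are hypothetical. In both versions every step of 0.000001 is exact (a Decimal subtraction or an integer decrement), so adjusted values cannot drift away from values read from the file.

```python
from decimal import Decimal

STEP = Decimal('0.000001')


def jitter_decimal(lines):
    """Decimal variant: exact decimal arithmetic, so repeated steps never drift."""
    seen = set()
    out = []
    for line in lines:
        value = Decimal(line.strip())
        while value in seen:
            value -= STEP
        seen.add(value)
        out.append(str(value))
    return out


def jitter_int(lines):
    """Integer variant: work in millionths as ints, convert back only for output."""
    seen = set()
    out = []
    for line in lines:
        # round() absorbs the tiny float error in the scaling step
        value = round(float(line) * 1000000)
        while value in seen:
            value -= 1  # one unit here is exactly 0.000001
        seen.add(value)
        out.append('%.6f' % (value / 1000000))
    return out


data = ['29.781646\n', '29.781646\n', '29.781646\n']
print(jitter_decimal(data))
print(jitter_int(data))
```

Both functions return strings ready to be written back to the CSV; the Decimal version keeps the input's own number of decimal places, while the int version always formats with six.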

score 1 · Accepted Answer

>>> s = """29.781646
... 29.781646
... 42.698079
... 43.346914
... 44.369203
... 45.006459
... 45.006459
... 39.316758
... """
>>> d = {}
>>> for nb in [float(l) for l in s.split('\n') if l]:
...     # Count repetitions, then decrease by the number of times already seen
...     if nb not in d:
...         d[nb] = 0
...         print('%.6f' % nb)
...     else:
...         d[nb] += 1
...         print('%.6f' % (nb - d[nb] * 0.000001))
...
29.781646
29.781645
42.698079
43.346914
44.369203
45.006459
45.006458
39.316758
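The counting dict above only counts repeats of each original value, so it does not notice when a decremented value lands on a different value that is already in the data (for example 29.781646, 29.781645, 29.781646 would print 29.781645 twice). A minimal sketch that keeps the per-value counter as a starting offset but also checks a set of everything already output; the name jitter_counted is hypothetical, and rounding to 6 decimals assumes the data has at most six decimals.

```python
def jitter_counted(values, step=0.000001):
    """Count-based jitter that also guards against clashes with other values."""
    count = {}    # original value -> next offset (in steps) to try for it
    seen = set()  # every value output so far, original or adjusted
    out = []
    for nb in values:
        n = count.get(nb, 0)
        candidate = round(nb - n * step, 6)
        # The counter alone misses clashes with values that appeared on their
        # own, so keep stepping down while the candidate is already taken
        while candidate in seen:
            n += 1
            candidate = round(nb - n * step, 6)
        count[nb] = n + 1
        seen.add(candidate)
        out.append(candidate)
    return out


print(jitter_counted([29.781646, 29.781645, 29.781646]))
```

Starting each search from the stored offset avoids re-scanning from zero when a value repeats many times, while the set check keeps the output free of duplicates.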

score 1 · Accepted Answer

Based on BigYellowCactus's answer, but addressing eumiro's comment about how error accumulates in a float that is modified many times:

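seen = set()
with open('test.csv') as infile, open('test_out.csv', 'w') as outfile:
    for line in infile:
        value = float(line)
        modifier = 0  # 0 leaves the first occurrence of a value unchanged

        # Derive each candidate from the original value with one multiplication
        # instead of subtracting repeatedly, so errors do not accumulate
        while True:
            new_value = value - (modifier * 0.000001)
            modifier += 1
            if new_value not in seen:
                break

        seen.add(new_value)
        outfile.write('%.6f\n' % new_value)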
