1

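I am writing a script that I will use often, with data sets of varying size, and I need to make some comparisons that I cannot work out how to do directly in Python.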

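There will be multiple lists (about 20 or more, but I have reduced them to three for testing), all holding the same number of integer items in a specific order. I want to compare the items at the same position in each list to find the differences. For a fixed number of lists this is easy: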

a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

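for x, y, z in zip(a, b, c):
    # "x != y != z" only compares neighbours (x with y, y with z),
    # so test "not all equal" instead
    if not (x == y == z):
        print x, y, z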
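
One caveat worth illustrating: in Python, a chained comparison such as `x != y != z` means `x != y and y != z`, so it never compares `x` with `z` and it misses rows where only the last value differs. Below is a minimal Python 3 sketch of the difference; the helper name `not_all_equal` is hypothetical and not part of the original post.

```python
def not_all_equal(*values):
    """Return True if at least one value differs from the others."""
    return len(set(values)) > 1

# The chained form is equivalent to (x != y) and (y != z).
x, y, z = 0, 0, 1
chained = x != y != z            # first link (0 != 0) already fails
robust = not_all_equal(x, y, z)  # the set {0, 1} has two members

print(chained, robust)
```

The set-based test assumes the values are hashable, which holds for the integers used here.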

I have tried wrapping that loop in a function so that the number of arguments can vary, but I am stuck.

def compare(*args):
    for x in zip(args):
        ???

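In the final script I will not have several separate lists but a single list containing all of them. Would that help? If I iterate over the list of lists, I do not get every list at once...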
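
Having all lists inside one list does fit this pattern: unpacking with `*` passes each inner list to `zip` as a separate argument, so every iteration yields one tuple holding the values found at the same position. A minimal Python 3 sketch with made-up data follows; `list_of_lists` and `differing_positions` are illustrative names only.

```python
def differing_positions(list_of_lists):
    """Yield (index, values) for every position where the lists disagree."""
    for index, values in enumerate(zip(*list_of_lists)):
        if len(set(values)) > 1:  # more than one distinct value at this position
            yield index, values

list_of_lists = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 3],
]
result = list(differing_positions(list_of_lists))
print(result)
```

Because the function takes the outer list itself, it works unchanged for three lists or twenty.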

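Forget the function; it is not very useful anyway, because it will be part of a larger script and defining the varying arguments is too hard. I now compare two lists at a time and record the ones that are identical. That way I can later remove all of them from my full list and keep only the unique ones.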

l_o_l = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

duplicates = []  # keys of lists that have at least one identical twin
for i in range(len(l_o_l) - 1):
    for j in range(i + 1, len(l_o_l)):
        if l_o_l[i] == l_o_l[j]:
            duplicates.append(key_list[i])
            duplicates.append(key_list[j])
dup = list(set(duplicates))
uniques = [x for x in key_list if x not in dup]

Here key_list holds the identifiers (dictionary keys) of my lists.

Any suggestions for improvement?
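
The pairwise loop above compares every list with every other one. One possible alternative, sketched here in Python 3 under the assumption that the keys are hashable, is to group the keys by list content in a single pass. The function name `unique_keys` and the sample data are hypothetical.

```python
from collections import defaultdict

def unique_keys(key_list, list_of_lists):
    """Return the keys whose list occurs exactly once, in original order."""
    groups = defaultdict(list)
    for key, values in zip(key_list, list_of_lists):
        # lists are unhashable, so use a tuple of the values as the dict key
        groups[tuple(values)].append(key)
    single = {keys[0] for keys in groups.values() if len(keys) == 1}
    return [key for key in key_list if key in single]

sample_keys = ["a", "b", "c", "d"]
sample_lists = [[0, 1], [2, 3], [0, 1], [4, 5]]
print(unique_keys(sample_keys, sample_lists))
```

Each list is converted to a tuple once, so the work grows roughly linearly with the number of lists instead of quadratically.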

4

5 Answers

3

Maybe something like this:

def compare(*args):
    for things in zip(*args):
        yield all(x == things[0] for x in things)

Then you can use it like this:

a = range(10)
b = range(10)
c = range(10)
d = range(11, 20)

for match in compare(a,b,c):
    print match

for match in compare(a,b,c,d):
    print match
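
Note that `zip` stops at its shortest argument, so the second loop yields only nine results because `d` has nine items. If a length mismatch should itself count as a difference, `itertools.zip_longest` with a sentinel fill value is one option. This is a Python 3 sketch, not part of the answer; `compare_padded` and `_MISSING` are hypothetical names.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel that never equals real data

def compare_padded(*args):
    """Yield True where all lists agree, False otherwise (including gaps)."""
    for things in zip_longest(*args, fillvalue=_MISSING):
        yield all(x == things[0] for x in things)

a = list(range(3))
b = list(range(2))
same_length = list(compare_padded(a, a))
shorter = list(compare_padded(a, b))
print(same_length)
print(shorter)
```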

Here is a demo using your example (it is a generator, so you have to iterate over it or exhaust it with list):

a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print list(compare(a,b,c))
answered 2013-02-07T09:18:15.227
1
def compare(*args):
    for x in zip(*args):  # unpack so zip pairs up the items position by position
        values_list = list(x)  # x is a tuple of the values at one position
        different_values = set(values_list)  # a set does not contain identical values
        if len(different_values) != 1:  # more than one distinct value means the lists differ here
            print 'different values', values_list

This gives you:

a = [0, 0, 1]
b = [0, 1, 1]
c = [1, 1, 1]
compare(a, b, c)
>>> different values [0, 0, 1]
>>> different values [0, 1, 1]
answered 2013-02-07T10:21:12.790
0

Assuming the lists are similar to the ones in the example, I would use:

def compare(*args):
    for x in zip(*args):  # unpack so each x holds one value from every list
        if min(x) != max(x):
            print x
answered 2013-02-07T10:19:25.400
0
def compare(elements):
    return len(set(elements)) == bool(elements)

If you want to know whether all the lists are the same, you can simply do:

all(compare(elements) for elements in zip(*the_lists))
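
The comparison in `compare` relies on `bool` being a subclass of `int` in Python: `bool(elements)` equals 1 for a non-empty sequence and 0 for an empty one, so the test passes when there is exactly one distinct value, or when there is nothing to compare. A small Python 3 sketch with made-up data (`other_lists` is an illustrative name):

```python
def compare(elements):
    # len(set(...)) is 1 when all items are equal; bool(...) is True (== 1)
    # for a non-empty sequence and False (== 0) for an empty one.
    return len(set(elements)) == bool(elements)

the_lists = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
other_lists = [[1, 2, 3], [1, 9, 3]]

# zip(*...) hands compare() one tuple per position
all_same = all(compare(elements) for elements in zip(*the_lists))
some_differ = all(compare(elements) for elements in zip(*other_lists))
print(all_same, some_differ)
```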

Another approach is to convert the lists to tuples and use set on them:

len(set(tuple(the_list) for the_list in the_lists)) == bool(the_lists)

If you simply want to remove duplicates, this should be faster:

the_lists = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]

Example usage:

>>> a = range(100)
>>> b = range(100, 200)
>>> c = range(200, 300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists2 = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]
>>> [a,b,c] == sorted(the_lists2)  #order is not maintained by set
True

It appears to be quite fast:

>>> import timeit
>>> timeit.timeit('[list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]', 'from __main__ import the_lists', number=1000000)
7.949447154998779

One million runs take just under 8 seconds. (Here the_lists is the same as used before.)
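
As the comment in the example session notes, a set does not preserve order. If the original order matters, one alternative (not from this answer) is `dict.fromkeys`, which keeps first-insertion order in Python 3.7 and later. A minimal sketch, with `dedupe_keep_order` as a hypothetical name:

```python
def dedupe_keep_order(the_lists):
    """Drop repeated lists, keeping the first occurrence of each."""
    # dict keys are unique and remember insertion order (Python 3.7+)
    unique_tuples = dict.fromkeys(tuple(the_list) for the_list in the_lists)
    return [list(elem) for elem in unique_tuples]

a = list(range(3))
b = list(range(3, 6))
c = list(range(6, 9))
the_lists = [c, a, b, a[:], b[:]]
deduped = dedupe_keep_order(the_lists)
print(deduped)
```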

Edit: If you only want to remove the lists that are duplicated, the simplest algorithm I can think of is to sort the list of lists and use itertools.groupby:

>>> a = range(100)
>>> b = range(100,200)
>>> c = range(200,300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists.sort()
>>> import itertools as it
>>> for key, group in it.groupby(the_lists):
...     if len(list(group)) == 1:
...             print key
... 
[200, 201, 202, ..., 297, 298, 299]
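
A variant that avoids sorting, offered as an alternative rather than as this answer's method: count the occurrences with `collections.Counter` over tuple keys and keep the lists that appear exactly once. This is a Python 3 sketch; `lists_occurring_once` is a hypothetical name.

```python
from collections import Counter

def lists_occurring_once(the_lists):
    """Return the lists that have no duplicate, in their original order."""
    counts = Counter(tuple(the_list) for the_list in the_lists)
    return [the_list for the_list in the_lists if counts[tuple(the_list)] == 1]

a = list(range(3))
b = list(range(3, 6))
c = list(range(6, 9))
once = lists_occurring_once([a, b, c, a[:], b[:]])
print(once)
```

Unlike the sorted `groupby` version, this leaves the input list untouched and keeps the surviving lists in their original order.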
answered 2013-02-07T10:12:31.183
-1

I think trying to be clever with *args and zip only muddles the problem. I would write it like this:

def compare(list_of_lists):
    # assuming not an empty data set
    inner_len = len(list_of_lists[0])
    for index in range(inner_len):
        expected = list_of_lists[0][index]
        for inner_list in list_of_lists:
            if inner_list[index] != expected:
                # report difference at this index (printing is only a placeholder)
                print 'difference at index', index
                break  # one report per index is enough
answered 2013-02-07T10:22:25.867