
I am trying to build a contingency table from a list of tuples. The list looks like this:

lst = [('a', 'bag'), ('a', 'bag'), ('a', 'bag'), ('a', 'cat'), ('a', 'pen'), ('that', 'house'), ('my', 'car'), ('that', 'bag'), ('this', 'bag')]

Given a tuple, say ('a', 'bag'), four quantities have to be worked out:

a = lst.count(('a', 'bag')), which is 3

b is the count of all tuples where tuple[0] == 'a' and tuple[1] != 'bag', which is 2: ('a', 'cat'), ('a', 'pen')

When I try

lst.count(('a', not 'bag')) I get 0, although it should be 2. (1)

c is the count of all tuples where tuple[0] != 'a' and tuple[1] == 'bag'. In this case, ('that', 'bag'), ('this', 'bag'). But when I try

lst.count((not 'a', 'bag')) I get 0, although it should be 2. (2)

d is the count of all tuples where tuple[0] != 'a' and tuple[1] != 'bag', and it can easily be obtained as len(lst) - (a + b + c)

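My question: is there a way to use the logical operator not inside lst.count((x, not y)) or lst.count((not x, y))? If not, could you suggest how to work this out without a loop, because a complexity of 2(N*N) is very expensive.

Thank you very much for your help!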


3 Answers


You cannot use not with count this way. If you write lst.count(('a', not 'bag')), then not 'bag' is evaluated first and becomes False, so you are actually counting ('a', False).
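A quick way to see this for yourself is the sketch below. It is only an illustration; the shortened list and the name probe are made up for the example.

```python
# `not` is applied before the tuple is built, so the tuple ends up holding a bool.
lst = [('a', 'bag'), ('a', 'cat'), ('a', 'pen'), ('that', 'bag')]

probe = ('a', not 'bag')   # non-empty strings are truthy, so `not 'bag'` is False
print(probe)               # ('a', False)
print(lst.count(probe))    # 0, because no element equals ('a', False)
```

count only tests each element for equality with its argument, so there is no way to pass it a "not equal" condition.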

Instead, you can use sum with a condition that compares the first and second elements of each tuple:

>>> lst = [('a', 'bag'), ('a', 'bag'), ('a', 'bag'), ('a', 'cat'), ('a', 'pen'), ('that', 'house'), ('my', 'car'), ('that', 'bag'), ('this', 'bag')]
>>> lst.count(('a', 'bag'))
3
>>> sum(1 for a,b in lst if a == 'a' and b == 'bag')
3
>>> sum(1 for a,b in lst if a == 'a' and b != 'bag')
2
>>> sum(1 for a,b in lst if a != 'a' and b == 'bag')
2
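If you need all four cells at once, the same idea can be wrapped in a small helper. This is a sketch, not part of the session above: the function name contingency is hypothetical, and it assumes pairs is a list (or another sequence with a len), since the fourth cell is obtained by subtraction.

```python
def contingency(pairs, first, second):
    """Return the four cells (a, b, c, d) of the 2x2 table for first vs. second."""
    a = sum(1 for x, y in pairs if x == first and y == second)
    b = sum(1 for x, y in pairs if x == first and y != second)
    c = sum(1 for x, y in pairs if x != first and y == second)
    # Every pair falls into exactly one of the four cells, so d is what is left.
    d = len(pairs) - (a + b + c)
    return a, b, c, d

lst = [('a', 'bag'), ('a', 'bag'), ('a', 'bag'), ('a', 'cat'), ('a', 'pen'),
       ('that', 'house'), ('my', 'car'), ('that', 'bag'), ('this', 'bag')]
print(contingency(lst, 'a', 'bag'))  # (3, 2, 2, 2)
```

Each sum is one linear pass over the list, so the cost for a single (first, second) query stays O(n).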
answered 2016-03-18T20:58:27.293
from collections import Counter, defaultdict

lst = [('a', 'bag'), ('a', 'bag'), ('a', 'bag'), ('a', 'cat'), ('a', 'pen'), ('that', 'house'), ('my', 'car'), ('that', 'bag'), ('this', 'bag')]
# counting edges in 2 directed graphs
dct_a = defaultdict(Counter)
dct_b = defaultdict(Counter)

for a, b in lst:
    # dct_x[x][0] holds the total number of occurrences of x
    # (in first position for dct_a, in second position for dct_b).
    dct_a[a][b] += 1
    dct_a[a][0] += 1

    dct_b[b][a] += 1
    dct_b[b][0] += 1

def compute_coocurrence(a, b):
    out = {}
    out['both_occur']  = dct_a[a][b]
    out['a_but_not_b'] = dct_a[a][0] - dct_a[a][b]
    out['b_but_not_a'] = dct_b[b][0] - dct_b[b][a]
    return out

print(compute_coocurrence('a', 'bag'))

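Python's collections module provides two nice data structures that can help with your problem. This approach builds two dictionaries, indexed by the first and the second element of the tuple respectively. So dct_a['a'] holds the counts of every b that co-occurs with a. I believe this gives an O(n) algorithm: one pass over the list to build the dictionaries, after which each lookup takes constant time.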

{'both_occur': 3, 'b_but_not_a': 2, 'a_but_not_b': 2}
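A variation on the same idea, offered as a sketch rather than as the code above: keep three plain Counter objects (pairs, first elements, second elements) instead of a sentinel key 0, and derive the fourth cell by subtraction. The names pair_counts, first_counts, second_counts and cells are assumptions made for this example.

```python
from collections import Counter

lst = [('a', 'bag'), ('a', 'bag'), ('a', 'bag'), ('a', 'cat'), ('a', 'pen'),
       ('that', 'house'), ('my', 'car'), ('that', 'bag'), ('this', 'bag')]

pair_counts = Counter(lst)                   # occurrences of each (first, second) pair
first_counts = Counter(x for x, _ in lst)    # totals per first element
second_counts = Counter(y for _, y in lst)   # totals per second element

def cells(first, second):
    """Return (both, first_only, second_only, neither) using the precomputed counts."""
    a = pair_counts[(first, second)]
    b = first_counts[first] - a
    c = second_counts[second] - a
    d = len(lst) - (a + b + c)
    return a, b, c, d

print(cells('a', 'bag'))  # (3, 2, 2, 2)
```

Looking up a missing key in a Counter returns 0 without inserting it, so querying pairs that never occur does not grow the tables.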

answered 2016-03-18T21:07:07.293

You can define a function that computes all four combinations in a single pass, like this

>>> def my_count(iterable,a,b):
        both    = 0
        a_not_b = 0
        not_a_b = 0
        neither = 0 
        for x,y in iterable:
            if x == a and y == b:
                both += 1
            if x == a and y!= b:
                a_not_b += 1
            if x != a and y == b:
                not_a_b += 1
            if x!= a and y!= b:
                neither += 1
        return both, a_not_b, not_a_b, neither

>>> lst = [('a', 'bag'), ('a', 'bag'), ('a', 'bag'), ('a', 'cat'), ('a', 'pen'), ('that', 'house'), ('my', 'car'), ('that', 'bag'), ('this', 'bag')]
>>> my_count(lst,"a","bag")
(3, 2, 2, 2)
>>>         

And to make the result more descriptive, you can add a namedtuple like this

>>> from collections import namedtuple
>>> CountTuple = namedtuple("CountTuple","both a_not_b not_a_b neither")
>>> def my_count(iterable,a,b):
        #same as before 
        ...
        return CountTuple(both,a_not_b,not_a_b,neither)

>>> result = my_count(lst,"a","bag")
>>> result
CountTuple(both=3, a_not_b=2, not_a_b=2, neither=2)
>>> result.both
3
>>> result.a_not_b
2
>>> result.not_a_b
2
>>> result.neither
2
>>>     
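For reference, here is one possible complete version written as a script. It is a sketch and not necessarily the body elided above: it tallies each pair under the key (x == a, y == b) with a Counter, which is an alternative to the four if statements.

```python
from collections import Counter, namedtuple

CountTuple = namedtuple("CountTuple", "both a_not_b not_a_b neither")

def my_count(iterable, a, b):
    # Key each pair by (first matches a, second matches b);
    # exactly one of the four possible keys applies to every pair.
    tally = Counter((x == a, y == b) for x, y in iterable)
    return CountTuple(
        both=tally[(True, True)],
        a_not_b=tally[(True, False)],
        not_a_b=tally[(False, True)],
        neither=tally[(False, False)],
    )

lst = [('a', 'bag'), ('a', 'bag'), ('a', 'bag'), ('a', 'cat'), ('a', 'pen'),
       ('that', 'house'), ('my', 'car'), ('that', 'bag'), ('this', 'bag')]
result = my_count(lst, "a", "bag")
print(result)  # CountTuple(both=3, a_not_b=2, not_a_b=2, neither=2)
```

Because it only iterates once, this version also works on generators and other one-shot iterables.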
answered 2016-03-18T21:37:16.553