I have an arbitrary number of Python sets, for example:

>>> a = {1, 2, 3}
>>> b = {3, 4, 5}
>>> c = {5, 6, 7}
>>> d = {7, 8, 1}

I want the symmetric difference (XOR) of all of them:
>>> a ^ b ^ c ^ d
{2, 4, 6, 8}

Since the number of sets is arbitrary, they are held in a list:
>>> l = [a, b, c, d]
>>> l
[{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {1, 7, 8}]

My current approach is a plain loop:
>>> res = l[0].copy()
>>> for item in l[1:]:
...     res.symmetric_difference_update(item)
>>> res
{2, 4, 6, 8}

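I am wondering whether there is a more efficient way to do this, ideally one that avoids a Python-level for loop. Set operations themselves are very fast in Python, but my list can become rather long, so, ironically, the for loop itself becomes the bottleneck.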
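One loop-free variant that may be worth timing, offered as a sketch rather than a known-faster answer: `functools.reduce` with `operator.ixor` and a fresh `set()` as the initial value. This accumulates in place, like `symmetric_difference_update`, instead of building a new set at every step. The function name `xor_all` is hypothetical.

```python
from functools import reduce
from operator import ixor
from typing import Iterable, Set


def xor_all(sets: Iterable[Set[int]]) -> Set[int]:
    # Start from a fresh empty set so that none of the inputs is mutated.
    # ixor(acc, s) performs `acc ^= s` and returns acc.
    return reduce(ixor, sets, set())


a, b, c, d = {1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {7, 8, 1}
assert xor_all([a, b, c, d]) == {2, 4, 6, 8}
assert a == {1, 2, 3}  # inputs are left untouched
```

The iteration still happens once per set, only inside `reduce` rather than in a Python-level `for` statement, so any speed-up would need to be measured on the real data.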

EDIT (1)


EDIT (2)


from typing import List, Set
from functools import reduce
from collections import defaultdict

length = 1_000
data = [
    {idx - 1, idx, idx + 1}
    for idx in range(3_000, 3_000 + length * 2, 2)
]

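def test_loop1(l: List[Set[int]]) -> Set[int]:
    # Copy the first set, then fold the remaining sets into it in place.
    res = l[0].copy()
    for item in l[1:]:
        res.symmetric_difference_update(item)
    assert len(res) == len(l) + 2
    return res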

test_loop1: 121 µs ± 321 ns

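def test_loop2(l: List[Set[int]]) -> Set[int]:
    # Start from an empty set and fold every set into it in place.
    res = set()
    for item in l:
        res.symmetric_difference_update(item)
    assert len(res) == len(l) + 2
    return res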

test_loop2: 112 µs ± 3.16 µs

def test_reduce1(l: List[Set[int]]) -> Set[int]:
    # Uses the built-in set type; each step allocates a new set.
    res = reduce(set.symmetric_difference, l)
    assert len(res) == len(l) + 2
    return res

test_reduce1: 9.89 ms ± 20.6 µs

def test_dict1(l: List[Set[int]]) -> Set[int]:
    """
    A general solution allowing for entries to occur more than twice in the input data
    """
    # Count how often each entry occurs across all sets.
    d = defaultdict(int)
    for item in l:
        for entry in item:
            d[entry] += 1
    # Keep only the entries that occur exactly once.
    res = {entry for item in l for entry in item if d[entry] == 1}
    assert len(res) == len(l) + 2
    return res

test_dict1: 695 µs ± 5.11 µs
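
A note on semantics: `test_dict1` keeps entries that occur exactly once, whereas chaining `^` keeps entries that occur an odd number of times. The two agree as long as no entry occurs more than twice, which holds for `data` above. A minimal sketch of the difference, using a made-up three-set input and `collections.Counter` in place of the `defaultdict`:

```python
from collections import Counter
from functools import reduce
from itertools import chain
from operator import xor

# Hypothetical input in which the entry 1 occurs three times.
sets = [{1, 2}, {1, 3}, {1, 4}]

# Count occurrences of every entry across all sets.
counts = Counter(chain.from_iterable(sets))

exactly_once = {entry for entry, n in counts.items() if n == 1}
odd_count = {entry for entry, n in counts.items() if n % 2 == 1}

assert exactly_once == {2, 3, 4}        # what a count == 1 filter yields
assert odd_count == {1, 2, 3, 4}        # what chained ^ yields
assert odd_count == reduce(xor, sets)
```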

