python - Checking information in a set in Python

Question

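I currently need to compare strings containing MAC addresses (e.g. "11:22:33:AA:BB:CC") using Python 2.7. At the moment, I have a preconfigured set containing MAC addresses, and my script iterates through the set, comparing each new MAC address to those in the list. This works, but as the set grows the script slows down considerably. With only 100 or so entries, you already notice a huge difference.

Does anyone have any advice on speeding up this process? Is storing them in a set the best way to compare, or is it better to store them in a CSV / DB?

Sample of the code...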

def Detect(p): 
    stamgmtstypes = (0,2,4)
    if p.haslayer(Dot11):
        if p.type == 0 and p.subtype in stamgmtstypes:
            if p.addr2 not in observedclients: 
                # This is the set
                with location_mutex:
                    detection = p.addr2 + "\t" + str(datetime.now())
                    print type(p.addr2)
                    print detection, last_location
                    observedclients.append(p.addr2)

score 1 · Accepted Answer

First, you need to profile your code to see exactly where the bottleneck is...
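One way to profile is the standard library's cProfile module. The following is only a rough sketch: check_addresses and the generated addresses are made up for illustration and stand in for whatever the real script does per packet.

```python
import cProfile
import pstats


def check_addresses(known, incoming):
    """Hypothetical stand-in for the per-packet membership check."""
    new = []
    for addr in incoming:
        if addr not in known:  # linear scan when `known` is a list
            new.append(addr)
    return new


# Made-up data: 2000 known addresses and 2000 lookups that all miss.
known = ["00:00:00:00:%02X:%02X" % (i // 256, i % 256) for i in range(2000)]
incoming = ["11:22:33:AA:BB:CC"] * 2000

profiler = cProfile.Profile()
profiler.enable()
result = check_addresses(known, incoming)
profiler.disable()

# Print the five most expensive entries, sorted by cumulative time.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```

If the membership test dominates the report, the data structure is the likely culprit; if it does not, the slowdown is somewhere else (printing, locking, packet parsing).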

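Also, as general advice, consider using psyco, although sometimes psyco doesn't help.

Once you have found the bottleneck, cython may be useful, but you need to make sure all variables are declared in the cython source.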

score 0 · Accepted Answer

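Try using a set. To declare a set, use set(), not [] (the latter declares an empty list).

Lookup in a list has O(n) complexity, where n is the length of the list. That is your case: the cost of each lookup grows linearly as the list grows.

Lookup in a set has O(1) complexity on average.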

http://wiki.python.org/moin/TimeComplexity
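To see the difference in practice, here is a small timing sketch using timeit. The addresses are generated purely for illustration, and the absolute numbers will vary by machine; only the gap between the two matters.

```python
import timeit

# Made-up data: 10000 distinct MAC-like strings.
addresses = ["00:00:00:00:%02X:%02X" % (i // 256, i % 256) for i in range(10000)]
as_list = list(addresses)
as_set = set(addresses)

# An address that is not present, so the list has to be scanned in full.
missing = "11:22:33:AA:BB:CC"

list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)
print("list: %.6f s, set: %.6f s" % (list_time, set_time))
```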

Also, you will need to change some parts of your code. A set has no append method, so you will need to use something like observedclients.add(address).
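A minimal sketch of that change, leaving out the packet handling from the question. record_client is a hypothetical helper, not part of the original script.

```python
# set(), not [] (an empty list) and not {} (an empty dict)
observedclients = set()


def record_client(addr):
    """Hypothetical helper: return True the first time an address is seen."""
    if addr in observedclients:  # average O(1) lookup
        return False
    observedclients.add(addr)    # sets use add(), not append()
    return True


first = record_client("11:22:33:AA:BB:CC")
second = record_client("11:22:33:AA:BB:CC")
print("%s %s" % (first, second))
```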

score 0 · Accepted Answer

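The post mentions that "the script iterates through the set, comparing each new MAC address to those in the list."

To take full advantage of sets, don't loop over them doing one-to-one comparisons. Instead, use set operations such as union(), intersection(), and difference():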

s = set(list_of_strings_containing_mac_addresses)
t = set(preconfigured_set_of_mac_addresses)
print s - t, 'addresses in the list but not preconfigured'
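For reference, a slightly fuller sketch of the three operations named above, using made-up addresses. The variable names are illustrative only.

```python
# Made-up sample data.
seen = set(["11:22:33:AA:BB:CC", "11:22:33:AA:BB:DD", "11:22:33:AA:BB:EE"])
preconfigured = set(["11:22:33:AA:BB:CC", "11:22:33:AA:BB:FF"])

unknown = seen.difference(preconfigured)    # same as seen - preconfigured
known = seen.intersection(preconfigured)    # same as seen & preconfigured
everything = seen.union(preconfigured)      # same as seen | preconfigured

print(sorted(unknown))
print(sorted(known))
print(len(everything))
```

This style fits best when addresses arrive in batches. For a stream of single packets, as in the question, a plain membership test against a set is already a constant-time operation on average.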
