performance - efficient way to find set membership

Question

I am working with n-bit vectors, of which there are 2^n. E.g. for n=3 the possible values are:

000, 001, 010, 011, 100, 101, 110, 111

I would like to find the most efficient way, given a collection of such values, say

000, 000, 001, 100, 000, 110, 000, 110

to test whether a given value is in that collection.

One way would be to go through the entire list (brute force). Another would be to use one of the classic search methods, e.g. binary search on a sorted list, which takes about log_2(m) + 1 comparisons for a list of m entries.
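For reference, the binary-search option might look roughly like the sketch below. It assumes the bit strings are first converted to integers, de-duplicated and sorted; the helper name `contains_sorted` is made up for illustration.

```python
from bisect import bisect_left

def contains_sorted(sorted_values, target):
    """Binary search: True if target occurs in the ascending list sorted_values."""
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target

# The example list from the question, as bit strings.
observed = ["000", "000", "001", "100", "000", "110", "000", "110"]

# One-off preprocessing: parse as base-2 integers, drop duplicates, sort.
sorted_values = sorted({int(s, 2) for s in observed})

print(contains_sorted(sorted_values, int("110", 2)))
print(contains_sorted(sorted_values, int("111", 2)))
```

The sort is paid once up front; each lookup afterwards is logarithmic in the number of distinct values.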

Another method would be to use Bloom filters, although this is a probabilistic method.

I want to know if there is anything else out there that, given a list of bit strings, efficiently tests a value for membership.

score 0 · Accepted Answer

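Any data structure will work. I would use whatever your language's native dictionary structure is, because it is easy to do and the code is well tested. Usually this is a hash table, though it tends to be called a dictionary, HashMap, std::unordered_map or something similar. Sometimes it is a binary tree. For example: hash (Perl), dictionary (Python), HashMap.

If I were to roll my own "perfect data structure" for this problem, I would probably want some variation on a trie. But the biggest possible win is a fairly small constant-factor speedup, so unless I knew it was necessary, why bother?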
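As a minimal sketch of the "use the native structure" advice, here it is in Python, using the built-in `set` (a hash-based container) rather than a dictionary, since only membership is needed. The names `observed`, `members` and `is_member` are illustrative.

```python
# The example list from the question, kept as bit strings.
observed = ["000", "000", "001", "100", "000", "110", "000", "110"]

# Building the set removes duplicates; lookups are then average-case O(1).
members = set(observed)

def is_member(bits):
    """True if the bit string was present in the original list."""
    return bits in members

print(is_member("110"))
print(is_member("111"))
```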
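To make the trie idea concrete, here is one possible sketch of a plain binary trie built from nested dictionaries. It is not a tuned implementation, and the class name `BitTrie` is an assumption made for illustration. Each level of the trie consumes one bit of the string, so a lookup touches at most n nodes.

```python
class BitTrie:
    """A simple trie over the characters of bit strings such as '010'."""

    _END = None  # marker key; cannot collide with the one-character keys '0'/'1'

    def __init__(self):
        self.root = {}

    def add(self, bits):
        node = self.root
        for ch in bits:
            # Descend, creating the child node if it does not exist yet.
            node = node.setdefault(ch, {})
        node[self._END] = True  # mark that a complete string ends here

    def __contains__(self, bits):
        node = self.root
        for ch in bits:
            node = node.get(ch)
            if node is None:
                return False
        # Reject strings that are only a prefix of a stored string.
        return self._END in node

trie = BitTrie()
for s in ["000", "000", "001", "100", "000", "110", "000", "110"]:
    trie.add(s)

print("110" in trie)
print("111" in trie)
print("11" in trie)
```

In Python this is unlikely to beat the built-in set. The constant-factor win mentioned above would only show up in a lower-level implementation.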

score 0 · Accepted Answer

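Some kind of hash-based set (e.g. a HashSet in Java) will do insertion and lookup in amortized constant time, which is the best you are going to get in asymptotic terms.

If you really want to push the boat out, and the set is going to be dense (i.e. a fair proportion of the possible bit strings are likely to be present), then convert them to integers and use a bit field. This is also constant time, but with a smaller constant.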
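The bit-field approach could be sketched as follows, assuming every string has the same known length n, so that each string maps to a unique integer in [0, 2^n). The class name `BitField` and its layout (one bit per possible value, packed into a `bytearray`) are illustrative choices.

```python
class BitField:
    """One presence bit for each of the 2**n_bits possible values."""

    def __init__(self, n_bits):
        self.size = 1 << n_bits
        # Round up to whole bytes: 8 presence bits per byte.
        self.bits = bytearray((self.size + 7) // 8)

    def add(self, bit_string):
        i = int(bit_string, 2)
        if i >= self.size:
            raise ValueError("value does not fit in the configured width")
        self.bits[i >> 3] |= 1 << (i & 7)

    def __contains__(self, bit_string):
        i = int(bit_string, 2)
        if i >= self.size:
            return False
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

field = BitField(3)
for s in ["000", "000", "001", "100", "000", "110", "000", "110"]:
    field.add(s)

print("110" in field)
print("111" in field)
```

Memory use is 2^n bits whether or not values are present, which is why this only pays off when the set is dense and n is moderate.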
