为什么 C++ 集合实现为二叉树而不是哈希集,与二叉树提供的 O(log n) 相比,它可以提供 O(1) 的平均案例复杂度?
3 回答
因为 C++ 集合由T
的比较运算符排序,这使得以可预测的方式迭代成员成为可能。如果你知道你对集合所做的只是插入、测试成员资格和/或删除元素,那么std::unordered_set
自 C++11 以来就存在实现哈希集的 .
There are two factors here:
- both binary tree and hash based associative sets are useful
- the original STL only implemented the former due to a lack of time to work through the "politics" of Standarisation; SGI, GNU, MS, boost and others have provided non-Standard hash-based versions for many years, and C++11 introduces
unordered_set
These points are discussed below.
BOTH ARE USEFUL
There are lots of pros/cons, and both are available (as of C++11) to let the programmer choose. Hash-based sets were not available because there simply wasn't agreement on the implementation in time for them to be included in earlier standards.
set
iteration is in sort order, whereas a hashed containerunsorted_set
is effectively randomised-order traversalset
supportslower_bound
,upper_bound
, which aren't practical to provide on anunsorted_set
set
uses a comparison function (by default the contained type'soperator<
, whereasunordered_set
requires a hashing function (defaults are provided for some types but they can be pretty mediocre depending on your actual keys, and quality hashing can be time consuming) and a key equality function- either may be faster for smaller values of N - not everyone's dealing with billions of elements so it's sensible to offer choice even from a performance perspective
- there may be significant differences in memory usage for small objects, though I'm not sure what overall guidelines would be true across implementations so can just suggest measuring in your program if you care
- as elements are added, existing iterators into a
std::unordered_set
may be invalidated (see 23.2.5p14 for when exactly), whereas iterators into astd::map
are never invalidated by inserts (pointers and references tounordered_set
elements remain valid though).
Why earlier C++ Standards didn't have a hash based set
From an interview with Stepanov:
Question: I find two hash table implementations in the D.Musser site, and they were both working and quite smart - much smarter than hash tables commonly found in class libraries. Why were hash tables not included into STL?
Answer: Politics. They have to be in. Our new implementation of STL does contain them. In general, we need to develop a mechanism for adding things to STL. After all, STL is an extensible framework, it begs to be extended. There are many data structures that are missing, for example, singly linked lists, matrices, and graphs. SGI is willing to lead the way in extending STL.
(full interview at http://www.stlport.org/resources/StepanovUSA.html)