c++ - 无序集哈希表中元素的排序

Question

我对无序容器中的哈希表感到困惑……或者至少在集合中。

所以，我认为它会像这样工作：

我散列我的对象。我计算对象和向量长度的模数（hash%vectorlength）并将其用作哈希表中指向我的对象的指针的索引。据我所知，这是一个向量...

所以对于一个简单的散列函数，它只返回一个 int 包装器的 int 成员的值，它看起来像这样：

hash table:

vector index:           [0, 1, 2, 3, 4]
                         |  |  |  |  |
object with value...:    0  1  2  3  4

我写了一个程序来测试：

#include <iostream>
#include <unordered_set>

struct Obj
{

public:

    Obj(int i)
    {
        mem = i;
    }

    friend bool operator==(const Obj& o1, const Obj& o2)
    {
        return (o1.mem == o2.mem);
    }

    friend std::ostream& operator<<(std::ostream& o, const Obj& obj)
    {
        o << obj.mem;
        return o;
    }

    int mem;

};

namespace std
{
    template<>
    struct hash<Obj>
    {
        size_t operator()(const Obj& r) const
        {
            size_t hash = r.mem;
            return hash;
        }
    };
}


int main()
{
    std::unordered_set<Obj> map;
    for (int i = 0; i < 5; ++i)
    {
        map.insert(Obj(i));
    }

    for(auto it = map.begin(); it != map.end(); ++it)
    {
        std::cout << (*it) << std::endl;
    }
}

我期望输出

但我得到了：

这是为什么？

score 1 · Accepted Answer

您期望unordered容器有订单。它没有任何指定或保证的顺序。正如您所发现的，您的实现利用了它的自由度并实现了您所描述的幼稚哈希表设计之外的其他东西。另一个实现可能会做其他事情。你根本不能依赖它。

score 1 · Accepted Answer

虽然标准库实现确实可以为所欲为，但看看您的假设（以及在几个评论中表达的假设）与实际实现相对应的位置也很有趣。

我可以用 GCC 重现你的非“0 1 2 3 4”结果，尽管只能通过添加一个map.reserve(6)或更多（奇怪的是，5 产生了“4 0 1 2 3”）。

下面的细节简单解释了我查看的 GCC 版本的行为......

为了解释，我首先检查逻辑桶是否包含哈希函数隐含的内容：

for (size_t i = 0; i < map.bucket_count(); ++i)
{
    std::cout << '[' << i << ']';
    for (auto it = map.begin(i); it != map.end(i); ++it)
        std::cout << ' ' << *it;
    std::cout << '\n';
}

而且，他们确实做到了：

[0] 0
[1] 1
[2] 2
[3] 3
[4] 4
[5]
[6]

因此，建议“标准库可以在散列函数之上自由应用任何可逆函数，并且不保证给出任何关于排序的保证”的评论- 虽然是真的 - 不是这里发生的事情。

深入研究标准库标头，我在bits/hashtable.h的文档中找到了原因：

* Here's _Hashtable data structure, each _Hashtable has:
* - _Bucket[]       _M_buckets
* - _Hash_node_base _M_before_begin
* - size_type       _M_bucket_count
* - size_type       _M_element_count
*
* with _Bucket being _Hash_node* and _Hash_node constaining:
* - _Hash_node*   _M_next
* - Tp            _M_value
* - size_t        _M_code if cache_hash_code is true
*
* In terms of Standard containers the hastable is like the aggregation  of:
* - std::forward_list<_Node> containing the elements
* - std::vector<std::forward_list<_Node>::iterator> representing the  buckets
*
* The non-empty buckets contain the node before the first bucket node. This
* design allow to implement something like a std::forward_list::insert_after
* on container insertion and std::forward_list::erase_after on container
* erase calls. _M_before_begin is equivalent to
* std::foward_list::before_begin. Empty buckets are containing nullptr.
* Note that one of the non-empty bucket contains &_M_before_begin which  is
* not a derefenrenceable node so the node pointers in buckets shall never be
* derefenrenced, only its next node can be.
*
* Walk through a bucket nodes require a check on the hash code to see if the
* node is still in the bucket. Such a design impose a quite efficient hash
* functor and is one of the reasons it is highly advise to set
* __cache_hash_code to true.
*
* The container iterators are simply built from nodes. This way  incrementing
* the iterator is perfectly efficient independent of how many empty buckets
* there are in the container.
*
* On insert we compute element hash code and thanks to it find the bucket
* index. If the element must be inserted on an empty bucket we add it at the
* beginning of the singly linked list and make the bucket point to
* _M_before_begin. The bucket that used to point to _M_before_begin, if any,
* is updated to point to its new before begin node.

因此，哈希表基础是用单链表中的值unordered_set组织的，并且桶将迭代器的向量放入该列表中，而不是通常设想的.vector<forward_list<>>

当您插入元素时，它们将进入 front 的前向列表，并且它是您在从begin()toend()，而不涉及vector其排序对应于哈希值的迭代器。

此处的代码说明了迭代如何以相反的插入顺序返回值，而不考虑散列/冲突 -只要预先有足够的空间reserve()以避免重新散列。

c++ - 无序集哈希表中元素的排序

2 回答 2

Related

Reference