c++ - 是否有连续存储的哈希映射数据结构？

Question

想想不同类型的集合，如Position, Color, Name. 可以使用集合中的相同键连接这些实例。密钥是 64 位长度的全局唯一标识符。目前，我使用哈希映射，但这并不理想。

// custom types
struct Position { float x, y, z; bool static; };
enum Color { RED, BLUE, GREEN };

// collections
std::unordered_map<uint64_t, Position> positions;
std::unordered_map<uint64_t, Color> colors;
std::unordered_map<uint64_t, std::string> names;

// add some data
// ...

// only access positions collection,
// so the key is not needed here
for (auto i = positions.begin(); i != positions.end(); ++i) {
    if (i->second.static) continue;
    i->second.x = (rand() % 1000) / 1000.f;
    i->second.y = (rand() % 1000) / 1000.f;
    i->second.z = (rand() % 1000) / 1000.f;
}

// access to all three collections,
// so we need the key here
for (auto i = positions.begin(); i != positions.end(); ++i) {
    uint64_t id   = *i->first;
    auto position = i->second;
    auto color    = colors.find(id);
    auto name     = names.find(id);
    if (color == colors.end() || name == names.end()) continue;
    draw(*name, *position, *color);
}

我尝试将集合分开，但正如您所看到的，同时也需要多个集合的收集实例。当然，我也需要不时添加或删除单个元素，但这些情况对性能并不重要。

现在我想优化对单个集合的迭代。因此，我尝试将集合连续存储，这是面向数据设计理念的一部分。但是，我仍然需要非常快速地访问各个实例。直接使用数组是行不通的，因为这会分配太多内存，而且并非一种类型的所有实例都有另一种类型的对应物。另一方面，哈希映射对于迭代来说不是最优的。

我认为数据结构必须在内部使用数组。我应该在这里使用哪种数据类型？C++标准库中是否实现了这样的数据结构？

score 3 · Accepted Answer

创建一个unordered_mapto 偏移到一个std::vector.

Store your elements in the std::vector. When you want to remove an element, swap it with the last element of the std::vector, and delete the last element. Go through the unordered_maps that store indexes into your std::vector and repair those that pointed to the last element to point to where the last element now is.

Removing an element is now O(n). If you remove a bunch of elements at once, you can do all of them in one O(n) pass with a little creativity.

Adding an element remains O(1).

Iterating over all of the elements involves iterating over the std::vector. If you need the hash value while iterating, store it redundantly there, or calculate it on the fly.

Wrap the std::vector and std::unordered_map behind a type that enforces the above rules, as otherwise the invariants will never persist, that exposes a map-like interface externally.

As an alternative to O(n) removal, create a parallel std::vector storing std::unordered_map<???>::iterators. Keep careful track of when a rehash occurs in the std::unordered_map (rehashing is deterministic, but you have to work out when it happens yourself), and when it does rebuild it (as all of the iterators will be invalidated by a rehash). Rehashes occur rarely, so this has an amortized constant cost¹. When you want to erase, you can now update the index in the unordered_map in O(1) time (remember to update the second std::vector as well -- swapping position from last to the deleted element). You could store this in the same std::vector as the raw data, but that would bulk it up (which will slow down iteration).

¹ Rehashes happen when the container passes reaches max_load_factor()*bucket_count() size. The bucket_count then grows exponentially, and the elements are moved around. Much like a std::vector's growth algorithm, this guarantees a linear-in-number-of-elements total elements moved -- so it maintains amortized constant insertion. The manual rebuild of your reverse table is no more expensive than the moving of all of the existing elements, so it also contributes an amortized constant insertion time factor.

c++ - 是否有连续存储的哈希映射数据结构？

1 回答 1

Related

Reference