
Basically, I have an unordered_map and I'm trying to add sets of pairs to it... about 500,000 of them. I've noticed that as I add pairs the insertion gets slower and slower, until it finally stops altogether. Any thoughts as to why this might happen, or how to fix it?

Map definition:

std::tr1::unordered_map<std::pair<int, int>, int, pairHash> x_map ;

Hash function - note that in my case I don't have to worry about pair.first == pair.second, so I believe this hash function should be sufficient; correct me if I am wrong:

class pairHash
{
public:
    size_t operator()( const std::pair<int, int> & v ) const
    {
        return v.first ^ v.second ;
    }
} ;

Method to add values to the unordered_map... trying to add about 200,000-500,000 pairs:

void initialize_map( EndPoint**& arr, std::tr1::unordered_map<std::pair<int, int>, int, pairHash> &my_map, int size )
{
    for( int i = 0 ; i < size ; i++ )   // add initial overlapping pairs
    {
        if( i % 100 == 0 )
            std::cout << "checking particle: " << i << " maxsize: " << my_map.max_size() << std::endl ;
        int j = 1 ;
        while( arr[i]->isMin && i+j < size &&    // while ys is a min, and not end of array
              arr[i]->v_id != arr[i+j]->v_id )      // anything between min and max is a possible collision
        {
            if( !arr[i]->isEdge || !arr[i+j]->isEdge )
            {
                my_map[std::make_pair( std::min( arr[i]->v_id, arr[i+j]->v_id ),
                        std::max( arr[i]->v_id, arr[i+j]->v_id ) )] = 1 ;
            }

            j++ ;
        }
    }
}

EDIT: I am actually adding closer to 50,000,000 pairs... just ran a test...

EDIT2:

Example output before it freezes, where count is the number of entries in the map. I believe it is trying to rehash the map, but I'm not sure why it fails to do so and freezes the machine:

checking particle: 87500 count: 35430415 load factor: 0.988477
checking particle: 87600 count: 35470808 load factor: 0.989652
checking particle: 87700 count: 35511049 load factor: 0.990818
checking particle: 87800 count: 35555974 load factor: 0.992073
checking particle: 87900 count: 35595646 load factor: 0.993163
checking particle: 88000 count: 35642165 load factor: 0.994427
checking particle: 88100 count: 35679608 load factor: 0.995434
checking particle: 88200 count: 35721223 load factor: 0.996563
checking particle: 88300 count: 35760313 load factor: 0.997616
checking particle: 88400 count: 35799621 load factor: 0.9987
checking particle: 88500 count: 35833445 load factor: 0.999649


3 Answers


It's probably best to stick with the Boost hash_combine solution for a better hash function:

#include <cstddef>
#include <functional>
#include <utility>

template <class T>
inline void hash_combine(std::size_t & seed, const T & v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

namespace std
{
  template<typename S, typename T> struct hash< std::pair<S, T> >
  {
    inline std::size_t operator()(const std::pair<S, T> & v) const
    {
      std::size_t seed = 0;
      hash_combine(seed, v.first);
      hash_combine(seed, v.second);
      return seed;
    }
  };
}
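With that specialization in scope, the map can be declared without naming a hasher at all. A minimal usage sketch, assuming a C++11 compiler (the question's std::tr1 containers look up std::tr1::hash, so under tr1 you would specialize that template instead):

#include <unordered_map>
#include <utility>

// hash_combine and the std::hash<std::pair<S, T>> specialization
// defined above are assumed to be visible here.

int main()
{
    // The default third template argument, std::hash<std::pair<int, int>>,
    // now resolves to the specialization above.
    std::unordered_map<std::pair<int, int>, int> x_map;
    x_map[std::make_pair(3, 7)] = 1;
    return 0;
}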
answered 2011-11-06T21:25:56.520

Try looking at unordered_map::load_factor(). Ideally the result of that call should be < 1.0. If it goes above 1.0, your hash function is probably dodgy. You should use hash_combine instead of XORing your pair.
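A minimal sketch of that check, spliced into the question's progress output (i and my_map are the question's names; everything else is illustrative):

// Print the table's health alongside the existing progress output.
// A load factor creeping up to max_load_factor() without the bucket
// count growing suggests rehashing is failing (e.g. out of memory),
// or that the hash function is clustering keys into few buckets.
if( i % 100 == 0 )
{
    std::cout << "count: "        << my_map.size()
              << " buckets: "     << my_map.bucket_count()
              << " load factor: " << my_map.load_factor()
              << " (max: "        << my_map.max_load_factor() << ")"
              << std::endl ;
}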

answered 2011-11-07T13:00:41.693

Have you tried using reserve() to pre-allocate enough buckets for all your pairs? Adding that many pairs could trigger a lot of resizes (and rehashes).

The next thing I would check is your hash function. It looks a bit suspect: if you hit a lot of hash collisions you will end up with a pile of overflow buckets, which slows down the lookup on every insert. In that case you might be better off with std::map. You can modify your code to store the hash of each pair and then check how many unique hash values you generate, as in the sketch below.
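A sketch of both ideas, reusing the question's my_map and pairHash names (note that reserve() is C++11; with std::tr1 you would call rehash() with a bucket count instead):

#include <cstddef>
#include <iostream>
#include <unordered_set>

// Pre-allocate buckets for the expected number of pairs so the
// insertions do not trigger repeated resize-and-rehash cycles.
my_map.reserve( 50000000 ) ;

// After filling the map, gauge hash quality: if there are far fewer
// distinct hash values than keys, collisions are the bottleneck.
std::unordered_set<std::size_t> unique_hashes ;
for( const auto & entry : my_map )
    unique_hashes.insert( pairHash()( entry.first ) ) ;

std::cout << "keys: " << my_map.size()
          << " distinct hashes: " << unique_hashes.size() << std::endl ;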

于 2011-11-07T02:46:38.297 回答