
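I am running a C++ program that builds an inverted index on 64-bit Red Hat Linux. My inverted index is defined as map<unsigned long long int, map<int,int> > invertID; and the program crashes at random with what(): St9bad_alloc. The crash point is different on every run. Sometimes I have 100,000,000 keys and it still runs for a while. Sometimes it has already thrown the error at about 80,000,000 keys.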

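After some googling, I found that this error can come from new. My code does not use the new keyword anywhere, but I do allocate memory through the map: I keep inserting key/value pairs on every iteration. So I decided to experiment with try/catch statements.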

Here is the key part of the code, followed by its output:

    map<unsigned long long int, map<int,int> >::iterator mainMapIt = invertID.find(ID);
    if (mainMapIt != invertID.end()){
    //if this ImageID key exists in InvID sub-map
        map<int,int> M = mainMapIt->second; // THIS IS LINE 174.
        map<int,int>::iterator subMapIt = M.find(imageID);
        if (subMapIt != M.end()){
        //increment the number of this ImageID key
            ++invertID[ID][imageID];
        }
        else{
        //add ImageID key with value 1 into the InvertID
            try{
                invertID[ID][imageID] = 1;
                ++totalPushBack;
            }catch (bad_alloc ba){
                cout << "CAUGHT 1: invertID[" << ID << "][" << imageID << endl;
            }
        }
    }
    else{
    //create the first empty map with the key as image ID with value 1 and put it in implicitly to the invertID
        try{
            invertID[ID][imageID] = 1;
        }catch (bad_alloc ba){
            cout << "CAUGHT 2: invertID[" << ID << "][" << imageID << endl;
        }
    }

Output:

...
CAUGHT 2: invertID[21959247897][3856
CAUGHT 2: invertID[38022506156][3856
CAUGHT 2: invertID[29062506144][3856
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc

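I can see that the error is thrown when I try to insert a new key. However, I was even more surprised that, after wrapping the key-insertion parts in try/catch blocks, I still get St9bad_alloc. I ran a backtrace, and here is the result: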

(gdb) backtrace
#0  0x000000344ac30265 in raise () from /lib64/libc.so.6
#1  0x000000344ac31d10 in abort () from /lib64/libc.so.6
#2  0x00000034510becb4 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#3  0x00000034510bcdb6 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x00000034510bcde3 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x00000034510bceca in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x00000034510bd1d9 in operator new(unsigned long) () from /usr/lib64/libstdc++.so.6
#7  0x0000000000406544 in __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<int const, int> > >::allocate (
    this=0x7fffffffdfc0, __n=1)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:88
#8  0x0000000000406568 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_get_node (this=0x7fffffffdfc0)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:358
#9  0x0000000000406584 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_create_node (this=0x7fffffffdfc0, __x=...)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:367
#10 0x00000000004065e3 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_clone_node (this=0x7fffffffdfc0, __x=0x21c082bd0)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:381
#11 0x0000000000406634 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_copy (this=0x7fffffffdfc0, __x=0x21c082bd0, __p=0x7fffffffdfc8)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:1226
#12 0x00000000004067e9 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_Rb_tree (this=0x7fffffffdfc0, __x=...)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:570
#13 0x0000000000406885 in std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >::map (
    this=0x7fffffffdfc0, __x=...) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_map.h:175
#14 0x0000000000403039 in generateInvertID (pathToPF=0x6859a8 "/home/karl/c/000605.pf",
    pathToC=0x38c139ed8 "/home/karl/c/000605.c", imageID=3856)
    at InvertIndexGen.cpp:174
#15 0x0000000000403b46 in generateInvertIDForAllPFAndC () at InvertIndexGen.cpp:254
#16 0x0000000000403d0b in main (argc=1, argv=0x7fffffffe448) at InvertIndexGen.cpp:47
(gdb)

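Frame #14, InvertIndexGen.cpp:174, corresponds to this line in my code above, and that is where it crashes:

    map<int,int> M = mainMapIt->second; // THIS IS LINE 174.

It seems that when I call ->second, a copy of the corresponding map has to be created. That would also be the cause of the St9bad_alloc.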
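To make the failure mode concrete: line 174 sits before both try blocks in the snippet above, so a std::bad_alloc thrown while copying the sub-map is never caught. The sketch below reproduces that under loudly stated assumptions. BudgetAllocator and g_budget are hypothetical test aids, not part of the original program, that simulate running out of memory, and the code assumes a C++11 or later compiler (newer than the GCC 4.1.2 shown in the backtrace).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <new>
#include <utility>

// Hypothetical test aid: how many objects may still be allocated.
static std::size_t g_budget = 0;

// Minimal C++11 allocator that throws std::bad_alloc once the budget is spent.
template <class T>
struct BudgetAllocator {
    typedef T value_type;
    BudgetAllocator() {}
    template <class U>
    BudgetAllocator(const BudgetAllocator<U>&) {}
    T* allocate(std::size_t n) {
        if (g_budget < n) throw std::bad_alloc();
        g_budget -= n;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const BudgetAllocator<T>&, const BudgetAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const BudgetAllocator<T>&, const BudgetAllocator<U>&) { return false; }

typedef std::map<int, int, std::less<int>,
                 BudgetAllocator<std::pair<const int, int> > > SubMap;

// Copies `source` the way line 174 does, but inside a try block.
// Returns true if the copy failed with std::bad_alloc.
bool copyFails(const SubMap& source) {
    try {
        SubMap M = source;  // deep copy: needs fresh nodes for the elements
        return false;
    } catch (const std::bad_alloc&) {  // catch by const reference, not by value
        return true;
    }
}
```

With the copy inside the try block the exception is caught. In the question's code the same copy runs outside any try block, which is consistent with std::terminate appearing in the backtrace.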

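But is there anything I can do about it in this case? After all, invertID.max_size() returns 18446744073709551615, and I am only using about 100 million keys. I can also see from top that my program uses only 10% of memory. (We have 128 GB of RAM.)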

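What should I do about this error? Some of my senior colleagues do the same thing, and they report that their programs start to have problems once the inverted index grows beyond 70-80% of memory in top. But my program is using only 10%, so what is happening here? What can we do to prevent this error?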

EDIT: some comments suggested that I check ulimit, so here it is:

-bash-3.2$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1056768
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1056768
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

2 Answers


    map<int,int> M = mainMapIt->second; // THIS IS LINE 174.

This line makes an unnecessary copy of the map, and with it unnecessary memory allocations.
Changing it to a reference will help:

    map<int,int> & M = mainMapIt->second; // THIS IS LINE 174.
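For context, here is how the reference fix might look inside the whole update step. This is only a sketch: addOccurrence, SubMap and InvertedIndex are hypothetical names introduced for illustration, and the totalPushBack counter and the try/catch blocks from the question are left out.

```cpp
#include <cassert>
#include <map>

typedef std::map<int, int> SubMap;
typedef std::map<unsigned long long int, SubMap> InvertedIndex;

// Records one occurrence of imageID under ID and returns the updated count.
int addOccurrence(InvertedIndex& invertID, unsigned long long int ID, int imageID) {
    InvertedIndex::iterator mainMapIt = invertID.find(ID);
    if (mainMapIt == invertID.end()) {
        // First time this ID is seen: operator[] creates the empty sub-map.
        invertID[ID][imageID] = 1;
        return 1;
    }
    SubMap& M = mainMapIt->second;  // reference: aliases the stored sub-map, no copy
    SubMap::iterator subMapIt = M.find(imageID);
    if (subMapIt != M.end()) {
        return ++subMapIt->second;  // increment in place through the iterator
    }
    M[imageID] = 1;  // new imageID under an existing ID
    return 1;
}
```

Because M is a reference, updates go straight into the stored sub-map, so the second lookup through invertID[ID][imageID] is no longer needed either.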

answered 2013-02-14T14:36:23.850
    map<int,int> M = mainMapIt->second; // THIS IS LINE 174.

copies your second (the inner map).

    map<int,int>& M = mainMapIt->second; // THIS IS LINE 174.

would at least help to avoid this copy.
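Going one step beyond the reference, the find/branch logic can be collapsed entirely, since std::map::operator[] inserts a value-initialized element (0 for int) when the key is missing. A minimal sketch, assuming the totalPushBack counter from the question is not needed; countOccurrence and InvertedIndex are hypothetical names:

```cpp
#include <cassert>
#include <map>

typedef std::map<unsigned long long int, std::map<int, int> > InvertedIndex;

// Increments the count for (ID, imageID), creating the sub-map and the
// entry on demand. operator[] returns references, so no sub-map is copied.
void countOccurrence(InvertedIndex& invertID, unsigned long long int ID, int imageID) {
    ++invertID[ID][imageID];
}
```

This still allocates one node per new key, so it can throw std::bad_alloc when memory really is exhausted, but it no longer copies an entire sub-map on every update.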

answered 2013-02-14T11:31:30.533