
We're developing a framework that directly uses mrpt-1.9, which in turn uses OpenCV 2.4. Our unit tests segfault when the test binary exits (i.e., during cleanup), crashing inside an OpenCV function: cv::String::deallocate()

What I have tried:

running with valgrind

==26159== Conditional jump or move depends on uninitialised value(s)
==26159==    at 0x7DB7F5: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FB0: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159==    by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159==    by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159==    by 0x807A044: exit (exit.c:104)
==26159==    by 0x8060836: (below main) (libc-start.c:325)
==26159== 
==26159== Invalid read of size 4
==26159==    at 0x7DB7FB: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FB9: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159==    by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159==    by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159==    by 0x807A044: exit (exit.c:104)
==26159==    by 0x8060836: (below main) (libc-start.c:325)
==26159==  Address 0x1a is not stack'd, malloc'd or (recently) free'd
==26159== 
==26159== 
==26159== Process terminating with default action of signal 11 (SIGSEGV)
==26159==  Access not within mapped region at address 0x1A
==26159==    at 0x7DB7FB: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FB9: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159==    by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159==    by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159==    by 0x807A044: exit (exit.c:104)
==26159==    by 0x8060836: (below main) (libc-start.c:325)
==26159==  If you believe this happened as a result of a stack
==26159==  overflow in your program's main thread (unlikely but
==26159==  possible), you can try to increase the size of the
==26159==  main thread stack using the --main-stacksize= flag.
==26159==  The main thread stack size used in this run was 8388608.
==26159== 
==26159== HEAP SUMMARY:
==26159==     in use at exit: 286,067 bytes in 1,147 blocks
==26159==   total heap usage: 7,469 allocs, 6,322 frees, 1,912,969 bytes allocated
==26159== 
==26159== LEAK SUMMARY:
==26159==    definitely lost: 0 bytes in 0 blocks
==26159==    indirectly lost: 0 bytes in 0 blocks
==26159==      possibly lost: 2,299 bytes in 27 blocks
==26159==    still reachable: 283,768 bytes in 1,120 blocks
==26159==                       of which reachable via heuristic:
==26159==                         newarray           : 1,536 bytes in 16 blocks
==26159==         suppressed: 0 bytes in 0 blocks
==26159== Rerun with --leak-check=full to see details of leaked memory
==26159== 
==26159== For counts of detected and suppressed errors, rerun with: -v
==26159== Use --track-origins=yes to see where uninitialised values come from
==26159== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

AFAIK this could be either us calling an MRPT function incorrectly, or a bug in MRPT itself.

running it with gdb:

I've been trying to debug it in gdb, but I can only get as far as a backtrace. I can't tell which part of our code is responsible. Since the crash seems to happen after main exits, this is really confusing. Even worse, the class we construct (but don't actually use) does not contain any MRPT classes or objects, so I'm guessing the problem is in the MRPT libraries and not in our framework.

Thread 1 "debug" received signal SIGSEGV, Segmentation fault.
0x00000000005b569b in cv::String::deallocate() ()
(gdb) bt
#0  0x00000000005b569b in cv::String::deallocate() ()
#1  0x000000000089969a in cv::BmpEncoder::~BmpEncoder() ()
#2  0x00000000008996d9 in cv::BmpEncoder::~BmpEncoder() [clone .localalias.25] ()
#3  0x00007ffff36a4f66 in cv::ImageCodecInitializer::~ImageCodecInitializer() () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#4  0x00007ffff484136a in __cxa_finalize (d=0x7ffff38d1000) at cxa_finalize.c:56
#5  0x00007ffff369fb53 in ?? () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#6  0x00007fffffffd8b0 in ?? ()
#7  0x00007ffff7de7de7 in _dl_fini () at dl-fini.c:235
Backtrace stopped: frame did not save the PC

I set a breakpoint with: break cv::ImageCodecInitializer::~ImageCodecInitializer

and I got as far as:

Thread 1 "debug" hit Breakpoint 3, 0x0000000000888ad0 in cv::ImageCodecInitializer::~ImageCodecInitializer() ()
(gdb) bt
#0  0x0000000000888ad0 in cv::ImageCodecInitializer::~ImageCodecInitializer() ()
#1  0x00007ffff4840ff8 in __run_exit_handlers (status=0, listp=0x7ffff4bcb5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#2  0x00007ffff4841045 in __GI_exit (status=<optimised out>) at exit.c:104
#3  0x00007ffff4827837 in __libc_start_main (main=0x5a4536 <main()>, argc=1, argv=0x7fffffffd9d8, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fffffffd9c8) at ../csu/libc-start.c:325
#4  0x00000000005a4469 in _start ()
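A crash at this point happens during static-object teardown. Both backtraces above show the destructor being reached from exit() through __run_exit_handlers and __cxa_finalize, which means main() has already returned.

The sketch below is a minimal, self-contained illustration of that ordering. It is not MRPT or OpenCV code, and ExitTracer and the tracer_* objects are hypothetical names. Within one translation unit, static objects are constructed in order of definition and destroyed in reverse order once main() returns.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical illustration: prints when it is constructed and destroyed so
// that the exit-time ordering is visible on stdout.
struct ExitTracer {
    const char* name;

    explicit ExitTracer(const char* n) : name(n) {
        assert(n != nullptr);
        std::printf("construct %s\n", name);
    }
    ~ExitTracer() {
        // Runs during exit(), after main() has returned.
        std::printf("destroy %s\n", name);
    }
};

// Defined in the same translation unit: tracer_first is constructed before
// tracer_second, and destroyed after it (reverse order of construction).
ExitTracer tracer_first("first");
ExitTracer tracer_second("second");
```

Across translation units the relative initialization order is unspecified. For shared libraries, teardown is driven by the dynamic loader: glibc calls each library's destructors through __cxa_finalize, as in the traces above. As a result, a destructor can run after objects it depends on have already been destroyed, or can end up calling code with a different layout from the one it was compiled against.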

searched for opencv-2.4 debug symbols

The app is built with debug symbols, but there doesn't appear to be an opencv-2.4 package with debug symbols for my system, so I keep getting the "optimised out" warnings:

libopencv-apps-dev - opencv_apps Robot OS package - development files
libopencv-apps0d - opencv_apps Robot OS package - runtime files
libopencv-calib3d2.4v5 - computer vision Camera Calibration library
libopencv-contrib-dev - development files for libopencv-contrib
libopencv-contrib2.4v5 - computer vision contrib library
libopencv-core2.4v5 - computer vision core library
libopencv-dev - development files for opencv
libopencv-features2d2.4v5 - computer vision Feature Detection and Descriptor Extraction library
libopencv-flann2.4v5 - computer vision Clustering and Search in Multi-Dimensional spaces library
libopencv-gpu-dev - development files for libopencv-gpu2.4v5
libopencv-gpu2.4v5 - computer vision GPU library
libopencv-highgui2.4v5 - computer vision High-level GUI and Media I/O library
libopencv-imgproc2.4v5 - computer vision Image Processing library
libopencv-legacy-dev - development files for libopencv-legacy
libopencv-legacy2.4v5 - computer vision legacy library
libopencv-ml2.4v5 - computer vision Machine Learning library
libopencv-objdetect2.4v5 - computer vision Object Detection library
libopencv-ocl-dev - development files for libopencv-ocl2.4v5
libopencv-ocl2.4v5 - computer vision OpenCL support library
libopencv-photo2.4v5 - computer vision computational photography library
libopencv-stitching2.4v5 - computer vision image stitching library
libopencv-superres2.4v5 - computer vision Super Resolution library
libopencv-ts2.4v5 - computer vision ts library
libopencv-video2.4v5 - computer vision Video analysis library
libopencv-videostab2.4v5 - computer vision video stabilization library
libopencv2.4-java - Java bindings for the computer vision library
libopencv2.4-jni - Java jni library for the computer vision library

searched for actual point of offending function

I went through the minimal debug executable we built to try to pinpoint the issue, and then searched for the actual function:

nm -Ca debug | grep "ImageCodecInitializer"
0000000000889290 W cv::ImageCodecInitializer::ImageCodecInitializer()
0000000000889290 W cv::ImageCodecInitializer::ImageCodecInitializer()
0000000000888ad0 W cv::ImageCodecInitializer::~ImageCodecInitializer()
0000000000888ad0 W cv::ImageCodecInitializer::~ImageCodecInitializer()

Then I tried to find what GDB has to say about those addresses:

(gdb) info line *0x0000000000889290
No line number information available for address 0x889290 <_ZN2cv21ImageCodecInitializerC2Ev>

That was a dead end, so I used GDB to find out what constructs it:

#0  0x00007ffff36a6240 in cv::ImageCodecInitializer::ImageCodecInitializer() () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#1  0x00007ffff369f8f6 in ?? () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#2  0x00007ffff7de76ba in call_init (l=<optimised out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffd9d8, env=env@entry=0x7fffffffd9e8) at dl-init.c:72
#3  0x00007ffff7de77cb in call_init (env=0x7fffffffd9e8, argv=0x7fffffffd9d8, argc=1, l=<optimised out>) at dl-init.c:30
#4  _dl_init (main_map=0x7ffff7ffe168, argc=1, argv=0x7fffffffd9d8, env=0x7fffffffd9e8) at dl-init.c:120
#5  0x00007ffff7dd7c6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6  0x0000000000000001 in ?? ()
#7  0x00007fffffffdda0 in ?? ()
#8  0x0000000000000000 in ?? ()

Again optimized out.

searched for library which uses the offending function

The function lives in libopencv_highgui.so.2.4, so I guessed that one of the MRPT libraries uses it. I went looking for which of the MRPT libraries we link against pulls it in, and found it:

readelf -d debug 

Dynamic section at offset 0x2b49bb0 contains 41 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libboost_system.so.1.58.0]
 0x0000000000000001 (NEEDED)             Shared library: [libboost_filesystem.so.1.58.0]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libmrpt-base.so.1.9]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libjpeg.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libpng12.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libtiff.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libjasper.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libIlmImf-2_2.so.22]
 0x0000000000000001 (NEEDED)             Shared library: [libHalf.so.12]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

So I located that library:

sudo ldconfig -p | grep "libmrpt-base.so.1.9"
        libmrpt-base.so.1.9 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmrpt-base.so.1.9

And then:

readelf -d /usr/lib/x86_64-linux-gnu/libmrpt-base.so.1.9

Dynamic section at offset 0xa5aea8 contains 37 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libcxsparse.so.3.1.4]
 0x0000000000000001 (NEEDED)             Shared library: [libwx_baseu-3.0.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libwx_gtk2u_core-3.0.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libjpeg.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_highgui.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgproc.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_core.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [libmrpt-base.so.1.9]
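One way to double-check at run time which loaded object a given function really comes from is glibc's dladdr(). It maps an address to the containing ELF object and to the nearest symbol in that object's dynamic symbol table. For example, in the valgrind trace above, cv::BmpEncoder::~BmpEncoder() is reported in test_slam, while cv::ImageCodecInitializer::~ImageCodecInitializer() is reported in libopencv_highgui.so.2.4.9.

The helper below is a hypothetical sketch: describe_address is not part of MRPT or OpenCV. It assumes Linux/glibc, and on glibc older than 2.34 it needs -ldl at link time.

```cpp
#include <cassert>
#include <dlfcn.h>  // dladdr(), Dl_info: GNU extension (g++ defines _GNU_SOURCE)
#include <string>

// Hypothetical helper: report which loaded ELF object (the executable or a
// shared library) contains `addr`, plus the nearest symbol from the dynamic
// symbol table, if any. Returns false if no loaded object contains `addr`.
bool describe_address(const void* addr, std::string& object, std::string& symbol) {
    assert(addr != nullptr);
    Dl_info info{};
    if (dladdr(addr, &info) == 0) {
        return false;
    }
    object = info.dli_fname ? info.dli_fname : "";
    symbol = info.dli_sname ? info.dli_sname : "";
    return true;
}
```

You can pass it the address of any function of interest, for example from a small test or from a debugger. Symbol names are only reported for symbols in the dynamic symbol table. Functions inside an executable usually only appear there if it was linked with -rdynamic.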

I know this is the library causing the issue, because our project itself uses OpenCV 3.3, statically linked, so MRPT is the only thing pulling in OpenCV 2.4. Sadly, the repository we're using does not provide debug symbols for MRPT either:

libmrpt-base1.9 - Mobile Robot Programming Toolkit - base library
libmrpt-detectors1.9 - Mobile Robot Programming Toolkit - detectors library
libmrpt-graphs1.9 - Mobile Robot Programming Toolkit - graphs library
libmrpt-graphslam1.9 - Mobile Robot Programming Toolkit - graphslam library
libmrpt-gui1.9 - Mobile Robot Programming Toolkit - gui library
libmrpt-hmtslam1.9 - Mobile Robot Programming Toolkit - hmtslam library
libmrpt-hwdrivers1.9 - Mobile Robot Programming Toolkit - hwdrivers library
libmrpt-kinematics1.9 - Mobile Robot Programming Toolkit - kinematics library
libmrpt-maps1.9 - Mobile Robot Programming Toolkit - maps library
libmrpt-nav1.9 - Mobile Robot Programming Toolkit - nav library
libmrpt-obs1.9 - Mobile Robot Programming Toolkit - obs library
libmrpt-opengl1.9 - Mobile Robot Programming Toolkit - opengl library
libmrpt-slam1.9 - Mobile Robot Programming Toolkit - slam library
libmrpt-tfest1.9 - Mobile Robot Programming Toolkit - tfest library
libmrpt-topography1.9 - Mobile Robot Programming Toolkit - topography library
libmrpt-vision1.9 - Mobile Robot Programming Toolkit - vision library
libmrpt-comms1.9 - Mobile Robot Programming Toolkit - comms library

And even worse:

nm -C libmrpt-base.so
nm: libmrpt-base.so: no symbols

And this is where the journey ends.

What are my options?

  • use another version of mrpt?
  • compile mrpt with debug symbols?
  • compile opencv-2.4 with debug symbols?

Any help, hints or tips are greatly appreciated. If this question is too localized or does not conform to SO standards, please leave a comment and I will update it.


2 Answers


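My first guess is that you might be hitting this problem because you are using two OpenCV versions at once... Try building MRPT from source, telling CMake to use the same OpenCV version as your main project.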

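mrpt-base doesn't directly use anything from highgui (although... it is linked against it! That should definitely be fixed). So I suspect the bug is related to the initialization of static variables in the OpenCV modules, combined with some problem in the linking...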

Cheers

answered 2017-09-20T05:59:17.807

Not really an answer, but comments are bad for formatting code. The latest OpenCV on GitHub has the following source:

void cv::String::deallocate()
{
    int* data = (int*)cstr_;
    len_ = 0;
    cstr_ = 0;

    if(data && 1 == CV_XADD(data-1, -1))
    {
        cv::fastFree(data-1);
    }
}

(This is probably newer than your version.)

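It looks like this stores a reference count in the 4 bytes immediately before the NUL-terminated string data. The if condition checks that the pointer is non-NULL and then atomically decrements the reference count. CV_XADD returns the value from before the decrement, so the memory is freed when the count drops from 1 to 0.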
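To make the layout concrete, here is a simplified, hypothetical model of the same scheme. ToyString and refcount_address are illustrative names, not OpenCV API. The reference count sits in the int-sized slot just before the character data, and deallocate() frees the buffer only when the pre-decrement count was 1.

The model also shows why a garbage cstr_ is fatal: the count is read from cstr_ minus sizeof(int). A non-NULL garbage value such as 0x1e would therefore cause a read at 0x1a. That is consistent with, though not proof of, the invalid read valgrind reports at address 0x1a.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical, simplified model of the layout described above. This is NOT
// OpenCV's cv::String, only an illustration of the same idea.
// Allocation layout: [ refcount (int-sized) ][ characters ... '\0' ]
// cstr_ points at the first character, one int past the allocation start.
class ToyString {
    using RefCount = std::atomic<int>;
    static_assert(sizeof(RefCount) == sizeof(int), "model assumes an int-sized refcount");

public:
    inline static int frees = 0;  // number of buffers actually released

    explicit ToyString(const char* s) {
        assert(s != nullptr);
        len_ = std::strlen(s);
        void* mem = std::malloc(sizeof(RefCount) + len_ + 1);
        if (!mem) throw std::bad_alloc();
        new (mem) RefCount(1);  // a single owner so far
        cstr_ = static_cast<char*>(mem) + sizeof(RefCount);
        std::memcpy(cstr_, s, len_ + 1);
    }

    // Copies share the buffer and bump the reference count.
    ToyString(const ToyString& other) : cstr_(other.cstr_), len_(other.len_) {
        if (RefCount* rc = refcount()) rc->fetch_add(1);
    }
    ToyString& operator=(const ToyString&) = delete;  // omitted for brevity

    ~ToyString() { deallocate(); }

    // Same logic as the quoted deallocate(): fetch_sub returns the value from
    // *before* the decrement, so the buffer is freed when the count goes 1 -> 0.
    void deallocate() {
        RefCount* rc = refcount();
        len_ = 0;
        cstr_ = nullptr;
        if (rc && rc->fetch_sub(1) == 1) {
            rc->~RefCount();
            std::free(rc);
            ++frees;
        }
    }

    const char* c_str() const { return cstr_ ? cstr_ : ""; }

private:
    // The refcount lives immediately before the character data.
    RefCount* refcount() const {
        return cstr_ ? std::launder(reinterpret_cast<RefCount*>(cstr_ - sizeof(RefCount)))
                     : nullptr;
    }

    char* cstr_ = nullptr;
    std::size_t len_ = 0;
};

// Address the refcount would be read from for a given (possibly garbage)
// cstr_ value: one int before the character data.
constexpr std::uintptr_t refcount_address(std::uintptr_t cstr) {
    return cstr - sizeof(int);
}
```

In this model, copying a ToyString and destroying the copy leaves the shared buffer alive. Only the last owner's deallocate() sees a pre-decrement count of 1 and frees it.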

answered 2017-09-20T09:55:52.917