We're developing a framework that directly uses mrpt-1.9, which in turn uses OpenCV 2.4.
We were writing unit tests, which segfault when the tests exit (i.e., during cleanup) with an OpenCV error in cv::String::deallocate()
What I have tried:
running with valgrind
==26159== Conditional jump or move depends on uninitialised value(s)
==26159== at 0x7DB7F5: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159== by 0xAF9FB0: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159== by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159== by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159== by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159== by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159== by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159== by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159== by 0x807A044: exit (exit.c:104)
==26159== by 0x8060836: (below main) (libc-start.c:325)
==26159==
==26159== Invalid read of size 4
==26159== at 0x7DB7FB: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159== by 0xAF9FB9: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159== by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159== by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159== by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159== by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159== by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159== by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159== by 0x807A044: exit (exit.c:104)
==26159== by 0x8060836: (below main) (libc-start.c:325)
==26159== Address 0x1a is not stack'd, malloc'd or (recently) free'd
==26159==
==26159==
==26159== Process terminating with default action of signal 11 (SIGSEGV)
==26159== Access not within mapped region at address 0x1A
==26159== at 0x7DB7FB: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159== by 0xAF9FB9: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159== by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159== by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159== by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159== by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159== by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159== by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159== by 0x807A044: exit (exit.c:104)
==26159== by 0x8060836: (below main) (libc-start.c:325)
==26159== If you believe this happened as a result of a stack
==26159== overflow in your program's main thread (unlikely but
==26159== possible), you can try to increase the size of the
==26159== main thread stack using the --main-stacksize= flag.
==26159== The main thread stack size used in this run was 8388608.
==26159==
==26159== HEAP SUMMARY:
==26159== in use at exit: 286,067 bytes in 1,147 blocks
==26159== total heap usage: 7,469 allocs, 6,322 frees, 1,912,969 bytes allocated
==26159==
==26159== LEAK SUMMARY:
==26159== definitely lost: 0 bytes in 0 blocks
==26159== indirectly lost: 0 bytes in 0 blocks
==26159== possibly lost: 2,299 bytes in 27 blocks
==26159== still reachable: 283,768 bytes in 1,120 blocks
==26159== of which reachable via heuristic:
==26159== newarray : 1,536 bytes in 16 blocks
==26159== suppressed: 0 bytes in 0 blocks
==26159== Rerun with --leak-check=full to see details of leaked memory
==26159==
==26159== For counts of detected and suppressed errors, rerun with: -v
==26159== Use --track-origins=yes to see where uninitialised values come from
==26159== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
AFAIK this could be either us calling an MRPT function incorrectly, or a bug in MRPT itself.
running it with gdb:
I've been trying to debug it in gdb, but I can only get as far as the backtrace; I cannot tell which part of our code is responsible. Since the crash seems to happen after main exits, it is really confusing. Even worse, the class we construct (but do not actually do anything with) does not contain any MRPT classes or objects, so I am guessing the problem is in the MRPT libraries and not in our framework.
Thread 1 "debug" received signal SIGSEGV, Segmentation fault.
0x00000000005b569b in cv::String::deallocate() ()
(gdb) bt
#0 0x00000000005b569b in cv::String::deallocate() ()
#1 0x000000000089969a in cv::BmpEncoder::~BmpEncoder() ()
#2 0x00000000008996d9 in cv::BmpEncoder::~BmpEncoder() [clone .localalias.25] ()
#3 0x00007ffff36a4f66 in cv::ImageCodecInitializer::~ImageCodecInitializer() () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#4 0x00007ffff484136a in __cxa_finalize (d=0x7ffff38d1000) at cxa_finalize.c:56
#5 0x00007ffff369fb53 in ?? () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#6 0x00007fffffffd8b0 in ?? ()
#7 0x00007ffff7de7de7 in _dl_fini () at dl-fini.c:235
Backtrace stopped: frame did not save the PC
I set a breakpoint with break cv::ImageCodecInitializer::~ImageCodecInitializer
and got as far as:
Thread 1 "debug" hit Breakpoint 3, 0x0000000000888ad0 in cv::ImageCodecInitializer::~ImageCodecInitializer() ()
(gdb) bt
#0 0x0000000000888ad0 in cv::ImageCodecInitializer::~ImageCodecInitializer() ()
#1 0x00007ffff4840ff8 in __run_exit_handlers (status=0, listp=0x7ffff4bcb5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#2 0x00007ffff4841045 in __GI_exit (status=<optimised out>) at exit.c:104
#3 0x00007ffff4827837 in __libc_start_main (main=0x5a4536 <main()>, argc=1, argv=0x7fffffffd9d8, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fffffffd9c8) at ../csu/libc-start.c:325
#4 0x00000000005a4469 in _start ()
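To make sense of the "after main exits" part, here is a minimal sketch of how I understand static destruction to work. It is not OpenCV code: FakeEncoder, FakeCodecRegistry, g_registry and event_log are hypothetical names I made up, loosely modelled on the frames in the backtrace. An object with static storage duration is constructed before main() is entered and destroyed by the exit handlers after main() returns, and its members are destroyed after its own destructor body has run.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Event log. Heap-allocated and intentionally never freed, so it is still
// valid while static destructors run during process exit.
std::vector<std::string>& event_log() {
    static std::vector<std::string>* log = new std::vector<std::string>();
    return *log;
}

// Hypothetical stand-in for an encoder that owns a string member
// (loosely modelled on the cv::BmpEncoder frame in the backtrace).
struct FakeEncoder {
    std::string description;
    explicit FakeEncoder(const std::string& d) : description(d) {}
    ~FakeEncoder() {
        event_log().push_back("encoder destroyed: " + description);
        std::fprintf(stderr, "~FakeEncoder(%s)\n", description.c_str());
    }
};

// Hypothetical stand-in for cv::ImageCodecInitializer: it owns the encoders.
struct FakeCodecRegistry {
    std::vector<FakeEncoder> encoders;
    FakeCodecRegistry() {
        encoders.emplace_back("bmp");
        event_log().push_back("registry constructed");
    }
    ~FakeCodecRegistry() {
        event_log().push_back("registry destructor entered");
        std::fprintf(stderr, "~FakeCodecRegistry()\n");
        // The members (and therefore the encoders) are destroyed after this body.
    }
};

// Static storage duration: constructed before main() is entered and
// destroyed by the exit handlers after main() has returned.
static FakeCodecRegistry g_registry;
```

While main() is running, event_log() holds only the "registry constructed" entry. The two destructor messages reach stderr only after main() has returned, registry first and encoder second, which is the same nesting as frames #3 to #1 in the first backtrace.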
searched for opencv-2.4 debug
The app is built with debug symbols, but the system does not appear to have an opencv-2.4 package with debug symbols, so I keep getting the "optimized out" warning. The search returned only these packages:
libopencv-apps-dev - opencv_apps Robot OS package - development files
libopencv-apps0d - opencv_apps Robot OS package - runtime files
libopencv-calib3d2.4v5 - computer vision Camera Calibration library
libopencv-contrib-dev - development files for libopencv-contrib
libopencv-contrib2.4v5 - computer vision contrib library
libopencv-core2.4v5 - computer vision core library
libopencv-dev - development files for opencv
libopencv-features2d2.4v5 - computer vision Feature Detection and Descriptor Extraction library
libopencv-flann2.4v5 - computer vision Clustering and Search in Multi-Dimensional spaces library
libopencv-gpu-dev - development files for libopencv-gpu2.4v5
libopencv-gpu2.4v5 - computer vision GPU library
libopencv-highgui2.4v5 - computer vision High-level GUI and Media I/O library
libopencv-imgproc2.4v5 - computer vision Image Processing library
libopencv-legacy-dev - development files for libopencv-legacy
libopencv-legacy2.4v5 - computer vision legacy library
libopencv-ml2.4v5 - computer vision Machine Learning library
libopencv-objdetect2.4v5 - computer vision Object Detection library
libopencv-ocl-dev - development files for libopencv-ocl2.4v5
libopencv-ocl2.4v5 - computer vision OpenCL support library
libopencv-photo2.4v5 - computer vision computational photography library
libopencv-stitching2.4v5 - computer vision image stitching library
libopencv-superres2.4v5 - computer vision Super Resolution library
libopencv-ts2.4v5 - computer vision ts library
libopencv-video2.4v5 - computer vision Video analysis library
libopencv-videostab2.4v5 - computer vision video stabilization library
libopencv2.4-java - Java bindings for the computer vision library
libopencv2.4-jni - Java jni library for the computer vision library
searched for actual point of offending function
I've gone through the minified debug executable we've built to try to pinpoint the issue, and then searched for the actual function:
nm -Ca debug | grep "ImageCodecInitializer"
0000000000889290 W cv::ImageCodecInitializer::ImageCodecInitializer()
0000000000889290 W cv::ImageCodecInitializer::ImageCodecInitializer()
0000000000888ad0 W cv::ImageCodecInitializer::~ImageCodecInitializer()
0000000000888ad0 W cv::ImageCodecInitializer::~ImageCodecInitializer()
Then I tried to find what GDB has to say about those addresses:
(gdb) info line *0x0000000000889290
No line number information available for address 0x889290 <_ZN2cv21ImageCodecInitializerC2Ev>
But I can't go anywhere from there, so I searched in GDB to find who constructs this:
#0 0x00007ffff36a6240 in cv::ImageCodecInitializer::ImageCodecInitializer() () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#1 0x00007ffff369f8f6 in ?? () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#2 0x00007ffff7de76ba in call_init (l=<optimised out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffd9d8, env=env@entry=0x7fffffffd9e8) at dl-init.c:72
#3 0x00007ffff7de77cb in call_init (env=0x7fffffffd9e8, argv=0x7fffffffd9d8, argc=1, l=<optimised out>) at dl-init.c:30
#4 _dl_init (main_map=0x7ffff7ffe168, argc=1, argv=0x7fffffffd9d8, env=0x7fffffffd9e8) at dl-init.c:120
#5 0x00007ffff7dd7c6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6 0x0000000000000001 in ?? ()
#7 0x00007fffffffdda0 in ?? ()
#8 0x0000000000000000 in ?? ()
Again optimized out.
searched for library which uses the offending function
The function is in libopencv_highgui.so.2.4,
so I am guessing that one of the MRPT libs is using it. I went looking for which of the MRPT libs we link against depends on it, and found it:
readelf -d debug
Dynamic section at offset 0x2b49bb0 contains 41 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libboost_system.so.1.58.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_filesystem.so.1.58.0]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]
0x0000000000000001 (NEEDED) Shared library: [libmrpt-base.so.1.9]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libjpeg.so.8]
0x0000000000000001 (NEEDED) Shared library: [libpng12.so.0]
0x0000000000000001 (NEEDED) Shared library: [libtiff.so.5]
0x0000000000000001 (NEEDED) Shared library: [libjasper.so.1]
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libIlmImf-2_2.so.22]
0x0000000000000001 (NEEDED) Shared library: [libHalf.so.12]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
So, I found that:
sudo ldconfig -p | grep "libmrpt-base.so.1.9"
libmrpt-base.so.1.9 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmrpt-base.so.1.9
And then:
readelf -d /usr/lib/x86_64-linux-gnu/libmrpt-base.so.1.9
Dynamic section at offset 0xa5aea8 contains 37 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]
0x0000000000000001 (NEEDED) Shared library: [libcxsparse.so.3.1.4]
0x0000000000000001 (NEEDED) Shared library: [libwx_baseu-3.0.so.0]
0x0000000000000001 (NEEDED) Shared library: [libwx_gtk2u_core-3.0.so.0]
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libjpeg.so.8]
0x0000000000000001 (NEEDED) Shared library: [libopencv_highgui.so.2.4]
0x0000000000000001 (NEEDED) Shared library: [libopencv_imgproc.so.2.4]
0x0000000000000001 (NEEDED) Shared library: [libopencv_core.so.2.4]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000e (SONAME) Library soname: [libmrpt-base.so.1.9]
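The manual scan of the NEEDED entries can also be automated when there are many libraries to check. This is only a minimal sketch, assuming the readelf -d output has already been captured as lines of text; needed_library and needed_matching are hypothetical helper names of my own, not part of any tool. Piping readelf -d through grep would give the same answer.

```cpp
#include <string>
#include <vector>

// Returns the library name from one line of `readelf -d` output, or an
// empty string if the line is not a NEEDED entry.
std::string needed_library(const std::string& line) {
    if (line.find("(NEEDED)") == std::string::npos) return std::string();
    const std::string::size_type open = line.find('[');
    if (open == std::string::npos) return std::string();
    const std::string::size_type close = line.find(']', open + 1);
    if (close == std::string::npos) return std::string();
    return line.substr(open + 1, close - open - 1);
}

// Collects the NEEDED libraries whose name contains `pattern`.
std::vector<std::string> needed_matching(const std::vector<std::string>& lines,
                                         const std::string& pattern) {
    std::vector<std::string> hits;
    for (const std::string& line : lines) {
        const std::string name = needed_library(line);
        if (!name.empty() && name.find(pattern) != std::string::npos) {
            hits.push_back(name);
        }
    }
    return hits;
}
```

Fed the libmrpt-base.so.1.9 dump above with the pattern "opencv", this should pick out the three libopencv_*.so.2.4 entries and skip the SONAME line.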
I know this is the library creating the issue, because our own project uses opencv-3.3, which is statically linked into it. Sadly, the repository we're using does not have debug symbols for MRPT either:
libmrpt-base1.9 - Mobile Robot Programming Toolkit - base library
libmrpt-detectors1.9 - Mobile Robot Programming Toolkit - detectors library
libmrpt-graphs1.9 - Mobile Robot Programming Toolkit - graphs library
libmrpt-graphslam1.9 - Mobile Robot Programming Toolkit - graphslam library
libmrpt-gui1.9 - Mobile Robot Programming Toolkit - gui library
libmrpt-hmtslam1.9 - Mobile Robot Programming Toolkit - hmtslam library
libmrpt-hwdrivers1.9 - Mobile Robot Programming Toolkit - hwdrivers library
libmrpt-kinematics1.9 - Mobile Robot Programming Toolkit - kinematics library
libmrpt-maps1.9 - Mobile Robot Programming Toolkit - maps library
libmrpt-nav1.9 - Mobile Robot Programming Toolkit - nav library
libmrpt-obs1.9 - Mobile Robot Programming Toolkit - obs library
libmrpt-opengl1.9 - Mobile Robot Programming Toolkit - opengl library
libmrpt-slam1.9 - Mobile Robot Programming Toolkit - slam library
libmrpt-tfest1.9 - Mobile Robot Programming Toolkit - tfest library
libmrpt-topography1.9 - Mobile Robot Programming Toolkit - topography library
libmrpt-vision1.9 - Mobile Robot Programming Toolkit - vision library
libmrpt-comms1.9 - Mobile Robot Programming Toolkit - comms library
And even worse:
nm -C libmrpt-base.so
nm: libmrpt-base.so: no symbols
And this is where the journey ends.
What are my options?
- use another version of mrpt?
- compile mrpt with debug symbols?
- compile opencv-2.4 with debug symbols?
Any help, hints or tips are greatly appreciated. If this question is too localized or does not conform to SO standards, please leave a comment and I will update it.