2

I have a bug somewhere that causes my app to vanish without an error message or anything similar. The app just disappears from the screen and is no longer listed in Task Manager.

The app is a C++Builder app (CBuilder2007), and I have tried everything I can think of to catch this error. It happens very seldom: it has never crashed on my machine, and only once on the test machines we have in the office. With one of our customers it happens a little more frequently, but we haven't found a way to make it happen or identified the circumstances in which it happens. It is a heavily multithreaded app.

I have madExcept enabled in this app, but it doesn't catch anything. I have already added handlers using the set_terminate and set_unexpected RTL routines, without any luck.

The only info I have is from a "loader app" wrapper I wrote to get the return code from the main app. It exits with the code C0000005, which I believe means an access violation happened. The strange thing is that, as mentioned, there is not even the Windows error box or anything like that.

The question is: any ideas on how to catch this? I don't even have a clue where this might be happening (I have a lot of logging around the app, but the "trail" before the app crashes hasn't led anywhere). My idea with the set_terminate and set_unexpected routines was to get a stack trace to see where the error was generated, but so far those routines aren't being called at all (at least not the one time this has happened here in my office).

Thanks in advance


[Update 22.Sept.2009] Using AddVectoredExceptionHandler I was able to get a call stack from the crash, and now I can start trying to isolate and fix the bug. Thanks!!!

4

11 Answers

6

terminate/unexpected are called only by the C++ runtime, and only for C++ exceptions.

An access violation is an SEH exception. To catch it, you need SetUnhandledExceptionFilter or AddVectoredExceptionHandler (available on Windows XP and later). You could then create a minidump using MiniDumpWriteDump and related functions.
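As a rough sketch of the registration step, assuming a Windows build: the function names (isFatalSehCode, describeExceptionCode, LogVectoredException, LogUnhandledException, installCrashLogging) are made up for illustration, and the `#ifdef _WIN32` guard only exists so the snippet compiles elsewhere. Vectored handlers also see first-chance exceptions that are handled later, so the sketch filters on the code before logging.

```cpp
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#endif

// True for system error codes such as 0xC0000005. Codes with the
// "customer" bit set (e.g. 0xE06D7363, used for MSVC C++ exceptions)
// are skipped, because they are usually caught by normal handlers.
bool isFatalSehCode(unsigned long code) {
    return (code & 0xF0000000UL) == 0xC0000000UL;
}

// Readable names for a few well-known codes (not a complete list).
const char* describeExceptionCode(unsigned long code) {
    switch (code) {
        case 0xC0000005UL: return "access violation";
        case 0xC00000FDUL: return "stack overflow";
        case 0xC0000094UL: return "integer divide by zero";
        case 0xC000001DUL: return "illegal instruction";
        default:           return "other exception";
    }
}

#ifdef _WIN32
// Hypothetical vectored handler: logs and lets normal handling continue.
// fprintf is not guaranteed to be safe in a crashing process; a real
// handler would prefer a pre-opened file handle and WriteFile.
static LONG CALLBACK LogVectoredException(PEXCEPTION_POINTERS info) {
    const unsigned long code = info->ExceptionRecord->ExceptionCode;
    if (isFatalSehCode(code)) {
        std::fprintf(stderr, "SEH exception 0x%08lX (%s) at %p\n", code,
                     describeExceptionCode(code),
                     info->ExceptionRecord->ExceptionAddress);
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

// Hypothetical last-chance filter for exceptions nobody handled.
static LONG WINAPI LogUnhandledException(EXCEPTION_POINTERS* info) {
    const unsigned long code = info->ExceptionRecord->ExceptionCode;
    std::fprintf(stderr, "Unhandled exception 0x%08lX (%s)\n", code,
                 describeExceptionCode(code));
    return EXCEPTION_CONTINUE_SEARCH;  // fall through to default handling
}

bool installCrashLogging() {
    SetUnhandledExceptionFilter(LogUnhandledException);
    // First argument 1: call this handler before other vectored handlers.
    return AddVectoredExceptionHandler(1, LogVectoredException) != nullptr;
}
#else
bool installCrashLogging() { return false; }  // nothing to install here
#endif
```

Calling installCrashLogging() once, early in startup, should be enough. Capturing a full call stack inside the handler is a separate step that this sketch leaves out.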

answered 2009-09-19T19:48:04.563
3

I've come across issues like this a couple times, where the application seems to simply stop. No exception handlers or crash handlers or such are invoked. The app simply seems to terminate instantly.

Unfortunately, I can't offer any easy advice on how to figure it out. The other responses here have some good ideas. If you don't already have something to catch unhandled exceptions as per PiotrLegnica's response, then you should add it.

However, if the program is truly terminating instantly like the times I've seen this, then even a handler registered with SetUnhandledExceptionFilter won't help. The program is stopping all execution and dropping out of memory before the handler is ever invoked.

A few ideas do come to mind though:

  • Check your codebase for any usage of TerminateProcess or TerminateThread. I could be wrong but I believe usage of these might be able to cause the symptoms you're seeing.
  • Check any usage of function pointers, including callbacks and WindowProcs passed to Windows APIs. Make sure that the calling conventions, parameter lists, and return values all match correctly. If a function pointer is being cast to make the code compile, the cast may be hiding a mismatch that could be causing bad things to happen.
  • Consider any 3rd-party libraries or components (ActiveX or such) that you're using. Maybe they have a bug in their own code causing this problem in obscure situations. You could try placing logging statements before and after calls to their functions to see if that can pin down where the program stops.
  • And if nothing else helps, put more logging throughout your own code.

And on the subject of logging: When I had to help track down a problem like this at my job, we ended up making a logging mechanism that would create a uniquely named log each time the program started and would delete it if the program ended normally. That way, another log file would be left in existence each time the termination problem occurred. We used a date-time stamp as part of the unique naming aspect. The content of the logs was simply a record of which actions were happening in the program. We went through several iterations of examining logs and then adding more logging statements until this finally led us to the source. And while tracking down the problem, this mechanism gave us a very clear idea of just how frequently the problem was occurring. You might consider something similar.
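A minimal sketch of that per-run log idea, assuming a C++17 compiler with std::filesystem (which CBuilder2007 itself does not provide, so treat it as an outline to adapt). The RunLog class, its method names, and the file naming scheme are invented for illustration.

```cpp
#include <ctime>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: one log file per run, deleted only on a clean exit.
// Any file left behind therefore marks a run that ended abnormally.
class RunLog {
public:
    explicit RunLog(const fs::path& directory) {
        fs::create_directories(directory);

        // Date-time stamp as the main part of the unique name.
        const std::time_t now = std::time(nullptr);
        char stamp[32] = {0};
        std::strftime(stamp, sizeof(stamp), "%Y%m%d-%H%M%S",
                      std::localtime(&now));

        // A counter suffix keeps names unique within the same second.
        for (int n = 0; ; ++n) {
            path_ = directory / ("run-" + std::string(stamp) + "-" +
                                 std::to_string(n) + ".log");
            if (!fs::exists(path_)) break;
        }
        out_.open(path_, std::ios::out | std::ios::app);
    }

    // Record one action; flush at once so the trail survives a sudden exit.
    void record(const std::string& action) {
        out_ << action << '\n';
        out_.flush();
    }

    // Call only on the normal shutdown path.
    void closeCleanly() {
        out_.close();
        fs::remove(path_);
    }

    const fs::path& path() const { return path_; }

private:
    fs::path path_;
    std::ofstream out_;
};
```

Counting the leftover files in the directory then gives the frequency figure mentioned above, and the last lines of each leftover file show what the program was doing just before it disappeared.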

answered 2009-09-19T23:35:42.170
1

I've seen this happen to C++ code twice before:

  1. When dynamically loading a Windows API using LoadLibrary and GetProcAddress and then calling it through a function pointer declared with the wrong calling convention (it should have had __stdcall but didn't).

  2. Where a class had a function pointer as a member variable, and the function pointer was called before having been initialised.
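Both cases can be guarded against in a few lines. The sketch below is only an illustration: TickSource and SKETCH_CALLCONV are made-up names, GetTickCount stands in for whatever API is loaded dynamically, and GetModuleHandleA is used instead of LoadLibrary because kernel32.dll is already loaded in every process. Outside Windows the macro is empty so the snippet still compiles.

```cpp
#ifdef _WIN32
#include <windows.h>
#define SKETCH_CALLCONV WINAPI  // __stdcall on 32-bit Windows
#else
#define SKETCH_CALLCONV         // no-op so the sketch compiles elsewhere
#endif

// Case 1: the pointer type repeats the API's calling convention.
// GetTickCount is declared as: DWORD WINAPI GetTickCount(void).
typedef unsigned long (SKETCH_CALLCONV *TickCountFn)();

class TickSource {
public:
    // Case 2: the member pointer always starts out as null.
    TickSource() : fn_(nullptr) {}

    // Resolve the function at run time; returns false if that fails.
    bool load() {
#ifdef _WIN32
        HMODULE kernel = GetModuleHandleA("kernel32.dll");
        if (kernel != nullptr) {
            fn_ = reinterpret_cast<TickCountFn>(
                GetProcAddress(kernel, "GetTickCount"));
        }
#endif
        return fn_ != nullptr;
    }

    // Never call through the pointer without checking it first.
    bool tryGet(unsigned long& out) const {
        if (fn_ == nullptr) return false;
        out = fn_();
        return true;
    }

private:
    TickCountFn fn_;
};
```

Keeping the calling convention inside the typedef means a mismatch shows up where the pointer is assigned, instead of corrupting the stack at the call site.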

answered 2009-09-19T19:54:07.220
1

Run your application in debug mode and stress-test it by running a set of complicated scenarios over and over, so that you can catch the exception under the debugger.

Also review the code that accesses memory shared between your threads; the problem may come from multithreading. You can try putting locks around every shared memory access to confirm whether multithreading is the cause (but this will decrease performance).
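To make the locking suggestion concrete, here is a small sketch using standard C++11 threads. SharedCounter and runWorkers are hypothetical names, and an older toolchain such as CBuilder2007 would need its own primitives (for example a Windows CRITICAL_SECTION) in place of std::mutex.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Every access to the shared value goes through the same mutex.
class SharedCounter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }

    long value() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    long value_ = 0;
};

// Start several threads that all update the same counter, wait for
// them to finish, and return the final total.
long runWorkers(int threadCount, int incrementsPerThread) {
    SharedCounter counter;
    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; ++i) {
        workers.emplace_back([&counter, incrementsPerThread] {
            for (int n = 0; n < incrementsPerThread; ++n) {
                counter.increment();
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return counter.value();
}
```

If the crash stops once every shared structure is wrapped this way, the locks can be removed one at a time to find the access that actually needs protecting.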

answered 2009-09-19T20:03:36.133
1

Re: the app disappearing, do the customer machines have the "Report errors" setting turned off in Windows? It's buried in the "System" control panel, and when it's turned off the normal Windows crash notification dialog is suppressed.

answered 2009-09-19T20:14:46.890
1

Maybe adding a good, old-fashioned signal handler might at least give some indication of what happened?
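A minimal sketch of such a handler with the standard <csignal> facilities follows. The names onSignal, installSignalHandlers and lastSignal are invented here, and the exit code 100 + signal number is an arbitrary choice. Whether a given Windows runtime actually routes an access violation to a SIGSEGV handler is an assumption that would need checking for C++Builder.

```cpp
#include <csignal>
#include <cstdlib>

// Last signal seen; sig_atomic_t is the only type that is safe to
// write from inside a signal handler.
static volatile std::sig_atomic_t g_lastSignal = 0;

static void onSignal(int sig) {
    g_lastSignal = sig;
    if (sig != SIGTERM) {
        // Returning from a fault handler would re-run the faulting
        // instruction, so leave at once with a recognisable exit code
        // that a wrapper process can read.
        std::_Exit(100 + sig);
    }
}

// Register the handler for the usual crash-related signals.
bool installSignalHandlers() {
    const int signals[] = { SIGSEGV, SIGILL, SIGFPE, SIGABRT, SIGTERM };
    bool ok = true;
    for (int sig : signals) {
        if (std::signal(sig, onSignal) == SIG_ERR) {
            ok = false;
        }
    }
    return ok;
}

int lastSignal() {
    return g_lastSignal;
}
```

The handler deliberately does very little, since most library calls are not safe inside a signal handler. A distinct exit code is often all the indication one can reliably get this way.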

answered 2009-09-19T22:40:06.850
1

You've got a tough situation if you can't reproduce it locally. Capturing a crash dump, or catching the program in the act with a debugger is certainly your best option, as others have suggested.

If this were my problem, I'd try monitoring with Process Monitor from Sysinternals. Set it up to watch only your process, and make sure it's backed by a file if it will take a long time. This might tell you which thread is active and what is happening when the process ends. You might also try to find the equivalent of 'truss' for Windows, that is, a program to monitor system calls.

answered 2009-09-21T15:00:46.847
0

There are two more things that I have seen do this in the past that you might consider: a stack overflow (infinite recursion, bad parameters causing big temporary variables to be allocated on the stack, etc.), or an unhandled exception in a secondary thread.
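For the secondary-thread case, one option is to route every thread body through a catch-all wrapper that records what escaped. This is a sketch in standard C++11 with made-up names (ThreadErrorLog, startGuarded). It only sees C++ exceptions, so a stack overflow or an access violation would still need an SEH-level handler.

```cpp
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Thread-safe list of messages about exceptions that escaped a thread.
class ThreadErrorLog {
public:
    void add(const std::string& message) {
        std::lock_guard<std::mutex> lock(mutex_);
        messages_.push_back(message);
    }

    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return messages_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> messages_;
};

// Run `body` on a new thread; anything it throws is recorded under
// `name` instead of terminating the whole process.
std::thread startGuarded(const std::string& name,
                         std::function<void()> body,
                         ThreadErrorLog& log) {
    return std::thread([name, body, &log] {
        try {
            body();
        } catch (const std::exception& e) {
            log.add(name + ": " + e.what());
        } catch (...) {
            log.add(name + ": unknown exception");
        }
    });
}
```

In a real app the log would be written to disk immediately. The in-memory list here just keeps the example short.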

answered 2009-09-19T19:57:57.133
0

Configure your app to write a minidump in case of a crash.

I am not sure how this works in CBuilder, but in Visual Studio you can load these dumps directly, and it shows you a complete call stack and the source code line that caused the crash.

I used this a lot to find the cause for crashes that happened on customer machines.
However, especially for multi-threaded applications, it is likely that the real error (e.g. memory was released too early) happened a while before the actual crash, so it may still be very difficult to find the root cause.
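A rough sketch of writing such a dump from an unhandled-exception filter, assuming a Windows build linked against Dbghelp.lib. The names buildDumpFileName, WriteDumpOnCrash and installDumpWriter, as well as the `crash_<pid>.dmp` naming scheme, are made up for illustration. The `#ifdef _WIN32` guard is only there so the snippet compiles on other platforms.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>  // MiniDumpWriteDump; link against Dbghelp.lib
#endif

// Hypothetical naming scheme: <prefix>_<process id>.dmp
std::string buildDumpFileName(const std::string& prefix,
                              unsigned long processId) {
    return prefix + "_" + std::to_string(processId) + ".dmp";
}

#ifdef _WIN32
// Built at install time so the filter itself does not allocate memory.
static std::string g_dumpFileName;

static LONG WINAPI WriteDumpOnCrash(EXCEPTION_POINTERS* info) {
    HANDLE file = CreateFileA(g_dumpFileName.c_str(), GENERIC_WRITE, 0,
                              nullptr, CREATE_ALWAYS,
                              FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION exceptionInfo;
        exceptionInfo.ThreadId = GetCurrentThreadId();
        exceptionInfo.ExceptionPointers = info;
        exceptionInfo.ClientPointers = FALSE;
        // MiniDumpNormal keeps the file small: stacks and module list.
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &exceptionInfo, nullptr, nullptr);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;  // end the process after the dump
}

bool installDumpWriter() {
    g_dumpFileName = buildDumpFileName("crash", GetCurrentProcessId());
    SetUnhandledExceptionFilter(WriteDumpOnCrash);
    return true;
}
#else
bool installDumpWriter() { return false; }  // nothing to install here
#endif
```

Writing the dump from inside the crashing process can itself fail if the heap or stack is badly damaged. A separate watchdog process that calls MiniDumpWriteDump on the target is the more robust variant of the same idea.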

answered 2009-09-19T20:03:17.373
0

Subscribe to Windows Error Reporting. Chances are that some of your customers will report the AV to Microsoft, who'll happily share the collected stack traces with you. As a benefit, you get hard figures on the reliability of your application. Management loves those. E.g. you can set a goal of "reducing the frequency of errors by 50% by 2010".

answered 2009-09-21T11:01:08.210
0

Start your app, then attach WinDbg (in crash mode) so that a dump is generated at the first occurrence of a second-chance exception. Remember to make the symbol files (PDB) available.

answered 2009-09-21T11:27:41.940