crash-dumps - Crash-reporting watchdog for when my application locks up on a customer's machine

Question

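I'm working with a somewhat unreliable (Qt/Windows) application, written partly for us by a third party (just trying to shift the blame there). Their latest version is more stable. Sort of. We're getting fewer crash reports, but we're getting lots of reports of it just hanging and never coming back. The circumstances vary, and with the little information we can gather, we haven't been able to reproduce the problem.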

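So ideally I'd like to create some kind of watchdog that notices when the application has locked up and offers to send us a crash report. Nice idea, but there are problems: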

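How does the watchdog know the process has hung? Presumably we instrument the application to periodically tell the watchdog "all is well", but where do we put that call so that it is guaranteed to happen often enough, yet is unlikely to be on the code path the application is running when it locks up?
What information should the watchdog report when the hang happens? Windows has a decent debugging API, so I'm sure all the interesting data is accessible, but I'm not sure what will actually be useful for tracking down the problem.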
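A minimal sketch of the detection rule behind the first question, assuming the monitored thread records a timestamp on every "all is well" beat and the watchdog compares the age of that timestamp with a timeout. `Heartbeat`, `beat()` and `hung()` are hypothetical names, not part of Qt or Windows. In a Qt application one plausible place to call `beat()` is a `QTimer` slot on the GUI thread: timer events are delivered through that thread's event loop, so the beats stop exactly when the event loop is blocked. A watchdog in a separate process would need the timestamp in shared memory rather than in a plain `std::atomic`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical helper: the monitored thread calls beat(), the watchdog calls hung().
class Heartbeat {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

    explicit Heartbeat(std::chrono::milliseconds timeout,
                       Clock::time_point start = Clock::now())
        : timeout_(timeout), last_(start.time_since_epoch().count()) {
        assert(timeout.count() > 0);
    }

    // "All is well": record the time of the latest beat.
    void beat(Clock::time_point now = Clock::now()) {
        last_.store(now.time_since_epoch().count(), std::memory_order_relaxed);
    }

    // True if no beat has arrived for longer than the timeout.
    bool hung(Clock::time_point now = Clock::now()) const {
        const Clock::duration silence(now.time_since_epoch().count() -
                                      last_.load(std::memory_order_relaxed));
        return silence > timeout_;
    }

private:
    std::chrono::milliseconds timeout_;
    std::atomic<Clock::rep> last_;  // clock ticks of the last beat; atomic so two threads can share it
};
```

Choosing the timeout is the hard part: too short, and a slow disk or network operation on the GUI thread looks like a hang.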

score 5 · Accepted Answer

You want a combination of a minidump (use DrWatson to create these if you don't want to add your own minidump-generation code) and userdump to trigger minidump creation on a hang.

The thing about automatically detecting a hang is that it's difficult to decide when something has hung and when it's just slow or blocked waiting on I/O. I personally prefer to let the user crash the app deliberately when they think it's hung. Apart from being a lot easier (my apps don't tend to hang often, if at all :) ), it also helps them to "be part of the solution". They like that.

Firstly, check out the classic Bugslayer article on crash dumps and symbols, which also has some excellent information about what's going on with these things.

Second, get userdump, which lets you create the dumps, together with the instructions for setting it up to generate them.

When you have the dump, open it in WinDbg and you will be able to inspect the entire program state, including threads and callstacks, registers, memory and parameters to functions. I think you'll be particularly interested in the "~*kp" command, which shows the callstack of every thread, and the "!locks" command, which lists the locked critical sections. I suspect you'll find the hang is due to a deadlock of synchronisation objects. That can be difficult to track down, because all the threads tend to be waiting in a WaitForSingleObject call, so look further down the callstacks to find the application threads (rather than 'framework' threads such as background notifications and network routines). Once you've narrowed them down, you can see what calls were being made, and possibly add some logging instrumentation to the app to give you more information the next time it fails.
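For illustration only (this is not from the answer): the classic shape of such a deadlock is two threads taking the same two mutexes in opposite order, so each ends up blocked in a wait that never returns. The `Account`, `transfer` and `run_opposite_transfers` names below are hypothetical. One standard remedy, once a dump has pointed at the two locks involved, is to acquire both at once with C++17 `std::scoped_lock`, which uses a deadlock-avoidance algorithm when given several mutexes.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical example type: each account is guarded by its own mutex.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Deadlock-prone version (shown for comparison, not compiled):
//   std::lock_guard<std::mutex> a(from.m);  // thread 1 holds x.m, thread 2 holds y.m
//   std::lock_guard<std::mutex> b(to.m);    // ...then each waits forever for the other
//
// Safe version: std::scoped_lock locks both mutexes together using a
// deadlock-avoidance algorithm, so the argument order does not matter.
void transfer(Account& from, Account& to, long amount) {
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads move money in opposite directions many times. With the naive
// lock ordering this pattern can hang; with scoped_lock it always completes.
long run_opposite_transfers(int iterations) {
    assert(iterations >= 0);
    Account x, y;
    x.balance = 1000;
    y.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) transfer(x, y, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) transfer(y, x, 1); });
    t1.join();
    t2.join();
    return x.balance + y.balance;  // the total is conserved
}
```

In a dump of the naive version, both threads would be parked inside their second lock call, each owning the mutex the other one wants.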

Good luck.

P.S. A quick Google search reminded me of this: Debugging deadlocks. (CDB is the command-line equivalent of WinDbg.)

score 2 · Accepted Answer

You can use ADPlus from Microsoft's Debugging Tools for Windows to identify the hangs. It will attach to your process and create a dump (mini or full) when the process hangs or crashes.

WinDbg is portable and does not have to be installed (you do have to configure the symbols, though). You can create a special installation that launches your app from a batch file, which also runs ADPlus after your app starts (ADPlus is a command-line tool, so you should be able to incorporate it somehow).

BTW, if you do find a way to recognize the hang internally and are able to crash the process, you can register with Windows Error Reporting so that the crash dump will be sent to you (should the user allow it).

score 1 · Accepted Answer

Don't bother with a watchdog. Subscribe to Microsoft's Windows Error Reporting (winqual.microsoft.com). They'll collect the stack traces for you. In fact, it's quite likely they're already doing so today; they just don't share them until you sign up.

score 1 · Accepted Answer

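I think a separate application doing the monitoring is likely to create more problems than it solves. Instead, I'd suggest you first create a handler that generates a minidump when the application crashes, and then add a watchdog thread to the application that deliberately crashes it if the application goes off the rails. The advantage of a watchdog thread (as opposed to a separate application) is that it should be easier for the watchdog to determine that the application has gone off the rails.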
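A minimal sketch of such an in-process watchdog thread, using only standard C++. `InProcessWatchdog`, `beat()`, `fired()` and the `onHang` callback are hypothetical names. The callback is where the Windows-specific part would go, for example writing a minidump with `MiniDumpWriteDump` from dbghelp and then terminating the process; that part is not shown. The watchdog fires at most once.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical in-process watchdog: the monitored thread calls beat(); if no
// beat arrives within `timeout`, the background thread calls onHang once.
class InProcessWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    InProcessWatchdog(std::chrono::milliseconds timeout,
                      std::chrono::milliseconds poll,
                      std::function<void()> onHang)
        : timeout_(timeout), poll_(poll), onHang_(std::move(onHang)),
          last_(Clock::now()), worker_([this] { run(); }) {
        assert(poll < timeout);  // must check more often than the timeout
    }

    InProcessWatchdog(const InProcessWatchdog&) = delete;
    InProcessWatchdog& operator=(const InProcessWatchdog&) = delete;

    ~InProcessWatchdog() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // "All is well": called regularly by the monitored thread.
    void beat() {
        std::lock_guard<std::mutex> lock(m_);
        last_ = Clock::now();
    }

    bool fired() const { return fired_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        while (!stop_) {
            // Sleep one poll interval, waking early if the watchdog is being destroyed.
            if (cv_.wait_for(lock, poll_, [this] { return stop_; })) break;
            if (Clock::now() - last_ > timeout_) {
                lock.unlock();   // don't hold the lock while reporting
                fired_ = true;
                onHang_();       // e.g. write a minidump, then terminate the process
                return;          // fire at most once
            }
        }
    }

    std::chrono::milliseconds timeout_;
    std::chrono::milliseconds poll_;
    std::function<void()> onHang_;
    std::mutex m_;
    std::condition_variable cv_;
    Clock::time_point last_;
    bool stop_ = false;
    std::atomic<bool> fired_{false};
    std::thread worker_;  // declared last so every member above exists before run() starts
};
```

One trade-off to keep in mind: Microsoft's documentation recommends calling `MiniDumpWriteDump` from a separate process where possible, so writing the dump from inside the hung process is the convenient option rather than the most robust one.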

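Once you have the minidumps, you can poke around to see what state the application was in when it died. That should give you enough clues to figure out what's wrong, or at least a direction to look in next.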

There's some material on CodeProject about minidumps that might serve as a useful example. MSDN also has more information about them.
