
TL;DR: Is my guess of a DLL loader lock deadlock correct in this case, and how can I be sure?

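I am getting an intermittent deadlock (about 50% of runs) in some code that involves the CRT time functions and the National Instruments DAQmx driver (9.3.5f2). I am using MSVC2008 Express to build an x86 executable (typical "Release" settings, which I can provide if needed) and I am running on Win7 Pro x64. My code uses the time functions on the main thread and starts a new thread to handle updating an analog output voltage (on a USB-6009):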

#include <iostream>
#include <ctime>
#include <cstring>  // memcpy
#include <windows.h>
#include <process.h>
#include <NIDAQmx.h>

HANDLE  g_TerminateEvent;

extern "C" unsigned int WINAPI DacUpdateThreadRunner(void *lpParam)
{
    TaskHandle  taskHandle;

    DAQmxCreateTask("", &taskHandle);
    DAQmxCreateAOVoltageChan(taskHandle, "Dev2/ao0", "", 0.0, 3.3, DAQmx_Val_Volts, "");
    DAQmxStartTask(taskHandle);

    float64 sample_value = 0.0;

    bool    quit = false;

    while (!quit)
    {
        DWORD wait_result = WaitForSingleObject(g_TerminateEvent, 32);
        if (wait_result == WAIT_OBJECT_0) quit = true;
        else
        {
            DAQmxWriteAnalogScalarF64(taskHandle, 1, 1.0, sample_value, NULL);
        }
    }

    DAQmxStopTask(taskHandle);
    DAQmxClearTask(taskHandle);

    return 0;
}

int main(void)
{
    g_TerminateEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    unsigned int m_ThreadId;
    uintptr_t m_Thread = _beginthreadex(NULL, 0, DacUpdateThreadRunner, NULL, 0, &m_ThreadId);

    struct tm t;
    time_t tt = time(NULL);
    struct tm *temp = localtime(&tt);
    memcpy(&t, temp, sizeof(struct tm));

    for (int i = 0; i < 10; i++)
    {
        std::cout << "Main thread doing stuff " << i << std::endl;
        Sleep(1000);
    }

    SetEvent(g_TerminateEvent);
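    // Wait for the worker to stop and clear its DAQmx task before exiting.
    WaitForSingleObject((HANDLE)m_Thread, INFINITE);
    CloseHandle((HANDLE)m_Thread);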

    return 0;
}

It only seems to deadlock if I have the call to localtime() in the code. Looking at the debug output in MSVS it seemed to lock while the 2nd thread was loading the (many) NI DLLs (the last DLLs to be loaded before the deadlock are National Instruments\MAX\mxs.dll, National Instruments\MAX\mxsutils.dll and SysWOW64\version.dll).

In the MSVC 2008 runtime, localtime maps to _localtime64(), which apparently uses thread-local storage on Windows in order to be thread-safe.

I used WinDbg to get the call stacks (shown below) after the application deadlocked, and I ran the !locks command, but I cannot see why there would be a deadlock because I can't see any shared resource that both threads are locking. The !locks command prints "Scanned 10 critical sections" and nothing else (do I need to use a checked build of Windows?).

main thread:

ChildEBP RetAddr  Args to Child              
0035f078 77288df4 000001d0 00000000 00000000 ntdll_77250000!NtWaitForSingleObject+0x15
0035f0dc 77288cd8 00000000 00000000 00000000 ntdll_77250000!RtlpWaitOnCriticalSection+0x13e
0035f104 772a9520 773520c0 773271ca 0035f350 ntdll_77250000!RtlEnterCriticalSection+0x150
0035f144 751a1ee1 005e0000 0035f35c bf6b8258 ntdll_77250000!LdrGetDllHandleByMapping+0x3b
0035f304 751a1fd2 0035f350 0035f348 00000002 KERNELBASE!BasepLoadLibraryAsDataFileInternal+0x4f4
0035f324 751a2221 0035f350 0035f348 00000002 KERNELBASE!BasepLoadLibraryAsDataFile+0x19
0035f360 751993ad 0035f38c 00000000 006a7eb4 KERNELBASE!LoadLibraryExW+0x18a
0035f598 75199535 0035f630 72cbc018 00000002 KERNELBASE!ConvertTimeZoneMuiString+0xe4
0035f5bc 7519966b 0035f5d8 72cbbfc4 72cbc018 KERNELBASE!ConvertTimeZoneMuiStrings+0x155
0035f688 75199729 72cbbfc0 00000001 0035f6f0 KERNELBASE!GetTimeZoneInformationRaw+0x8c
0035f698 72c58d90 72cbbfc0 bf60abfe 0035f778 KERNELBASE!GetTimeZoneInformation+0xf
0035f6f0 72c59390 bf60aa2e 0035f778 00f625f8 MSVCR90!_set_timezone+0x168
0035f720 72c59e79 01103384 00f625f8 000001cc MSVCR90!__tzset+0x2e
0035f748 72c5a0b1 00f625f8 0035f778 00000001 MSVCR90!_localtime64_s+0x9f
0035f75c 01101107 0035f778 01103384 00000001 MSVCR90!_localtime64+0x1a
0035f784 011015e9 00000001 00f61850 00f62ba8 deadlock2!main+0x57 [c:\david\dev\nitests\deadlock2\deadlock2.cpp @ 60]
0035f7c8 7509339a 7efde000 0035f814 77289ef2 deadlock2!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
0035f7d4 77289ef2 7efde000 7732789a 00000000 KERNEL32!BaseThreadInitThunk+0xe
0035f814 77289ec5 01101731 7efde000 00000000 ntdll_77250000!__RtlUserThreadStart+0x70
0035f82c 00000000 01101731 7efde000 00000000 ntdll_77250000!_RtlUserThreadStart+0x1b

second thread:

ChildEBP RetAddr  Args to Child              
0276e618 77288df4 0000021c 00000000 00000000 ntdll_77250000!NtWaitForSingleObject+0x15
0276e67c 77288cd8 00000000 00000000 72c83b4e ntdll_77250000!RtlpWaitOnCriticalSection+0x13e
0276e6a4 72c42f2a 72cbbab8 0276e978 0276e6ec ntdll_77250000!RtlEnterCriticalSection+0x150
0276e6b4 72c48a70 00000007 bd23bbe2 00f69170 MSVCR90!_lock+0x30
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for mxsutils.dll - 
0276e6ec 1b4fc523 0276e700 00000105 0276e978 MSVCR90!_getcwd+0x13
WARNING: Stack unwind information not available. Following frames may be wrong.
0276e80c 1b4fd45c 0276e81c 0276e9a4 1b5009e7 mxsutils!mxsCheckComponent+0x67c3
0276e838 1b515a05 0276e9a4 0276e978 00000000 mxsutils!mxsCheckComponent+0x76fc
0276e994 1b51542a 0276e9a4 00f619c0 0276e9b0 mxsutils!std::_Init_locks::operator=+0x1a4f
0276e9d8 1b502aa0 0276eaf0 00f69170 00f619c0 mxsutils!std::_Init_locks::operator=+0x1474
0276eb54 1b4f163f 00000001 00000001 006b0300 mxsutils!CodeProject3rdParty::mxs_mxExceptionFilter+0x320
0276eba8 1b502831 1b240000 1b529f40 00000001 mxsutils+0x163f
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for mxs.dll - 
0276ebc8 1b2414e9 00000001 00000000 00f62540 mxsutils!CodeProject3rdParty::mxs_mxExceptionFilter+0xb1
0276edf8 1b24593c 1b240000 00000001 00000000 mxs+0x14e9
0276ee3c 1b2459f6 1b240000 0276ee68 77289950 mxs!std::_Init_locks::operator=+0x44c
0276ee48 77289950 1b240000 00000001 00000000 mxs!std::_Init_locks::operator=+0x506
0276ee68 7728d8c9 1b2459d8 1b240000 00000001 ntdll_77250000!LdrpCallInitRoutine+0x14
0276ef5c 7728d78c 00000000 75717046 00000000 ntdll_77250000!LdrpRunInitializeRoutines+0x26f
0276f0c8 7728c4d5 0276f12c 0276f0f4 00000000 ntdll_77250000!LdrpLoadDll+0x4d1
0276f100 751a2288 0276f0f4 0276f144 0276f12c ntdll_77250000!LdrLoadDll+0xaa
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for nidmxfu.dll - 
0276f13c 6dd8b3ad 00000000 00000000 006b03fc KERNELBASE!LoadLibraryExW+0x1f1
0276f568 6dd8b4d3 0276f86c 6ded4a38 0276f92c nidmxfu!nNIMSAI100::tFilterPreferences::~tFilterPreferences+0x65cd
0276f584 6dc39d62 0276f6cc 0276f728 0276f870 nidmxfu!nNIMSAI100::tFilterPreferences::~tFilterPreferences+0x66f3
00000000 00000000 00000000 00000000 00000000 nidmxfu!nNIMS100::tAttributeDatabase::getAttributeValueForString+0x12432

My guess is that the main thread has taken an internal lock in the MSVCRT and then went to load a DLL, which it cannot do because thread 2 holds the DLL loader lock. Thread 2 then tries to use _getcwd() from the MSVCRT, which needs that same internal lock, and that results in the deadlock. Is that an accurate assessment? If not, how could I go about getting more information to make sure?
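To make the guess concrete, here is a minimal, portable model of the lock-order inversion I suspect. It is only a sketch under assumptions: `loader_lock` and `crt_lock` are stand-ins for the Windows loader lock and the CRT-internal lock (I am assuming those are the two locks involved), `simulate_lock_inversion` and `arrive_and_wait` are hypothetical names, and it uses C++11 `std::thread` and `std::timed_mutex` (not available in MSVC 2008) purely so that the second acquisition can time out instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Outcome of each thread's attempt to take its *second* lock.
struct InversionResult {
    bool main_got_loader;  // main thread: holds "CRT" lock, wants "loader" lock
    bool worker_got_crt;   // worker thread: holds "loader" lock, wants "CRT" lock
};

// Tiny spin barrier: returns once `target` threads have arrived.
static void arrive_and_wait(std::atomic<int>& counter, int target) {
    counter.fetch_add(1);
    while (counter.load() < target) std::this_thread::yield();
}

InversionResult simulate_lock_inversion() {
    std::timed_mutex loader_lock;  // stand-in for the DLL loader lock
    std::timed_mutex crt_lock;     // stand-in for the CRT-internal lock
    std::atomic<int> holding(0);   // both threads hold their first lock
    std::atomic<int> done(0);      // both threads finished their attempt
    InversionResult result = {false, false};
    const std::chrono::milliseconds timeout(50);

    // Models thread 2: inside DllMain (loader lock held), calling _getcwd.
    std::thread worker([&] {
        std::lock_guard<std::timed_mutex> hold(loader_lock);
        arrive_and_wait(holding, 2);
        result.worker_got_crt = crt_lock.try_lock_for(timeout);
        if (result.worker_got_crt) crt_lock.unlock();
        arrive_and_wait(done, 2);  // keep holding until the other attempt ends
    });

    // Models the main thread: inside tzset (CRT lock held), calling LoadLibrary.
    {
        std::lock_guard<std::timed_mutex> hold(crt_lock);
        arrive_and_wait(holding, 2);
        result.main_got_loader = loader_lock.try_lock_for(timeout);
        if (result.main_got_loader) loader_lock.unlock();
        arrive_and_wait(done, 2);
    }

    worker.join();
    return result;
}
```

In this model neither attempt can succeed, because each thread keeps its first lock until the other thread's attempt has finished. With real critical sections there is no timeout, so both threads would stay parked in RtlEnterCriticalSection, which is what both call stacks above show.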

I could probably work around it by re-ordering some of the code (e.g. use wxDateTime or the NI code from the main thread to pre-load the DLLs) if I was confident that was the problem. However, I don't want to just hide it and have it re-appear and bite me later.

So is there a way for me to verify what has caused the deadlock in this case?


2 Answers


Your diagnosis is correct. On the main thread, tzset holds a CRT lock while it calls LoadLibrary, which needs the loader lock. Meanwhile, on the second thread, mxsutils is calling _getcwd from inside its DllMain, so that thread already holds the loader lock, and _getcwd is waiting for that same CRT lock. Like most functions, _getcwd is not safe to call from DllMain. A temporary workaround would be to make a dummy call to localtime from main before you create any threads. A long-term fix would be to change mxsutils so it doesn't call unsafe functions from inside DllMain.
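The temporary workaround might look roughly like the sketch below. The names `prime_local_time` and `run_with_primed_time` are hypothetical, and `std::thread` is used only to keep the sketch portable; in the MSVC 2008 project the same ordering would apply to the existing `_beginthreadex` call. The assumption is that the first call to localtime is the one that triggers the time zone initialization, so making it while only one thread exists avoids the race with the DLL loading on the second thread.

```cpp
#include <cassert>
#include <ctime>
#include <thread>

// Hypothetical helper: forces the CRT's one-time time zone setup to run on
// the calling thread and returns a private copy of the broken-down time.
std::tm prime_local_time() {
    std::time_t now = std::time(nullptr);
    std::tm copy = std::tm();
    // localtime returns a pointer to storage owned by the runtime,
    // so copy the result out before anything else can overwrite it.
    if (const std::tm* shared = std::localtime(&now)) {
        copy = *shared;
    }
    return copy;
}

// Hypothetical stand-in for main(): prime the time functions first,
// and only then start the thread that ends up loading the driver DLLs.
template <typename WorkerBody>
std::tm run_with_primed_time(WorkerBody worker_body) {
    std::tm t = prime_local_time();   // no other thread exists yet
    std::thread worker(worker_body);  // stand-in for DacUpdateThreadRunner
    worker.join();
    return t;
}
```

This only reorders the two operations; it does not make mxsutils safe, so any other call that takes the same CRT lock while the DLLs are still loading could in principle hit the same problem.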

Answered 2012-08-13T15:58:43.863

I notice that you are calling wxDateTime::Now() without doing any initialization of the wxWidgets system. My guess would be that wxDateTime::Now() is relying on something that is initialized when you do a normal wxWidgets initialization. Have you tried not even starting your other thread, but simply checking that wxDateTime::Now() works OK like this?

I also notice that you are using wxWidgets v2.9.2. I recommend upgrading to v2.9.4. Apart from many improvements that might help your situation, there is a fix for a bug in wxDateTime. It might not help with the current problem, but it will fix a problem you don't yet know you have.

Answered 2012-08-13T12:27:42.173