Our .NET application crashes at random with an ExecutionEngineException. The application targets .NET 4.8 x64.

After several intensive attempts to reproduce the problem, I have collected the following facts.

  1. We used WinDBG Preview with the Time Travel feature to capture the execution history.
  2. We analyzed several application crash dumps collected with AdPlus, with a breakpoint set at:

clr!EEPolicy::HandleFatalError

  3. The application was run with these environment variables set:
COMPLUS_HeapVerify = 1 
COMPlus_GCStress = 16
  4. As an additional stress test, we added a periodic (every 10 seconds) LOH compaction:
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, false);

GC.WaitForPendingFinalizers();

GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true);

GC.WaitForPendingFinalizers();
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.Default;
  5. The call stack at the crash looks like this in every case:
000000d4`997b6718 00007ff8`9d82a290 clr!EEPolicy::HandleFatalError+0x0
000000d4`997b6720 00007ff8`9d9fa1dc clr!VerifyObjectAndAge+0x8c
000000d4`997b6750 00007ff8`9d9f1f1c clr!GCToEEInterface::WalkAsyncPinned+0x83
000000d4`997b6790 00007ff8`9d9fa07e clr!BlockVerifyAgeMapForBlocksWorker+0x76
000000d4`997b67d0 00007ff8`9d9f9fe5 clr!BlockVerifyAgeMapForBlocks+0x45
000000d4`997b6800 00007ff8`9d3cedcf clr!TableScanHandles+0x1ff
000000d4`997b68d0 00007ff8`9d9f2507 clr!HndVerifyTable+0xcb
000000d4`997b6970 00007ff8`9d94b2d4 clr!Ref_VerifyHandleTable+0xb4
000000d4`997b6a00 00007ff8`9d94a57f clr!WKS::gc_heap::verify_heap+0x7f9
000000d4`997b6b20 00007ff8`9d760f7e clr!WKS::gc_heap::garbage_collect+0x37231e
000000d4`997b6b60 00007ff8`9d3f0c37 clr!WKS::GCHeap::GarbageCollectGeneration+0xef
000000d4`997b6bb0 00007ff8`9d53ae61 clr!WKS::GCHeap::GarbageCollect+0x91
000000d4`997b6c00 00007ff8`9d536d2d clr!GCInterface::Collect+0x6a
000000d4`997b6c90 00007ff8`9c7d39db mscorlib_ni+0xdb39db
000000d4`997b6d40 00007ff8`76fcfa6d Company_App_Processor_ni+0x50fa6d
000000d4`997bc8a0 00007ff8`3de6d8c9 unknown!unknown+0x0
000000d4`997bc950 00007ff8`76f300bb Company_App_Processor_ni+0x4700bb
000000d4`997bc9e0 00007ff8`7718edec Company_App_Processor_ni+0x6cedec
000000d4`997bcf90 00007ff8`3de6d8c9 unknown!unknown+0x0
000000d4`997bd040 00007ff8`7718a3e0 Company_App_Processor_ni+0x6ca3e0
000000d4`997bd0d0 00007ff8`77189483 Company_App_Processor_ni+0x6c9483
000000d4`997bd170 00007ff8`9d396923 clr!CallDescrWorkerInternal+0x83
000000d4`997bd1b0 00007ff8`9d396838 clr!CallDescrWorkerWithHandler+0x4e
000000d4`997bd1f0 00007ff8`9d4d654c clr!CallDescrWithObjectArray+0x705
000000d4`997bd460 00007ff8`9d4d6050 clr!CStackBuilderSink::PrivateProcessMessage+0x26d
000000d4`997bd900 00007ff8`9bf626d3 mscorlib_ni+0x5426d3
000000d4`997bd9b0 00007ff8`9bf6243d mscorlib_ni+0x54243d
000000d4`997bda00 00007ff8`9bf6226a mscorlib_ni+0x54226a
000000d4`997bda70 00007ff8`9bf61bc2 mscorlib_ni+0x541bc2
000000d4`997bdaf0 00007ff8`9d396923 clr!CallDescrWorkerInternal+0x83
000000d4`997bdb30 00007ff8`9d396838 clr!CallDescrWorkerWithHandler+0x4e
000000d4`997bdb70 00007ff8`9d419067 clr!DispatchCallDebuggerWrapper+0x1f
000000d4`997bdbd0 00007ff8`9d419035 clr!DispatchCallSimple+0x93
000000d4`997bdc70 00007ff8`9d4d4d6c clr!ThreadNative::InternalCrossContextCallback+0x34c
000000d4`997be040 00007ff8`9bf61dfc mscorlib_ni+0x541dfc
000000d4`997be0b0 00007ff8`9bf6c06a mscorlib_ni+0x54c06a
000000d4`997be110 00007ff8`9bf60eee mscorlib_ni+0x540eee
000000d4`997be160 00007ff8`9bf6b95b mscorlib_ni+0x54b95b
000000d4`997be1c0 00007ff8`9d396923 clr!CallDescrWorkerInternal+0x83
000000d4`997be200 00007ff8`9d396838 clr!CallDescrWorkerWithHandler+0x4e
000000d4`997be240 00007ff8`9d419067 clr!DispatchCallDebuggerWrapper+0x1f
000000d4`997be2a0 00007ff8`9d419035 clr!DispatchCallSimple+0x93
000000d4`997be340 00007ff8`9d4d4d6c clr!ThreadNative::InternalCrossContextCallback+0x34c
000000d4`997be710 00007ff8`9bf60d48 mscorlib_ni+0x540d48
000000d4`997be770 00007ff8`9bf60764 mscorlib_ni+0x540764
000000d4`997be7f0 00007ff8`9bf60644 mscorlib_ni+0x540644
000000d4`997be860 00007ff8`9bf6015a mscorlib_ni+0x54015a
000000d4`997be920 00007ff8`9bf5fcef mscorlib_ni+0x53fcef
000000d4`997be9e0 00007ff8`9d394c12 clr!CTPMethodTable__CallTargetHelper3+0x12
000000d4`997bea10 00007ff8`9d4a9cac clr!CallTargetWorker2+0x85
000000d4`997bea70 00007ff8`9d4d5016 clr!TransparentProxyStubWorker+0x2a3b6
000000d4`997bec70 00007ff8`9d394b55 clr!TransparentProxyStub_CrossContext+0x55
000000d4`997bed30 00007ff8`3de91e43 unknown!noop+0x0
000000d4`997bf130 00007ff8`9bf79bd1 mscorlib_ni+0x559bd1
000000d4`997bf170 00007ff8`9bf78e46 mscorlib_ni+0x558e46
000000d4`997bf210 00007ff8`9d396923 clr!CallDescrWorkerInternal+0x83
000000d4`997bf250 00007ff8`9d396838 clr!CallDescrWorkerWithHandler+0x4e
000000d4`997bf290 00007ff8`9d3970e8 clr!MethodDescCallSite::CallTargetWorker+0x102
000000d4`997bf390 00007ff8`9d39c10a clr!QueueUserWorkItemManagedCallback+0x2a
000000d4`997bf480 00007ff8`9d397ce0 clr!ManagedThreadBase_DispatchInner+0x40
000000d4`997bf4c0 00007ff8`9d397c53 clr!ManagedThreadBase_DispatchMiddle+0x6c
000000d4`997bf5c0 00007ff8`9d397b92 clr!ManagedThreadBase_DispatchOuter+0x4c
000000d4`997bf630 00007ff8`9d397d77 clr!ManagedThreadBase_FullTransitionWithAD+0x2f
000000d4`997bf690 00007ff8`9d39c057 clr!ManagedPerAppDomainTPCount::DispatchWorkItem+0xa4
000000d4`997bf810 00007ff8`9d3978a7 clr!ThreadpoolMgr::ExecuteWorkRequest+0x64
000000d4`997bf840 00007ff8`9d39777f clr!ThreadpoolMgr::WorkerThreadStart+0xf6
000000d4`997bf8e0 00007ff8`9d39b5c5 clr!Thread::intermediateThreadProc+0x8b
000000d4`997bfda0 00007ff8`ac4b13d2 kernel32!BaseThreadInitThunk+0x22
000000d4`997bfdd0 00007ff8`ae7354f4 ntdll!RtlUserThreadStart+0x34
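  6. From the crash analysis, we observed that the crash happens during object age verification.
  7. We studied the available sources from the CLR VM and SSCLI (handletablescan.cpp).
  8. The application uses WCF ServiceHost with NamedPipes channels.
  9. The crash happens for the following reason:
    • Age verification starts (VerifyObjectAndAge) and finds an OverlappedData object in generation 2. Its current "clump" age also appears to be 2 (per the BlockVerifyAgeMapForBlocksWorker source), and that value is used as minAge.
    • It then reads the m_userObject field, which holds an object[] set in System.ServiceModel.Channels.OverlappedContext..ctor() by the call this.nativeOverlapped = this.overlapped.UnsafePack(completeCallback, this.bufferHolder);
    • That object[] is created as: this.bufferHolder = new object[] { dummyBuffer };
    • dummyBuffer is declared in OverlappedContext as: private static byte[] dummyBuffer;
    • It is allocated on the ephemeral segment of the CLR heap.
    • The verifier then calls: GCHeap::GetGCHeap()->WhichGeneration(obj);
      which returns 0, because 'dummyBuffer' is not in GC generation 0, 1, or 2.
    • VerifyObjectAndAge calls EEPOLICY_HANDLE_FATAL_ERROR(COR_E_EXECUTIONENGINE) because minAge is expected to be 2 (from the clump), but the age of 'dummyBuffer' is reported as 0.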
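
To make the failing comparison concrete, here is a minimal C++ sketch of the age check as I understand it from the steps above. It is a simplified model, not the CLR source: the name AgeCheckFails and the constant kMaxGeneration are hypothetical, and the condition assumes the verifier simply compares the clump age (minAge) with the generation returned by WhichGeneration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the oldest GC generation (gen 2).
constexpr int kMaxGeneration = 2;

// Simplified model of the check described above; NOT the actual CLR code.
//   minAge  : age recorded for the handle's clump
//   thisAge : generation reported for the referenced object
// Returns true when the verifier would treat the pair as a fatal inconsistency,
// i.e. the object looks younger than the clump claims and is not in the oldest generation.
constexpr bool AgeCheckFails(std::uint8_t minAge, int thisAge)
{
    return minAge > thisAge && thisAge < kMaxGeneration;
}

// The case from the crash: clump age 2, 'dummyBuffer' reported as generation 0.
static_assert(AgeCheckFails(2, 0), "gen-2 clump referencing a gen-0 object fails");
// An object at least as old as the clump passes the check.
static_assert(!AgeCheckFails(2, 2), "gen-2 clump referencing a gen-2 object passes");
static_assert(!AgeCheckFails(0, 0), "gen-0 clump referencing a gen-0 object passes");
```

Under this model, the only way to reach the fatal error with minAge == 2 is for WhichGeneration to report generation 0 or 1 for the referenced object, which matches what we see for 'dummyBuffer'.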

Question: why does this happen, and how can it be fixed?

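Side note: the application loads its modules into new AppDomains. The crash happens more often when the application unloads and reloads the AppDomains that contain its working logic. However, we still get the crash without any domain reload operations. Also, the VerifyObjectAndAge logic should run far more often than it fails; the failure only shows up after some time.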

Any ideas would be greatly appreciated.
