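For several weeks I have been facing a serious problem. I have an ASP.NET application hosted under IIS7 (W2008 SP1), and every few hours it starts consuming nearly 50% CPU, even when no users may be connected. That would be understandable, since we use Quartz.net for some application recycling, but we have not been able to reproduce the problem.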

Here is a trace taken with JetBrains dotTrace 3.1 while CPU usage was high: http://mycenter.info/tmp/DotTraceSnapshot.zip

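The process burning CPU is usually w3wp.exe, but over the last few days sqlserver (2008) and memcached (1.2.1, updated to 1.2.4 beta on Monday) have also been consuming CPU. Strangely, memcached sometimes starts consuming 100% CPU while its statistics show it is idle, yet it works correctly when requests are issued.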
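One way to check whether memcached really is idle is to read its counters directly. The following is a minimal sketch, assuming the server speaks the standard memcached text protocol on the default port 11211; the function names `parse_stats` and `fetch_stats` are hypothetical, and the sample reply contains made-up values rather than figures from this server.

```python
import socket


def parse_stats(raw):
    """Turn a memcached text-protocol `stats` reply into a dict of strings."""
    stats = {}
    for line in raw.decode("ascii").split("\r\n"):
        if line == "END":
            break  # END terminates the reply
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host="127.0.0.1", port=11211, timeout=2.0):
    """Send `stats` to a memcached server and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        buffer = b""
        while not buffer.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break  # server closed the connection early
            buffer += chunk
    return parse_stats(buffer)


# Illustrative reply only: these values are invented for the example.
SAMPLE_REPLY = b"STAT curr_connections 3\r\nSTAT cmd_get 0\r\nEND\r\n"

if __name__ == "__main__":
    print(parse_stats(SAMPLE_REPLY))
```

Calling `fetch_stats()` twice a few seconds apart and comparing counters such as `cmd_get` shows whether any requests arrived while the CPU was busy.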

Here is a crash dump (or rather a stack trace dump) of w3wp, taken with WinDbg (based on this guide: http://blogs.technet.com/marcelofartura/archive/2006/09/15/troubleshooting-iis-100-cpu-issues-step-by-step-intermediary.aspx):

0:000> ~
.  0  Id: 1be4.1d3c Suspend: 1 Teb: 7ffdf000 Unfrozen
   1  Id: 1be4.b1c Suspend: 1 Teb: 7ffde000 Unfrozen
   2  Id: 1be4.12a0 Suspend: 1 Teb: 7ffdd000 Unfrozen
   3  Id: 1be4.19d0 Suspend: 1 Teb: 7ffdc000 Unfrozen
   4  Id: 1be4.1714 Suspend: 1 Teb: 7ffd7000 Unfrozen
   5  Id: 1be4.1a18 Suspend: 1 Teb: 7ffd6000 Unfrozen
   6  Id: 1be4.12ac Suspend: 1 Teb: 7ffd5000 Unfrozen
   7  Id: 1be4.dec Suspend: 1 Teb: 7ffd4000 Unfrozen
   8  Id: 1be4.1e48 Suspend: 1 Teb: 7ffd8000 Unfrozen
   9  Id: 1be4.1ca8 Suspend: 1 Teb: 7ffd3000 Unfrozen
  10  Id: 1be4.1508 Suspend: 1 Teb: 7ffaf000 Unfrozen
  11  Id: 1be4.1bc0 Suspend: 1 Teb: 7ffae000 Unfrozen
  12  Id: 1be4.1f48 Suspend: 1 Teb: 7ffad000 Unfrozen
  13  Id: 1be4.1994 Suspend: 1 Teb: 7ffac000 Unfrozen
  14  Id: 1be4.1a48 Suspend: 1 Teb: 7ffab000 Unfrozen
  15  Id: 1be4.12c8 Suspend: 1 Teb: 7ffa8000 Unfrozen
  16  Id: 1be4.e44 Suspend: 1 Teb: 7ffa7000 Unfrozen
  17  Id: 1be4.19e0 Suspend: 1 Teb: 7ffa6000 Unfrozen
  18  Id: 1be4.19b0 Suspend: 1 Teb: 7ffa2000 Unfrozen
  19  Id: 1be4.1b30 Suspend: 1 Teb: 7ffd9000 Unfrozen
  20  Id: 1be4.1bfc Suspend: 1 Teb: 7ffa3000 Unfrozen
  21  Id: 1be4.1be8 Suspend: 1 Teb: 7ffa1000 Unfrozen
  22  Id: 1be4.1a54 Suspend: 1 Teb: 7ffa5000 Unfrozen
  23  Id: 1be4.b74 Suspend: 1 Teb: 7ff3d000 Unfrozen
  24  Id: 1be4.19b4 Suspend: 1 Teb: 7ff3c000 Unfrozen
  25  Id: 1be4.1460 Suspend: 1 Teb: 7ffdb000 Unfrozen
  26  Id: 1be4.1eac Suspend: 1 Teb: 7ffaa000 Unfrozen
  27  Id: 1be4.1b90 Suspend: 1 Teb: 7ffa4000 Unfrozen


0:023> #23s
Search address set to 77dc9a94
*** WARNING: Unable to verify checksum for SMDiagnostics.ni.dll
*** WARNING: Unable to verify checksum for System.Data.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for Microsoft.Web.Services3.DLL
*** WARNING: Unable to verify checksum for System.Windows.Forms.ni.dll
*** WARNING: Unable to verify checksum for System.Web.ni.dll
*** WARNING: Unable to verify checksum for Ademy.UI.Web.DLL
*** ERROR: Module load completed but symbols could not be loaded for AjaxControlToolkit.DLL
*** ERROR: Module load completed but symbols could not be loaded for 7zSharp.DLL
*** WARNING: Unable to verify checksum for mscorlib.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for Iesi.Collections.DLL
*** WARNING: Unable to verify checksum for System.Design.ni.dll
*** WARNING: Unable to verify checksum for System.Core.ni.dll
*** WARNING: Unable to verify checksum for Ademy.Event.DLL
*** WARNING: Unable to verify checksum for System.ServiceModel.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for System.ServiceModel.ni.dll
*** WARNING: Unable to verify checksum for App_Theme_Ocean.wgubmrqt.dll
*** WARNING: Unable to verify checksum for NHibernate.Burrow.AppBlock.DLL
*** ERROR: Module load completed but symbols could not be loaded for NHibernate.Burrow.AppBlock.DLL
*** WARNING: Unable to verify checksum for NHibernate.Caches.SysCache2.DLL
*** ERROR: Module load completed but symbols could not be loaded for NHibernate.Caches.SysCache2.DLL
*** WARNING: Unable to verify checksum for Ademy.UI.Web.Controls.DLL
*** WARNING: Unable to verify checksum for Microsoft.JScript.ni.dll
*** WARNING: Unable to verify checksum for System.Web.Mobile.ni.dll
*** WARNING: Unable to verify checksum for System.Runtime.Serialization.ni.dll
          ^ Memory access error in '#23s'

0:023> kb
ChildEBP RetAddr  Args to Child             
11c6ede4 77dc8ed4 766bc622 0000038c 00000000 ntdll!KiFastSystemCallRet
11c6ede8 766bc622 0000038c 00000000 11c6ee20 ntdll!NtSetEvent+0xc
11c6edf8 011011ef 0000038c 7f52be6e 0fda4888 kernel32!SetEvent+0x10
WARNING: Frame IP not in any known module. Following frames may be wrong.
11c6ee20 71b26ffe 060c5f9c 010039b0 010628a0 0x11011ef
*** WARNING: Unable to verify checksum for System.ni.dll
11c6ee4c 712c4b14 02528958 060c5f9c 11c6ee94 mscorlib_ni+0x216ffe
11c6ee5c 712c4abe 060c5fb0 02528958 060c600c System_ni+0x144b14
11c6ee94 71679260 060c5d24 7167926d 060c5d24 System_ni+0x144abe
11c6eec8 717d8373 060c5d24 11c6f3e8 712c4ce4 System_ni+0x4f9260
11c6ef14 712c4ce4 00000000 02528930 11c6ef74 System_ni+0x658373
11c6ef54 7129dbcb 098b6ac4 11c6efec 72f7eff8 System_ni+0x144ce4
11c6efa4 71b26d66 02df349c 11c6efc0 71b45681 System_ni+0x11dbcb
11c6efb0 71b45681 00000000 0dcfd2d8 11c6efd0 mscorlib_ni+0x216d66
11c6efc0 72f11b4c 766b45f1 00000000 11c6f050 mscorlib_ni+0x235681
11c6efd0 72f221f9 11c6f0a0 00000000 11c6f070 mscorwks!CallDescrWorker+0x33
11c6f050 72f36571 11c6f0a0 00000000 11c6f070 mscorwks!CallDescrWorkerWithHandler+0xa3
11c6f194 72f365a4 71a91ff0 11c6f2c8 11c6f1e8 mscorwks!MethodDesc::CallDescr+0x19c
11c6f1b0 72f365c2 71a91ff0 11c6f2c8 11c6f1e8 mscorwks!MethodDesc::CallTargetWorker+0x1f
11c6f1c8 7302a471 11c6f1e8 68e9b644 0dcfd2d8 mscorwks!MethodDescCallSite::CallWithValueTypes+0x1a
11c6f394 7302a5c6 11c6f424 68e9b194 02df34e4 mscorwks!ExecuteCodeWithGuaranteedCleanupHelper+0x9f
11c6f444 71b45577 11c6f3e8 02df17d0 01c177f8 mscorwks!ReflectionInvocation::ExecuteCodeWithGuaranteedCleanup+0x10f

Thanks in advance for any tips!

Update:

Here is the managed stack of the hung thread. I think it points at the memcached provider, but I am not yet sure what I should do about it.

0:023> !clrstack
OS Thread Id: 0xb74 (23)
ESP       EIP     
11c6ee38 77dc9a94 [NDirectMethodFrameStandaloneCleanup: 11c6ee38] Microsoft.Win32.Win32Native.SetEvent(Microsoft.Win32.SafeHandles.SafeWaitHandle)
11c6ee48 71b26ffe System.Threading.EventWaitHandle.Set()
11c6ee54 712c4b14 System.Net.TimerThread.Prod()
11c6ee64 712c4abe System.Net.TimerThread+TimerQueue.CreateTimer(Callback, System.Object)
11c6eea0 71679260 System.Net.ConnectionPool.CleanupCallbackWrapper(Timer, Int32, System.Object)
11c6eed4 717d8373 System.Net.TimerThread+TimerNode.Fire()
11c6ef1c 712c4ce4 System.Net.TimerThread+TimerQueue.Fire(Int32 ByRef)
11c6ef5c 7129dbcb System.Net.TimerThread.ThreadProc()
11c6efac 71b26d66 System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
11c6efb8 71b45681 System.Threading.ExecutionContext.runTryCode(System.Object)
11c6f3e8 72f11b4c [HelperMethodFrame_PROTECTOBJ: 11c6f3e8] System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode, CleanupCode, System.Object)
11c6f450 71b45577 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
11c6f46c 71b301c5 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
11c6f484 71b26ce4 System.Threading.ThreadHelper.ThreadStart()
11c6f6b0 72f11b4c [GCFrame: 11c6f6b0] 
11c6f9a0 72f11b4c [ContextTransitionFrame: 11c6f9a0] 

Solution found:

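This was caused by a bug in memcached 1.2.1 for Win32 when running on Windows 2008. I updated to v1.2.6 and everything works fine. I think the load showed up in the w3wp process because the library I use to connect to memcached has a recycling process that was hanging, even though memcached was still responding.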

Solution found 2:

If the first solution does not work, read this post. I suspect the memcached fix only hid the real problem, which is a bug in SmtpClient.

2 Answers

In WinDbg, run:

~*e !clrstack

This dumps the managed stack of every thread and should give you an idea of what is going on in the process.

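Also try !runaway, which tells you how much time each thread has spent running. Focus on the stacks of the threads at the top, those with the longest run time.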
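The two commands above can be combined. As a minimal sketch, assuming the `!runaway` output has been saved as text in the layout `index:osid  D days H:MM:SS.mmm`, this Python helper picks the busiest threads and builds the per-thread `~Ne !clrstack` command to run next. The function names are hypothetical, and the sample times are invented for illustration; only the thread ids are reused from the `~` listing in the question.

```python
import re
from datetime import timedelta

# Matches lines such as "  23:b74       0 days 0:05:12.431"
# (debugger thread index, OS thread id in hex, accumulated time).
_LINE = re.compile(
    r"^\s*(?P<index>\d+):(?P<osid>[0-9a-fA-F]+)\s+"
    r"(?P<days>\d+) days (?P<h>\d+):(?P<m>\d+):(?P<s>\d+\.\d+)\s*$"
)


def rank_runaway(text):
    """Return (index, os_thread_id, cpu_time) tuples, busiest thread first."""
    rows = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # skip headers such as "User Mode Time"
        cpu_time = timedelta(
            days=int(match["days"]),
            hours=int(match["h"]),
            minutes=int(match["m"]),
            seconds=float(match["s"]),
        )
        rows.append((int(match["index"]), int(match["osid"], 16), cpu_time))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def followup_commands(text, top=3):
    """Build one `~Ne !clrstack` command for each of the `top` busiest threads."""
    return [f"~{index}e !clrstack" for index, _, _ in rank_runaway(text)[:top]]


# Illustrative input only: the times are made up, not taken from the dump.
SAMPLE = """\
 User Mode Time
  Thread       Time
   0:1d3c      0 days 0:00:01.250
  23:b74       0 days 0:05:12.431
  24:19b4      0 days 0:00:00.015
"""

if __name__ == "__main__":
    for command in followup_commands(SAMPLE, top=2):
        print(command)
```

Taking two snapshots a minute apart and comparing the times per thread would show which thread is burning CPU right now, as opposed to one that was merely busy earlier in the process lifetime.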

answered 2009-09-23T19:04:58.060

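Could this be caused by a caching problem? For example, do you have cached datasets set up to reload automatically from the database when they expire?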

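We ran into this once. We had a huge dataset that we wanted available at all times. The data did not change often, so we put it in the cache with a one-hour expiration and then handled the removal in our global.asax (as described here, without heeding the caveats described in the link). As soon as the hour was up we reloaded the dataset into the cache, which caused high CPU and database usage every hour.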

Edit - added

Needless to say, we spotted this quickly and learned from our mistake.
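To illustrate why reloading on expiry produces load even with no users connected, here is a minimal Python sketch. It is not ASP.NET code: `LazyCache`, `EagerCache`, `FakeClock` and `load_dataset` are hypothetical names, and `tick()` merely stands in for a removal callback that immediately re-inserts the data. It counts how often the expensive load runs over eight idle hours after a single request.

```python
class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class LazyCache:
    """Reloads an expired value only when somebody asks for it."""

    def __init__(self, loader, ttl, clock):
        self.loader, self.ttl, self.clock = loader, ttl, clock
        self.value = None
        self.expires_at = None  # None means nothing is cached yet

    def get(self):
        if self.expires_at is None or self.clock.now >= self.expires_at:
            self.value = self.loader()
            self.expires_at = self.clock.now + self.ttl
        return self.value


class EagerCache(LazyCache):
    """Reloads as soon as the entry expires, even if nobody is reading it."""

    def tick(self):
        # Stands in for a removal callback that re-inserts the data at once.
        if self.expires_at is not None and self.clock.now >= self.expires_at:
            self.value = self.loader()
            self.expires_at = self.clock.now + self.ttl


def simulate(cache_type, hours, ttl=3600):
    """Count loads over `hours` idle hours following one initial read."""
    clock = FakeClock()
    loads = []

    def load_dataset():  # hypothetical expensive database query
        loads.append(clock.now)
        return "dataset"

    cache = cache_type(load_dataset, ttl, clock)
    cache.get()  # a single user request at time 0
    for _ in range(hours):
        clock.advance(3600)
        if hasattr(cache, "tick"):
            cache.tick()  # expiry handling runs with no users connected
    return len(loads)


if __name__ == "__main__":
    print("lazy loads: ", simulate(LazyCache, hours=8))
    print("eager loads:", simulate(EagerCache, hours=8))
```

The lazy variant pays for the load only when a request needs the data, at the cost of one slow request after each expiry; the eager variant keeps the data warm but repeats the full load every hour regardless of traffic.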

answered 2009-09-23T19:07:00.637