
UPDATE: I've figured it out. See the end of this question.


I have an Azure App Service running four sites. One of the sites has two deployment slots in addition to the primary one. Recently I've been seeing really high CPU utilization for the App Service plan as a whole.

Percentage CPU shown in dark orange.

The dark orange line shows the CPU percentage. This is just after restarting all my sites, which brought it down to this level.

However, when I look at the CPU use reported by each site, it's really low.


The darker blue line shows the CPU time, which is basically nothing. I did this for all of my sites, and all the graphs look the same. Basically, it seems that none of my sites are causing the issue.

A couple of the sites have web jobs, so I took a look at the logs but everything is running fine there. The jobs run for a few seconds every few hours.

So my question is: how can I determine the source of this CPU utilization? Any pointers would be greatly appreciated.


UPDATE: Thanks to the replies below, I was able to get more detail on what was happening. I ended up getting what I needed from the SCM / Kudu tools. You can get there by going to your web app in Azure and choosing Advanced Tools from the side nav. From the Kudu dashboard, choose Process Explorer. The value in the Total CPU Time column is not directly useful on its own, because it's the total processor time in seconds that the process has used since it started, which might have been minutes or days ago.

However, if you make a record of the value at intervals, you can look at the change over time, and one process might jump out at you. In my case, it was my WebJobs process. Every 60 seconds, this one process was consuming about 10 seconds of processor time, just within one environment.
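To make the "change over time" idea concrete, here is a minimal sketch in Python. It assumes you have copied two Total CPU Time readings per process by hand from Process Explorer; the `Sample` type, the `cpu_share` helper, the process names, and all the numbers are made up for illustration and are not part of Kudu.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One manual reading from Process Explorer (field names are illustrative)."""
    wall_clock_s: float  # when the reading was taken, in seconds
    total_cpu_s: float   # the process's Total CPU Time at that moment, in seconds


def cpu_share(earlier: Sample, later: Sample) -> float:
    """Return the fraction of one core the process used between two readings."""
    elapsed = later.wall_clock_s - earlier.wall_clock_s
    if elapsed <= 0:
        raise ValueError("readings must be taken at increasing times")
    cpu_used = later.total_cpu_s - earlier.total_cpu_s
    if cpu_used < 0:
        # Total CPU Time went down, so the process restarted between readings.
        raise ValueError("process restarted between readings")
    return cpu_used / elapsed


# Hypothetical readings taken 60 seconds apart for three processes.
readings = {
    "w3wp": (Sample(0, 812.0), Sample(60, 812.4)),
    "webjob": (Sample(0, 5400.0), Sample(60, 5410.0)),
    "kudu": (Sample(0, 95.0), Sample(60, 95.1)),
}

shares = {name: cpu_share(a, b) for name, (a, b) in readings.items()}
busiest = max(shares, key=shares.get)
print(busiest, f"{shares[busiest]:.1%}")
```

With these invented numbers the script prints `webjob 16.7%`, which matches the pattern described above: about 10 seconds of processor time in every 60 seconds of wall-clock time.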

The great thing about this Kudu dashboard is, if you can catch the problem while it is actually happening, you can hit the Start Profiling button and capture a diagnostic session. You can then open this up in Visual Studio and get some nice details about where the CPU time is being spent.

Just in case anyone else is seeing similar issues, I'll provide more details about my particular case. As I mentioned, my WebJobs exe was the culprit, and I found that all the CPU time was being spent in StackExchange.Redis.SocketManager, which manages connections to Azure Redis Cache. In my main web app, I create only one connection, as recommended. But since my web jobs only run every once in a while, I was creating a new connection to Azure Redis Cache each time one ran, which apparently can lead to issues. I changed my code to create the Redis Cache connection once when the WebJob process starts up and to reuse that existing connection whenever an individual WebJob runs.
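For anyone who wants to see the shape of that fix, below is a rough sketch of the "create once, reuse everywhere" pattern. It is written in Python purely as an illustration and is not my actual WebJob code or the StackExchange.Redis API: `FakeRedisConnection`, `get_connection`, and `run_job` are hypothetical names standing in for the real connection object and job entry point.

```python
import threading


class FakeRedisConnection:
    """Hypothetical stand-in for an expensive-to-create cache connection."""
    created = 0  # counts how many connections have been opened

    def __init__(self) -> None:
        FakeRedisConnection.created += 1

    def get(self, key: str) -> str:
        return f"value-for-{key}"


_lock = threading.Lock()
_connection = None


def get_connection() -> FakeRedisConnection:
    """Create the connection on first use, then hand back the same object."""
    global _connection
    if _connection is None:
        with _lock:
            # Re-check inside the lock so two threads cannot both create one.
            if _connection is None:
                _connection = FakeRedisConnection()
    return _connection


def run_job(job_id: int) -> str:
    """One job invocation: reuses the shared connection instead of opening its own."""
    return get_connection().get(f"job-{job_id}")


results = [run_job(i) for i in range(5)]
print(FakeRedisConnection.created)
```

Five job runs open a single connection here, so the script prints `1`. The per-run version I had before would have opened five.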

Time will tell if this really fixes the issue, but I think it will. When the problem occurred, it always fit the same pattern: After a few days of running fine, my CPU would slowly ramp up over the course of about 12 hours. My thinking is that each time a WebJob ran, it created a connection object, which at first didn't produce trouble, but gradually as WebJobs ran every hour or two, cruft was building up until finally some critical threshold was met and the CPU usage would take off.

Hope this helps someone out there. Best wishes!


2 Answers


Maybe you should go to your web app's SCM site?

%yourAppName%.scm.azurewebsites.net

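There is a page that shows all the processes currently running on your web app (something like Console > Processes).

You can also go to the Support page (from the top right of the SCM site). There you can find more information about your app's performance and take a memory dump (not applicable to this issue, but useful for performance problems).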

Answered 2017-03-17T05:29:42.510

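Based on your description, I assume you could use the Crash Diagnoser extension to capture dump files from your web app and WebJobs when the CPU usage percentage rises above a specific threshold, in order to isolate this issue. For more details, you can refer to this official blog post.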

Answered 2017-03-20T04:40:59.133