I have a .NET 4.5 ASP.NET WebAPI application. It is deployed in IIS with 1 worker process on an 8 GB VM with 4 CPUs.

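I recently made changes to it (upgraded ServiceStack.Interfaces, ServiceStack.Common, ServiceStack.Redis and a bunch of dependencies) and started noticing that the IIS app pool this application is deployed in recycles roughly every hour (give or take a few minutes).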

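Nothing in my application logs shows any kind of problem. I collect metrics with telegraf, but I don't see memory increasing at all; every metric I have looked at appears completely normal, and then the app pool recycles.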

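I looked in Event Viewer, filtered the logs by the WAS source, and found events with ID 5011. As far as I understand, this basically means the IIS worker process crashed.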

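So I ran DebugDiag on my local machine with the application deployed there (I can reproduce the issue locally). It ran for a while, and eventually I got the same event in Event Viewer. I looked at the crash analysis log from DebugDiag, but all I see is a bunch of logged exceptions and nothing specific right before the crash.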

At this point I'm not entirely sure what else I can do to find the cause of the crash, so I'm hoping for suggestions on how to get more visibility into it.

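What I think is happening is that one of my dependencies is incompatible with some of the upgraded packages, which causes an exception to be thrown that nothing handles, and that crashes the IIS worker process.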

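My application otherwise runs fine: all API endpoints work without problems, memory does not grow, and CPU usage is fine. As far as I can tell, nothing goes wrong before the crash.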

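Does anyone know any tricks for finding what is causing the crash and/or for handling it, so that this exception cannot escape and crash the worker process?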

1 Answer

I was able to narrow the issue down, with some confidence, to the ServiceStack.Redis RedisPubSubServer. What the actual issue is, I don't know; finding out would take a lot more digging, and I have spent too much time on this already.

However, piggybacking on some existing code I had (from before ServiceStack supported sentinel), I created a new implementation of the Redis client wrapper, which I call LazySentinelServiceStackClientWrapper. Instead of using the built-in sentinel manager, it relies on a custom sentinel provider I wrote, LazySentinelApiSentinelProvider. This provider interrogates the available sentinel hosts in random order for the master and slave nodes. I then construct a pool from the retrieved read/write and read-only hosts, and that pool is used to run the Redis operations. The pool is refreshed whenever an error occurs (as happens after a failover).

By contrast, the built-in sentinel manager that comes with ServiceStack.Redis instantiates a Redis pub/sub server, listens for the messages sentinel sends whenever the configuration changes (such as on a failover), and updates the managed Redis connection pool accordingly.
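A rough sketch of the lazy approach described above, written in Python for brevity rather than in the C# of the actual wrapper. It is only an illustration under stated assumptions: the names `LazySentinelProvider` and `LazyPool` are hypothetical, and the `query` callable stands in for the real sentinel lookup (for example, issuing `SENTINEL get-master-addr-by-name` against one sentinel host). It does not use any ServiceStack API.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# (master address, replica addresses) as reported by a single sentinel
Topology = Tuple[str, List[str]]


class LazySentinelProvider:
    """Asks the sentinels, in random order, for the current master and replicas."""

    def __init__(self, sentinels: Sequence[str],
                 query: Callable[[str], Topology],
                 rng: Optional[random.Random] = None) -> None:
        self._sentinels = list(sentinels)
        self._query = query  # hypothetical: performs the real sentinel lookup
        self._rng = rng or random.Random()

    def resolve(self) -> Topology:
        hosts = self._sentinels[:]
        self._rng.shuffle(hosts)
        last_error: Optional[Exception] = None
        for host in hosts:
            try:
                return self._query(host)
            except OSError as exc:  # sentinel unreachable: try the next one
                last_error = exc
        raise ConnectionError("no sentinel answered") from last_error


class LazyPool:
    """Keeps the resolved hosts and re-resolves only after an operation fails."""

    def __init__(self, provider: LazySentinelProvider) -> None:
        self._provider = provider
        self.master, self.replicas = provider.resolve()

    def refresh(self) -> None:
        self.master, self.replicas = self._provider.resolve()

    def run(self, operation: Callable[[str], object]) -> object:
        try:
            return operation(self.master)
        except OSError:
            # Possibly a failover: look the topology up again and retry once.
            self.refresh()
            return operation(self.master)
```

The point of this design is that nothing runs in the background: there is no pub/sub subscription to sentinel, so a failover is only noticed when an operation fails, at the cost of one failed attempt and a retry.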

I installed my version of this Redis client wrapper into my application and have seen no app pool recycle events since (other than the scheduled ones).

[Image: log of app pool recycle events]

Above is the log of app pool recycle events before I disabled the ServiceStack.Redis sentinel manager.

And here is the log of app pool recycle events after installing my new lazy sentinel manager:

[Image: log of app pool recycle events]

The first spike is me recycling the app manually, and the second one is the scheduled 1 a.m. recycle. So clearly the issue is solved.

Why the sentinel manager, via the Redis pub/sub server, causes IIS rapid-fail protection to fire and recycle the app pool, I do not know. Maybe someone with much more Redis and/or IIS experience can speak to that. Also, I did not test this in .NET Core; I only tested a .NET 4.5.1 application deployed in IIS, though on many different machines, including a local development machine and beefy production machines.

One last note: the first image, which shows all the recycle events, is from my CI machine, which takes barely any traffic, maybe 1 request every few minutes. So the issue is not a memory leak or some other resource exhaustion. Whatever it is, it happens periodically, regardless of traffic, CPU load, or memory load.

Needless to say, I will not be using the built-in sentinel manager, at least for now.

answered 2019-12-22T01:09:10.943