我们有一个服务器和客户端,它们被实现为通过信号器进行通信。我们已经使用 SQL Server 实现了横向扩展,后来又使用了 Redis。当将大约 70 个客户端连接到服务器时,所有客户端都突然断开连接。以下是来自客户端的跟踪日志。
08:45:09.5481482 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({})
08:45:19.5763845 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({})
08:45:29.5868786 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({})
08:45:36.5945700 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({"C":"s-0,6052","M":[{"H":"PrintConnectorHub","M":"IsConnectionAlive","A":[true]}]})
08:45:36.6035962 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({"I":"96"})
08:45:39.6197277 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnMessage({})
08:45:53.2729622 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Connection Timeout Warning : Notifying user
08:45:53.2729622 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - OnConnectionSlow
08:45:59.9651515 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Connection Timed-out : Transport Lost Connection
08:45:59.9651515 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: LostConnection
08:45:59.9801625 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS: OnClose()
08:45:59.9821639 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - ChangeState(Connected, Reconnecting)
08:46:00.0612219 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - WS Connecting to: ws://172.20.22.58/signalr/reconnect?clientProtocol=1.4&transport=webSockets&connectionData=[{"Name":"PrintConnectorHub"}]&connectionToken=GTySuvds2p5gtlb3q2ul1JijPaQVJf42sv8fWFXea8OqlLWoDv1rSK%2BfDKUowaqJ4Nx9mAb6NAIkHwTpgNiJNjdSO3lr73qA7JyP%2B6Uxc2uVhhWUV8My5wRN5utsoF0k&messageId=s-0%2C6052
08:46:06.5803795 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - OnError(System.InvalidOperationException: Data cannot be sent because the WebSocket connection is reconnecting.)
08:46:30.0342252 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - OnError(System.TimeoutException: Couldn't reconnect within the configured timeout of 00:00:30, disconnecting.)
08:46:30.0342252 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Disconnected
08:46:30.0352263 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Transport.Dispose(41219ddc-af97-4738-b75e-07fe6bbd5472)
08:46:30.0362276 - 41219ddc-af97-4738-b75e-07fe6bbd5472 - Closed
08:46:36.5536103 - null - ChangeState(Disconnected, Connecting)
08:48:05.8383180 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - WS Connecting to: ws://172.20.22.58/signalr/connect?clientProtocol=1.4&transport=webSockets&connectionData=[{"Name":"PrintConnectorHub"}]&connectionToken=iPWCjYXC%2B81rTYDgV%2F4nFRmaU6ZQf56Y3Q8bRwHDbagwydw2dknALH8Lz6T0U8ISnwAycFQPEt5efnR5%2Bu%2FASvM%2B7AUX7%2FNB4eqPJrOpf7xxsvn4t%2BO2iLM3loZ9dUt5
08:48:10.8539527 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Auto: Failed to connect to using transport webSockets. System.TimeoutException: Transport timed out trying to connect
08:48:10.8539527 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - WS: OnClose()
08:48:10.8549540 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - SSE: GET http://172.20.22.58/signalr/connect?clientProtocol=1.4&transport=serverSentEvents&connectionData=[{"Name":"PrintConnectorHub"}]&connectionToken=iPWCjYXC%2B81rTYDgV%2F4nFRmaU6ZQf56Y3Q8bRwHDbagwydw2dknALH8Lz6T0U8ISnwAycFQPEt5efnR5%2Bu%2FASvM%2B7AUX7%2FNB4eqPJrOpf7xxsvn4t%2BO2iLM3loZ9dUt5
08:48:15.8721679 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Auto: Failed to connect to using transport serverSentEvents. System.TimeoutException: Transport timed out trying to connect
08:48:15.8751701 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - LP Connect: http://172.20.22.58/signalr/connect?clientProtocol=1.4&transport=longPolling&connectionData=[{"Name":"PrintConnectorHub"}]&connectionToken=iPWCjYXC%2B81rTYDgV%2F4nFRmaU6ZQf56Y3Q8bRwHDbagwydw2dknALH8Lz6T0U8ISnwAycFQPEt5efnR5%2Bu%2FASvM%2B7AUX7%2FNB4eqPJrOpf7xxsvn4t%2BO2iLM3loZ9dUt5
08:48:20.8729482 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Auto: Failed to connect to using transport longPolling. System.TimeoutException: Transport timed out trying to connect
08:48:20.8739493 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Disconnected
08:48:20.8739493 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Transport.Dispose(b4995d7f-c2bd-469a-a54b-4df02f198b9b)
08:48:20.8739493 - b4995d7f-c2bd-469a-a54b-4df02f198b9b - Closed
08:48:20.8749497 - null - There has not been an active server connection for an extended period of time. Stopping connection.
08:48:20.8749497 - null - OnError(System.TimeoutException: The client has been inactive since 5/8/2017 8:46:28 AM and it has exceeded the inactivity timeout of 00:00:50. Stopping the connection.)
更糟糕的是,当我们查看服务器机器资源监视器时,我们可以看到 w3wp.exe 分配了超过 1000 个线程,并且还在不断增加。
当我们调查 IIS 中的工作进程时,会显示大量的工作进程当前请求。
有一次,IIS 线程增加到 10,000 个线程并挂起服务器计算机。这个问题的原因是什么?
IIS 8.0 服务器 - Windows 2012