akka - Nodes sometimes do not join the Akka.NET cluster after IIS recycles the AppPool

Question

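We built an Akka cluster infrastructure for sending SMS, email, and push notifications. The system has three kinds of nodes: client, sender, and lighthouse. The client role is used by our Web application and API application (both hosted on IIS). The lighthouse and sender roles are hosted as Windows services. Because IIS recycles the AppPools of the Web and API applications, we shut down the client-role actor system and start it again in the Start and Stop events in global.asax.cs. From the logs we can see that the system shuts down and rejoins the cluster successfully.

Sometimes, however, when the AppPool is recycled, the client ActorSystem starts but cannot join the cluster, and our notifications stop working (which is a big problem for us). When we manually shut down the ActorSystem and start it again, it joins the cluster. This happens roughly once every two days.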
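For readers who want to picture the start/stop handling described above, here is a minimal sketch, not the code from the question. It assumes an Akka.NET version that exposes `ActorSystem.Terminate()` and `Cluster.RegisterOnMemberRemoved` (older 1.0.x builds used `Shutdown()` instead), and that the `akka` section is registered in web.config so `ActorSystem.Create` can load it. The class name `Global`, the `_system` field, and the 10-second timeouts are illustrative assumptions.

```csharp
using System;
using System.Threading;
using System.Web;
using Akka.Actor;

public class Global : HttpApplication
{
    // Hypothetical holder for the client-role ActorSystem.
    private static ActorSystem _system;

    protected void Application_Start(object sender, EventArgs e)
    {
        // Loads the <akka><hocon> section from web.config; with the cluster
        // provider and seed-nodes configured, the node then tries to join.
        _system = ActorSystem.Create("NotificationSystem");
    }

    protected void Application_End(object sender, EventArgs e)
    {
        if (_system == null) return;

        var cluster = Akka.Cluster.Cluster.Get(_system);
        var removed = new ManualResetEventSlim(false);

        // Ask the cluster to remove this node, and wait (bounded) until
        // the member has actually been removed before terminating.
        cluster.RegisterOnMemberRemoved(() => removed.Set());
        cluster.Leave(cluster.SelfAddress);
        removed.Wait(TimeSpan.FromSeconds(10)); // assumed timeout

        // Stop the ActorSystem; bounded wait so the recycle is not held up.
        _system.Terminate().Wait(TimeSpan.FromSeconds(10));
        _system = null;
    }
}
```

Leaving the cluster explicitly before terminating lets the other nodes see an orderly exit rather than an abruptly closed association. It is not claimed to avoid the bug discussed in the answer.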

We can see that the client joined the cluster before the error:

Node [akka.tcp://NotificationSystem@...:41350] is JOINING, roles [client]
Leader is moving node [akka.tcp://NotificationSystem@...:41350] to [Up]

Looking at the logs, we can see the following error after the client has joined the cluster:

Shut down address: akka.tcp://NotificationSystem@...:41350
Akka.Remote.ShutDownAssociation: Shut down address: akka.tcp://NotificationSystem@...:41350 ---> Akka.Remote.Transport.InvalidAssociationException: The remote system terminated the association because it is shutting down.
   --- End of inner exception stack trace ---
   at Akka.Remote.EndpointWriter.b__20_0(Exception ex)
   at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level)
   at Akka.Actor.LocalOnlyDecider.Decide(Exception cause)
   at Akka.Actor.OneForOneStrategy.Handle(IActorRef child, Exception x)
   at Akka.Actor.SupervisorStrategy.HandleFailure(ActorCell actorCell, Exception cause, ChildRestartStats failedChildStats, IReadOnlyCollection`1 allChildren)
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SystemInvoke(Envelope envelope)
   --- End of stack trace from previous location where exception was thrown ---
   at Akka.Actor.ActorCell.HandleFailed(Failed f)

After that error, we see the following error message:

Association to [akka.tcp://NotificationSystem@...:41350] having UID [226948907] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.

The system does not recover on its own unless the client actor system is restarted.
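Since the quarantine message says the remote actor system must be restarted, one possible stopgap is to detect the condition on the client and trigger the restart automatically. The sketch below is an assumption-laden illustration, not part of the original question: it assumes the Akka.NET version in use publishes `Akka.Remote.ThisActorSystemQuarantinedEvent` on the event stream, and `QuarantineWatcher` and the `restartActorSystem` callback are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Remote;

// Hypothetical watchdog: reacts when a remote node reports that it has
// quarantined this ActorSystem.
public class QuarantineWatcher : ReceiveActor
{
    public QuarantineWatcher(Action restartActorSystem)
    {
        Receive<ThisActorSystemQuarantinedEvent>(evt =>
        {
            // A quarantined system cannot talk to that remote node again under
            // its current UID, so hand the restart off to another thread
            // instead of blocking this actor while its own system shuts down.
            Task.Run(restartActorSystem);
        });
    }

    protected override void PreStart()
    {
        // Subscribe to the remoting lifecycle event on the event stream.
        Context.System.EventStream.Subscribe(Self, typeof(ThisActorSystemQuarantinedEvent));
    }

    protected override void PostStop()
    {
        Context.System.EventStream.Unsubscribe(Self);
    }
}
```

The callback would be whatever code the application already uses to stop and recreate the client ActorSystem. This only automates the manual restart described in the question; it does not address the underlying cause.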

Our client role configuration is:

<akka>
<hocon>
    <![CDATA[
        akka{
            loglevel = DEBUG

            actor{
                provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"

                deployment {
                    /coordinatorRouter {
                        router = round-robin-group
                        routees.paths = ["/user/NotificationCoordinator"]
                        cluster {
                                enabled = on
                                max-nr-of-instances-per-node = 1
                                allow-local-routees = off
                                use-role = sender
                        }
                    }                
                }

                serializers {
                    wire = "Akka.Serialization.WireSerializer, Akka.Serialization.Wire"
                }

                serialization-bindings {
                 "System.Object" = wire
                }

                debug{
                    receive = on
                    autoreceive = on
                    lifecycle = on
                    event-stream = on
                    unhandled = on
                }
            }

            remote {
                helios.tcp {
                        transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
                        applied-adapters = []
                        transport-protocol = tcp
                        hostname = "***.***.**.**"
                        port = 0
                }
            }

            cluster {
                    seed-nodes = ["akka.tcp://NotificationSystem@***.***.**.**:5053", "akka.tcp://NotificationSystem@***.***.**.**:5073"]
                    roles = [client]
            }
        }
    ]]>
</hocon>

Our sender role configuration is:

  <akka>
<hocon><![CDATA[
            akka{
                loglevel = INFO

                loggers = ["Akka.Logger.NLog.NLogLogger, Akka.Logger.NLog"]

                actor{
                    debug {  
                        # receive = on 
                        # autoreceive = on
                        # lifecycle = on
                        # event-stream = on
                        # unhandled = on
                    }         

                    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"           

                    serializers {
                        wire = "Akka.Serialization.WireSerializer, Akka.Serialization.Wire"
                    }

                    serialization-bindings {
                     "System.Object" = wire
                    }

                    deployment{
                        /NotificationCoordinator/ApplePushNotificationActor{
                            router = round-robin-pool
                            resizer{
                                enabled = on
                                lower-bound = 3
                                upper-bound = 5
                            }
                        }

                        /NotificationCoordinator/AndroidPushNotificationActor{
                            router = round-robin-pool
                            resizer{
                                enabled = on
                                lower-bound = 3
                                upper-bound = 5
                            }
                        }

                        /NotificationCoordinator/EmailActor{
                            router = round-robin-pool
                            resizer{
                                enabled = on
                                lower-bound = 3
                                upper-bound = 5
                            }
                        }

                        /NotificationCoordinator/SmsActor{
                            router = round-robin-pool
                            resizer{
                                enabled = on
                                lower-bound = 3
                                upper-bound = 5
                            }
                        }

                        /NotificationCoordinator/LoggingCoordinator/ResponseLoggerActor{
                            router = round-robin-pool
                            resizer{
                                enabled = on
                                lower-bound = 3
                                upper-bound = 5
                            }
                        }                           
                    }
                }

             remote{                            
                        log-remote-lifecycle-events = DEBUG
                        log-received-messages = on

                        helios.tcp{
                            transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
                            applied-adapters = []
                            transport-protocol = tcp
                            #will be populated with a dynamic host-name at runtime if left uncommented
                            #public-hostname = "POPULATE STATIC IP HERE"
                            hostname = "***.***.**.**"
                            port = 0
                    }
                }

                cluster {
                        seed-nodes = ["akka.tcp://NotificationSystem@***.***.**.**:5053", "akka.tcp://NotificationSystem@***.***.**.**:5073"]
                        roles = [sender]
                }
            }
        ]]></hocon>

How can we solve this problem? Thank you.

score 2 · Accepted Answer

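This definitely looks like a bug in the EndpointManager in Akka.Remote. Akka.NET 1.1, due to be released on June 14th, should resolve it. We have fixed a large number of cluster rejoin bugs along these lines, but the fixes have not been released yet. Akka.Cluster will be RTM'ed as part of that release.

In the meantime, if you want to try the new bits now, you can use the Akka.NET Nightly Builds.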
