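I have an application that uses Akka.NET clustering. The person who wrote the code has left the company. I am trying to understand the code, and we are planning a deployment.
The cluster has 2 types of nodes:
QueueServicer: supports sharding; only these nodes take part in sharding.
LightHouse: these are just seed nodes, nothing else.
LightHouse: 2 nodes
QueueServicer: 3 nodes
I see that one of the QueueServicer nodes cannot join the cluster. Both LightHouse nodes refuse the connection. The node keeps trying to join but never succeeds. This has been going on for the last 5 days or so, and the node never dies either. Its CPU and memory usage are high. Also, when I filter its logs, I find that it is not running any queue processor actors. Garbage collection and similar work take a long time. I see the following in this node's logs.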
{"timestamp":"2021-09-08T22:26:59.025Z", "logger":"Akka.Event.DummyClassForStringSources", "message": Tried to associate with unreachable remote address [akka.tcp://myapp@lighthouse-1:7892]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://myapp@lighthouse-1:7892] Caused by: [System.AggregateException: One or more errors occurred. (Connection refused akka.tcp://myapp@lighthouse-1:7892) ---> Akka.Remote.Transport.InvalidAssociationException: Connection refused akka.tcp://myapp@lighthouse-1:7892 at Akka.Remote.Transport.DotNetty.TcpTransport.AssociateInternal(Address remoteAddress) at Akka.Remote.Transport.DotNetty.DotNettyTransport.1.GetResultCore(Boolean waitCompletionNotification) at Akka.Remote.Transport.ProtocolStateActor.<>c.<InitializeFSM>b__12_18(Task`1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke() at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
{"timestamp":"2021-09-08T22:26:59.025Z", "logger":"Akka.Event.DummyClassForStringSources", "message": Tried to associate with unreachable remote address [akka.tcp://myapp@lighthouse-0:7892]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://myapp@lighthouse-0:7892] Caused by: [System.AggregateException: One or more errors occurred. (Connection refused akka.tcp://myapp@lighthouse-0:7892) ---> Akka.Remote.Transport.InvalidAssociationException: Connection refused akka.tcp://myapp@lighthouse-0:7892 at Akka.Remote.Transport.DotNetty.TcpTransport.AssociateInternal(Address remoteAddress) at Akka.Remote.Transport.DotNetty.DotNettyTransport.1.GetResultCore(Boolean waitCompletionNotification) at Akka.Remote.Transport.ProtocolStateActor.<>c.<InitializeFSM>b__12_18(Task`1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke() at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
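There are other log entries for "supervising", "stopping", and "started" that I have omitted here.
Could you verify whether the HOCON configuration is correct for the split brain resolver and sharding?
I think LightHouse/SeedNodes should not specify a sharding configuration. I think this is a mistake. I also think the split brain resolver configuration in LightHouse/SeedNodes may be wrong and should not be specified for seed nodes.
I appreciate your help.
Here is the trimmed HOCON for QueueServicer: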
akka {
  loggers = ["Akka.Logger.log4net.Log4NetLogger, Akka.Logger.log4net"]
  log-config-on-start = on
  loglevel = "DEBUG"
  actor {
    provider = cluster
    serializers {
      hyperion = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
    }
    serialization-bindings {
      "System.Object" = hyperion
    }
  }
  remote {
    dot-netty.tcp {
      ….
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://myapp@lighthouse-0:7892", "akka.tcp://myapp@lighthouse-1:7892"]
    roles = ["QueueProcessor"]
    sharding {
      role = "QueueProcessor"
      state-store-mode = ddata
      remember-entities = true
      passivate-idle-entity-after = off
    }
    downing-provider-class = "Akka.Cluster.SplitBrainResolver, Akka.Cluster"
    split-brain-resolver {
      active-strategy = keep-majority
      stable-after = 20s
      keep-majority {
        role = "QueueProcessor"
      }
    }
    down-removal-margin = 20s
  }
  extensions = ["Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubExtensionProvider,Akka.Cluster.Tools"]
}
Here is the HOCON for Lighthouse:
akka {
  loggers = ["Akka.Logger.log4net.Log4NetLogger, Akka.Logger.log4net"]
  log-config-on-start = on
  loglevel = "DEBUG"
  actor {
    provider = cluster
    serializers {
      hyperion = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
    }
    serialization-bindings {
      "System.Object" = hyperion
    }
  }
  remote {
    dot-netty.tcp {
      …
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://myapp@lighthouse-0:7892", "akka.tcp://myapp@lighthouse-1:7892"]
    roles = ["lighthouse"]
    sharding {
      role = "lighthouse"
      state-store-mode = ddata
      remember-entities = true
      passivate-idle-entity-after = off
    }
    downing-provider-class = "Akka.Cluster.SplitBrainResolver, Akka.Cluster"
    split-brain-resolver {
      active-strategy = keep-oldest
      stable-after = 30s
      keep-oldest {
        down-if-alone = on
        role = "lighthouse"
      }
    }
  }
}
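To make the first part of my question concrete, this is roughly what I would expect the Lighthouse cluster section to look like if the sharding block really does not belong on seed nodes. It is only a sketch of my assumption, not a tested or confirmed fix. Every key is taken from the configs in this post, and the split brain resolver settings are deliberately left out because I do not know what they should be on these nodes.

```hocon
# Hypothetical Lighthouse cluster section: a sketch of my assumption, not a verified fix.
akka {
  cluster {
    seed-nodes = ["akka.tcp://myapp@lighthouse-0:7892", "akka.tcp://myapp@lighthouse-1:7892"]
    roles = ["lighthouse"]

    # Assumption: no "sharding" block here, since only the QueueProcessor
    # nodes are supposed to take part in sharding.

    # downing-provider-class and split-brain-resolver are not shown;
    # whether and how they should be set on seed nodes is the open question.
  }
}
```

If this assumption is wrong and the sharding block is harmless on a node that never starts sharding, I would like to understand that as well.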