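Normally, we can ping one service from another by its service name using Azure Service Fabric DNS. That stopped working last night at around 1 a.m. No code or configuration was changed, and nothing was deployed. Now, from inside a container, we can no longer ping another service: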
Some background: we run Service Fabric in Azure on a Windows cluster. All of our services run in Docker containers, using Docker for Windows.
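Things we have tried: restarting the virtual machines, deleting and redeploying all applications, and restarting the Naming Service and the DNS Service in the cluster.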
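Has anyone seen this before? I am looking for hints about what might be wrong, or about how to debug this further. Again, nothing was deployed, and no code or configuration was changed. It looks as though Service Fabric's internal DNS suddenly failed and will not come back up. Thanks!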
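One way to narrow this down might be to test name resolution on its own, since a failed ping does not by itself show whether DNS lookup or ICMP is the part that broke. The following is only a minimal sketch, assuming Python is available inside the container image; `resolve` is an illustrative helper and `myservice.myapp` is a hypothetical placeholder for a Service Fabric DNS name, not a real service from this cluster.

```python
import socket
from typing import List


def resolve(name: str, port: int = 80) -> List[str]:
    """Return the distinct IP addresses that `name` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The resolver could not map the name to any address.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "myservice.myapp" is a hypothetical placeholder; substitute the real DNS name.
    for name in ("localhost", "myservice.myapp"):
        addrs = resolve(name)
        print(name, "->", addrs if addrs else "resolution failed")
```

If the service name fails to resolve while `localhost` succeeds, that would point at the DNS path rather than at ICMP being blocked between containers.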
Update: output of Get-ServiceFabricNodeHealth on one of the nodes:
NodeName : _B_41
AggregatedHealthState : Ok
HealthEvents :
SourceId : System.FabricNode
Property : Certificate_client
HealthState : Ok
SequenceNumber : 132078454391466815
SentAt : 7/17/2019 1:57:19 PM
ReceivedAt : 7/17/2019 1:57:24 PM
TTL : Infinite
Description : Certificate expiration: thumbprint = adf7ae93a524d181106b0467a1f8e3375e1bf65f, expiration = 2020-06-20 01:17:33.000, remaining lifetime is
338:11:20:13.853, please refresh ahead of time to avoid catastrophic failure. Warning threshold Security/CertificateExpirySafetyMargin is configured at 30:0:00:00.000, if
needed, you can adjust it to fit your refresh process.
RemoveWhenExpired : False
IsExpired : False
Transitions : Warning->Ok = 7/13/2019 11:22:17 AM, LastError = 1/1/0001 12:00:00 AM
SourceId : System.FabricNode
Property : Certificate_cluster
HealthState : Ok
SequenceNumber : 132078386480915827
SentAt : 7/17/2019 12:04:08 PM
ReceivedAt : 7/17/2019 12:04:23 PM
TTL : Infinite
Description : Certificate expiration: thumbprint = adf7ae93a524d181106b0467a1f8e3375e1bf65f, expiration = 2020-06-20 01:17:33.000, remaining lifetime is
338:13:13:24.908, please refresh ahead of time to avoid catastrophic failure. Warning threshold Security/CertificateExpirySafetyMargin is configured at 30:0:00:00.000, if
needed, you can adjust it to fit your refresh process.
RemoveWhenExpired : False
IsExpired : False
Transitions : Warning->Ok = 7/13/2019 7:04:12 AM, LastError = 1/1/0001 12:00:00 AM
SourceId : System.FabricNode
Property : Certificate_server
HealthState : Ok
SequenceNumber : 132078441374480374
SentAt : 7/17/2019 1:35:37 PM
ReceivedAt : 7/17/2019 1:35:54 PM
TTL : Infinite
Description : Certificate expiration: thumbprint = adf7ae93a524d181106b0467a1f8e3375e1bf65f, expiration = 2020-06-20 01:17:33.000, remaining lifetime is
338:11:41:55.551, please refresh ahead of time to avoid catastrophic failure. Warning threshold Security/CertificateExpirySafetyMargin is configured at 30:0:00:00.000, if
needed, you can adjust it to fit your refresh process.
RemoveWhenExpired : False
IsExpired : False
Transitions : Warning->Ok = 7/13/2019 4:35:41 AM, LastError = 1/1/0001 12:00:00 AM
SourceId : System.RA
Property : RAStoreProvider
HealthState : Ok
SequenceNumber : 132072866375071389
SentAt : 7/11/2019 2:43:57 AM
ReceivedAt : 7/13/2019 1:15:33 PM
TTL : Infinite
Description : Store provider type ESE created and opened successfully.
RemoveWhenExpired : False
IsExpired : False
Transitions : Warning->Ok = 7/11/2019 2:44:27 AM, LastError = 1/1/0001 12:00:00 AM
SourceId : System.FM
Property : State
HealthState : Ok
SequenceNumber : 181
SentAt : 7/11/2019 2:44:15 AM
ReceivedAt : 7/13/2019 1:15:33 PM
TTL : Infinite
Description : Fabric node is up.
RemoveWhenExpired : False
IsExpired : False
Transitions : Warning->Ok = 7/11/2019 2:44:44 AM, LastError = 1/1/0001 12:00:00 AM
Update 2: network interface information from inside a Docker container: