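My Nomad job ends up in a perpetual pending state because the docker pull for the container image I want keeps hitting an i/o timeout. I've read several times that changing DNS fixes this, but that seems contrived: I shouldn't need public Google addresses on a private network... Here is the result after submitting the job file ping-services.nomad.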
○ → nomad job status ping_service
ID = ping_service
Name = ping_service
Submit Date = 2019-04-25T13:29:04-07:00
Type = service
Priority = 50
Datacenters = public-services,private-services,content-connector,backoffice
Status = running
Periodic = false
Parameterized = false
Summary
Task Group          Queued  Starting  Running  Failed  Complete  Lost
ping_service_group  0       3         1        0       4         0
Allocations
ID        Node ID   Task Group          Version  Desired  Status   Created     Modified
05468ff2  23b79904  ping_service_group  2        run      pending  18h28m ago  19s ago     <- here
5ce4c9ba  1601d6b1  ping_service_group  2        run      pending  18h28m ago  20s ago     <- here
9eced817  2260997a  ping_service_group  2        run      running  18h28m ago  18h28m ago
aefab4c3  032217e1  ping_service_group  2        run      pending  18h28m ago  42s ago     <- and here
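To pull just the stuck allocations and their nodes out of that listing, a small filter is enough. This is a minimal sketch, assuming the column order shown above (Status is the sixth whitespace-separated field); `pending_allocs` is a hypothetical helper name, not part of Nomad.

```shell
#!/usr/bin/env bash
# pending_allocs (hypothetical helper): read `nomad job status <job>` output
# on stdin and print "<alloc id> <node id>" for every allocation row whose
# Status column is "pending".
# Assumes the column order ID, Node ID, Task Group, Version, Desired, Status.
pending_allocs() {
  awk '$6 == "pending" { print $1, $2 }'
}

# Usage (needs a reachable Nomad cluster):
#   nomad job status ping_service | pending_allocs
```

The node IDs it prints are the ones to inspect with nomad node status.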
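As you can see, only one of the four allocations is running; the other three are stuck in pending. Here is nomad alloc status 05468ff2 for one of the pending ones: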
○ → nomad alloc status 05468ff2
ID = 05468ff2
Eval ID = 10b76231
Name = ping_service.ping_service_group[1]
Node ID = 23b79904
Job ID = ping_service
Job Version = 2
Client Status = pending
Client Description = <none>
Desired Status = run
Desired Description = <none>
Created = 18h35m ago
Modified = 15s ago
Task "ping_service_task" is "pending"
Task Resources
CPU      Memory  Disk    IOPS  Addresses
100 MHz  20 MiB  50 MiB  0     http: xx.xxx.xxx.xxx:31215
Task Events:
Started At = N/A
Finished At = N/A
Total Restarts = 982
Last Restart = 2019-04-26T15:04:01Z
Recent Events:
Time Type Description
2019-04-26T08:04:28-07:00 Driver Downloading image thobe/ping_service:0.0.9
2019-04-26T08:04:01-07:00 Restarting Task restarting in 27.061915977s
2019-04-26T08:04:01-07:00 Driver Failure failed to initialize task "ping_service_task" for alloc "05468ff2-f5a0-7a67-3dd7-947d4b30ec45": Failed to pull `thobe/ping_service:0.0.9`: error pulling image configuration: Get https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/cf/cfaa80d7f11f028474f755c007960a0b219c90e1edc45d94039a987c46d7ca32/data?verify=1556294011-ftjrcDBBZK4hiQV99v5QZXxvp34%3D: dial tcp 104.18.122.25:443: i/o timeout
2019-04-26T08:03:19-07:00 Driver Downloading image thobe/ping_service:0.0.9
2019-04-26T08:02:51-07:00 Restarting Task restarting in 27.302069343s
2019-04-26T08:02:51-07:00 Driver Failure failed to initialize task "ping_service_task" for alloc "05468ff2-f5a0-7a67-3dd7-947d4b30ec45": Failed to pull `thobe/ping_service:0.0.9`: error pulling image configuration: Get https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/cf/cfaa80d7f11f028474f755c007960a0b219c90e1edc45d94039a987c46d7ca32/data?verify=1556293941-ZUevnKxoKohkLDGDkv5E4A79aZ8%3D: dial tcp 104.18.122.25:443: i/o timeout
2019-04-26T08:02:12-07:00 Driver Downloading image thobe/ping_service:0.0.9
2019-04-26T08:01:46-07:00 Restarting Task restarting in 25.629825445s
2019-04-26T08:01:46-07:00 Driver Failure failed to initialize task "ping_service_task" for alloc "05468ff2-f5a0-7a67-3dd7-947d4b30ec45": Failed to pull `thobe/ping_service:0.0.9`: error pulling image configuration: Get https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/cf/cfaa80d7f11f028474f755c007960a0b219c90e1edc45d94039a987c46d7ca32/data?verify=1556293876-lE4pvy9Jsruduu76LeMoQxL0gxk%3D: dial tcp 104.18.123.25:443: i/o timeout
2019-04-26T08:01:07-07:00 Driver Downloading image thobe/ping_service:0.0.9
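Every failure above has the form `dial tcp <ip>:443: i/o timeout`, which means the hostname was resolved and the TCP connect itself timed out. To see which addresses were tried and how often, the events can be tallied. A minimal sketch; `failed_dials` is a hypothetical helper name and it only matches the exact error wording shown above.

```shell
#!/usr/bin/env bash
# failed_dials (hypothetical helper): read `nomad alloc status <alloc>` output
# on stdin and print "<count> <ip:port>" for every address that appears in a
# "dial tcp <ip:port>: i/o timeout" error, most frequent first.
failed_dials() {
  grep -oE 'dial tcp [0-9.]+:[0-9]+: i/o timeout' \
    | awk '{ sub(/:$/, "", $3); print $3 }' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# Usage (needs a reachable Nomad cluster):
#   nomad alloc status 05468ff2 | failed_dials
```

For the three failures listed above this gives 2 for 104.18.122.25:443 and 1 for 104.18.123.25:443, so the timeouts are not tied to a single address.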
You can clearly see that the problem is an i/o timeout preventing us from pulling our layers, so let's hop onto the node and try it manually...
## Make sure we're really logged into ECR/Docker
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ docker login
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
## Attempt a manual pull...
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ docker pull thobe/ping_service:0.0.9
0.0.9: Pulling from thobe/ping_service
ff3a5c916c92: Pulling fs layer
3c5613eb8e39: Pulling fs layer
error pulling image configuration: Get https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/cf/cfaa80d7f11f028474f755c007960a0b219c90e1edc45d94039a987c46d7ca32/data?verify=1556293601-mrJGlZisGPDvwapT7cAbax7UWig%3D: dial tcp 104.18.125.25:443: i/o timeout
## Are you there God?
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ ping -c1 production.cloudflare.docker.com
PING production.cloudflare.docker.com (104.18.123.25) 56(84) bytes of data.
--- production.cloudflare.docker.com ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
## NS of Google Pub DNS
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ nslookup production.cloudflare.docker.com 8.8.8.8
;; connection timed out; no servers could be reached
## NS of Primary nameserver
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ nslookup production.cloudflare.docker.com 10.128.8.8
;; connection timed out; no servers could be reached
## NS of Secondary nameserver
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ nslookup production.cloudflare.docker.com 10.128.0.2
Server: 10.128.0.2
Address: 10.128.0.2#53
Non-authoritative answer:
Name: production.cloudflare.docker.com
Address: 104.18.122.25
Name: production.cloudflare.docker.com
Address: 104.18.123.25
Name: production.cloudflare.docker.com
Address: 104.18.124.25
Name: production.cloudflare.docker.com
Address: 104.18.125.25
Name: production.cloudflare.docker.com
Address: 104.18.121.25
## What are our current DNS settings?
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ cat /etc/resolv.conf
options timeout:2 attempts:5
; generated by /usr/sbin/dhclient-script
search nomad-eu-west-1 eu-west-1.compute.internal
nameserver 10.128.8.8
nameserver 10.128.0.2
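The failed ping only shows that ICMP echo is not being answered, whereas the pull needs an outbound TCP connection to port 443. A direct TCP check against the addresses returned by 10.128.0.2 would separate the two. This is a minimal sketch, assuming a bash built with /dev/tcp support and the coreutils `timeout` command; `tcp_check` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# tcp_check (hypothetical helper): try to open a TCP connection to host:port
# within a time limit (default 5 seconds).
# Returns 0 if the connect succeeds, non-zero on refusal or timeout.
# Assumes bash's /dev/tcp pseudo-device and coreutils `timeout` are available.
tcp_check() {
  local host="$1" port="$2" secs="${3:-5}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage on the affected node, with the addresses from the nslookup answer:
#   for ip in 104.18.121.25 104.18.122.25 104.18.123.25 104.18.124.25 104.18.125.25; do
#     if tcp_check "$ip" 443; then echo "$ip:443 reachable"; else echo "$ip:443 unreachable"; fi
#   done
```

If every address is unreachable from this node but reachable from the node that pulled successfully, that would point at outbound routing or filtering for this node rather than at name resolution.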
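Something seems to have happened on the bad nodes (that is, the ones that can't pull). Note the "Driver docker is not detected" event: could that be the problem? I only see it on the bad nodes when checking the node events...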
○ → nomad node status 23b79904
ID = 23b79904
Name = i-xxxxxxx
Class = <none>
DC = public-services
Drain = false
Eligibility = eligible
Status = ready
Uptime = 21h43m20s
Driver Status = docker,exec
Node Events
Time                  Subsystem       Message
2019-04-25T20:39:48Z  Driver: docker  Driver is available and responsive
2019-04-25T20:39:03Z  Driver: docker  Driver docker is not detected
2019-04-25T18:06:53Z  Cluster         Node registered
Allocated Resources
CPU           Memory           Disk            IOPS
500/2399 MHz  128 MiB/983 MiB  300 MiB/48 GiB  0/0
Allocation Resource Utilization
CPU         Memory
5/2399 MHz  14 MiB/983 MiB
Host Resource Utilization
CPU          Memory           Disk
24/2399 MHz  410 MiB/984 MiB  1.8 GiB/50 GiB
Allocations
ID        Node ID   Task Group          Version  Desired  Status   Created     Modified
05468ff2  23b79904  ping_service_group  2        run      pending  19h19m ago  33s ago
9f9ecba6  23b79904  fabio               0        run      running  21h33m ago  21h32m ago
And the good node below...
○ → nomad node status 2260997a
ID = 2260997a
Name = i-xxxxxxxxx
Class = <none>
DC = content-connector
Drain = false
Eligibility = eligible
Status = ready
Uptime = 21h43m28s
Driver Status = docker,exec
Node Events
Time                  Subsystem  Message
2019-04-25T18:07:04Z  Cluster    Node registered
Allocated Resources
CPU           Memory          Disk           IOPS
100/2400 MHz  20 MiB/983 MiB  50 MiB/48 GiB  0/0
Allocation Resource Utilization
CPU         Memory
0/2400 MHz  6.1 MiB/983 MiB
Host Resource Utilization
CPU          Memory           Disk
23/2400 MHz  361 MiB/984 MiB  1.8 GiB/50 GiB
Allocations
ID        Node ID   Task Group          Version  Desired  Status   Created     Modified
9eced817  2260997a  ping_service_group  2        run      running  19h19m ago  19h19m ago
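One visible difference between the two nodes is the datacenter: the failing node 23b79904 is in public-services, while the healthy node 2260997a is in content-connector. Whether that matters is only a guess, but it is cheap to check where the other pending allocations live. A minimal sketch; `node_dc` is a hypothetical helper name that relies on the "DC = ..." line shown in the node status output.

```shell
#!/usr/bin/env bash
# node_dc (hypothetical helper): read `nomad node status <node>` output on
# stdin and print the value of its "DC = <name>" line.
node_dc() {
  awk '$1 == "DC" && $2 == "=" { print $3 }'
}

# Usage (needs a reachable Nomad cluster); the node IDs are the ones from the
# job status listing:
#   for n in 23b79904 1601d6b1 032217e1 2260997a; do
#     printf '%s %s\n' "$n" "$(nomad node status "$n" | node_dc)"
#   done
```

If all pending allocations turn out to sit in datacenters other than content-connector, comparing the outbound network path of those datacenters would be a reasonable next step.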
The Nomad version is as follows:
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ nomad -v
Nomad v0.8.6 (ab54ebcfcde062e9482558b7c052702d4cb8aa1b+CHANGES)