I have a single-node swarm (A) that runs a few long-lived services. I recently added a new node (B), initially as a worker, and later promoted it to manager. However, the swarm is now broken, and Docker on A won't restart at all. I recovered the newly added node B by deleting /var/lib/docker on it.
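
For context, the join and promotion were done with the usual commands; roughly the following (the token and address here are placeholders, not what I actually typed):

# on node A (manager): print the worker join token
docker swarm join-token worker

# on node B: join the swarm as a worker
docker swarm join --token SWMTKN-1-xxxx 192.168.1.148:2377

# back on node A: promote B to manager
docker node promote <node-b>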

The trouble started when I ran this on node B:

# docker service create --name storage-2 --hostname minio-2.media-server --constraint 'node.labels.storage==true' --publish published=9020,target=9000 --replicas 1 minio/minio server /mnt/ssd-2/minio-2.media-server/storage/
owqwx29gtr5v2ozjrarmf3t08
overall progress: 0 out of 1 tasks 
1/1:   
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded
root@media-server:~# docker service ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.

I tried to re-join node B to the swarm on A by re-applying the original join token, but all I get is this response:

Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.1.148:2377: connect: connection refused"
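
For completeness, the rejoin attempt looked roughly like the sketch below (token and address are placeholders); it fails because nothing is listening on A's port 2377 while dockerd is down:

# on node B: leave the broken swarm first, then try to rejoin
docker swarm leave --force
docker swarm join --token SWMTKN-1-xxxx 192.168.1.148:2377

# quick check whether anything is listening on the swarm port on A
nc -zv 192.168.1.148 2377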

Now let's move on to node A. This is my daemon.json on that node:

# cat /etc/docker/daemon.json
{
 "runtimes": {
   "nvidia": {
     "path": "/usr/bin/nvidia-container-runtime",
     "runtimeArgs": []
   }
 },
 "default-runtime": "nvidia",
 "userns-remap": "my-user",
 "data-root": "/opt/var/docker/"
}
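
As an aside, the userns-remap entry is why dockerd creates a remapped daemon root (/opt/var/docker/1000.999, visible in the logs below). The subordinate ID ranges behind the mapping can be checked like this (my-user being the placeholder name from the config):

# check the subuid/subgid ranges that userns-remap maps onto
grep my-user /etc/subuid /etc/subgid

# the remapped daemon root is named after the first uid.gid pair
ls -ld /opt/var/docker/*.*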

When I run sudo systemctl restart docker I get the typical: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
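
Those two suggested commands don't reveal much beyond the failed exit; to pull the daemon's own log lines out of the journal, a standard journalctl invocation works (nothing here is specific to my setup):

# show the last 100 log lines for the docker unit, without paging
journalctl -u docker.service --no-pager -n 100

# or follow the unit log while restarting docker in another terminal
journalctl -u docker.service -f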

If I run dockerd --debug instead, I get the following:

time="2020-01-07T16:07:47.940589246-05:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: my-user:my-user"
time="2020-01-07T16:07:47.940649613-05:00" level=debug msg="Creating user namespaced daemon root: /opt/var/docker/1000.999"
time="2020-01-07T16:07:47.940969396-05:00" level=debug msg="Listener created for HTTP on unix (/var/run/docker.sock)"
time="2020-01-07T16:07:47.941193653-05:00" level=info msg="libcontainerd: started new docker-containerd process" pid=3731
time="2020-01-07T16:07:47-05:00" level=info msg="starting containerd" module=containerd revision=773c489c9c1b21a6d78b5c538cd395416ec50f88 version=v1.0.3 
time="2020-01-07T16:07:47-05:00" level=debug msg="changing OOM score to -500" module=containerd 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.content.v1.content"..." module=containerd type=io.containerd.content.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." module=containerd type=io.containerd.snapshotter.v1 
time="2020-01-07T16:07:47-05:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /opt/var/docker/1000.999/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module=containerd 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." module=containerd type=io.containerd.snapshotter.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." module=containerd type=io.containerd.metadata.v1 
time="2020-01-07T16:07:47-05:00" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /opt/var/docker/1000.999/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module="containerd/io.containerd.metadata.v1.bolt" 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." module=containerd type=io.containerd.differ.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." module=containerd type=io.containerd.gc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." module=containerd type=io.containerd.monitor.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." module=containerd type=io.containerd.runtime.v1 
time="2020-01-07T16:07:47-05:00" level=debug msg="loading tasks in namespace" module="containerd/io.containerd.runtime.v1.linux" namespace=plugins.moby 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd-debug.sock" module="containerd/debug" 
time="2020-01-07T16:07:47-05:00" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd.sock" module="containerd/grpc" 
time="2020-01-07T16:07:47-05:00" level=info msg="containerd successfully booted in 0.002536s" module=containerd 
time="2020-01-07T16:07:47.951820862-05:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: my-user:my-user"
time="2020-01-07T16:07:47.951973737-05:00" level=debug msg="Golang's threads limit set to 230760"
time="2020-01-07T16:07:47.952395645-05:00" level=debug msg="processing event stream" module=libcontainerd namespace=plugins.moby
time="2020-01-07T16:07:47.961837504-05:00" level=debug msg="restored container" alive=true container=36d74b0ce14aef040e5994573e42909679ce20e6d8c7e21c9a5ccfa82b9ef38c module=libcontainerd namespace=plugins.moby pid=1966
time="2020-01-07T16:07:47.993831037-05:00" level=error msg="failed to enable plugin 'vieux/sshfs:latest': error creating containerd container: id already in use"
time="2020-01-07T16:07:47.993929990-05:00" level=debug msg="Using default logging driver json-file"
time="2020-01-07T16:07:47.994191355-05:00" level=debug msg="[graphdriver] priority list: [btrfs zfs overlay2 aufs overlay devicemapper vfs]"
time="2020-01-07T16:07:48.003990866-05:00" level=debug msg="backingFs=extfs,  projectQuotaSupported=false"
time="2020-01-07T16:07:48.004000013-05:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
time="2020-01-07T16:07:48.004006714-05:00" level=debug msg="Initialized graph driver overlay2"
time="2020-01-07T16:07:48.015593752-05:00" level=debug msg="Failed to load mount 074d0e7ec2bbf9e234b77b9f94e5bdcb2d506be83937f4704d5546606e886f00: invalid mount id value"
time="2020-01-07T16:07:48.015693673-05:00" level=debug msg="Failed to load mount 0f4801970ae34e1b3573f3361fdb30e1ae1d0ff9909c4d938a8ac366a8f13162: invalid mount id value"
time="2020-01-07T16:07:48.015838872-05:00" level=debug msg="Failed to load mount 17d3e93832d5f14b6a64dafff6ace99aa823f12c5d014697cd96257269caf49f: invalid mount id value"
time="2020-01-07T16:07:48.015852094-05:00" level=debug msg="Failed to load mount 19a2396129ae436e154ab21f8a55933918e742381db7cf7cfd8263398afdeb4c: invalid mount id value"
time="2020-01-07T16:07:48.015889445-05:00" level=debug msg="Failed to load mount 1aca7d8b62b97b368c34bf01a428c6b45d37f28d8592e5e106f613c805ef4801: invalid mount id value"
time="2020-01-07T16:07:48.015901494-05:00" level=debug msg="Failed to load mount 1cf5677cd23bdfd158b2890f5e4e3b0a33dfe0215676ffc3fb36b261223567af: invalid mount id value"
time="2020-01-07T16:07:48.015966443-05:00" level=debug msg="Failed to load mount 2b7f467b2d9afad7399456a8bb3a5dd74ee1ed6dee7bc4bb7feaeda313d0fce5: invalid mount id value"
time="2020-01-07T16:07:48.016134312-05:00" level=debug msg="Failed to load mount 3b2a3257406cb76615cc0f33698a3a5d2332e91358b526e7858883ebb00054f2: invalid mount id value"
time="2020-01-07T16:07:48.016146780-05:00" level=debug msg="Failed to load mount 449ef233ea403394a1e9ea8d4deb1a74a966dd44256f1dd6833d14853fc3d2a9: invalid mount id value"
time="2020-01-07T16:07:48.016158344-05:00" level=debug msg="Failed to load mount 45369784a3382c4a520f478b8a71a148b79c7d01ad56d1e6af58ed4b17517dcf: invalid mount id value"
time="2020-01-07T16:07:48.016287912-05:00" level=debug msg="Failed to load mount 5b14ff5fe4cdaaa8f8b491d227bf7b4354b02a9b13bba05d9da880b4bc617b96: invalid checksum digest format"
time="2020-01-07T16:07:48.016327990-05:00" level=debug msg="Failed to load mount 6719acf4f4ec553baa73b30a33d1bab4fa7054d6d33d8f28b74e9c466fcfea81: invalid mount id value"
time="2020-01-07T16:07:48.016344990-05:00" level=debug msg="Failed to load mount 6e14a906fca9c1448199bb3626c66057609d7129d6fa5b365dcf759ecf4e4b0d: invalid mount id value"
time="2020-01-07T16:07:48.016538443-05:00" level=debug msg="Failed to load mount 824070e6c6437dcd6cb7dba111e6b294f8cbb477386165cf6f2aae18ec716cb5: invalid mount id value"
time="2020-01-07T16:07:48.016628599-05:00" level=debug msg="Failed to load mount 8acc1c7d16433ff26ae83ffa5a390ef76d731136501085c6ce66fa1ebf10c595: invalid mount id value"
time="2020-01-07T16:07:48.016868507-05:00" level=debug msg="Failed to load mount 9d771c8e19874f77d617335c7418baa5912a090e1ed5f208ba50faaea1f4e6bd: invalid mount id value"
time="2020-01-07T16:07:48.016882377-05:00" level=debug msg="Failed to load mount 9de56a13c2b9d5e2e0c167bba5e95a69b1181f29c00e2a0502867afa0d89a7bf: invalid mount id value"
time="2020-01-07T16:07:48.017022459-05:00" level=debug msg="Failed to load mount a504507cb86ddd1ea10ff86bd61a3b203fb19a0d1e66a1ea443e41b125a37850: invalid mount id value"
time="2020-01-07T16:07:48.017459238-05:00" level=debug msg="Failed to load mount d0499c8ceb86c9a06e6dcacadbe94bad3a9bd3d31a2ffe1f973e14df6eb3b970: invalid mount id value"
time="2020-01-07T16:07:48.018010790-05:00" level=debug msg="Max Concurrent Downloads: 3"
time="2020-01-07T16:07:48.018016142-05:00" level=debug msg="Max Concurrent Uploads: 5"
time="2020-01-07T16:07:48-05:00" level=debug msg="garbage collected" d=1.399144ms module="containerd/io.containerd.gc.v1.scheduler" 
time="2020-01-07T16:07:48.083758903-05:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2020-01-07T16:07:48.083902282-05:00" level=warning msg="Your kernel does not support swap memory limit"
time="2020-01-07T16:07:48.083925855-05:00" level=warning msg="Your kernel does not support cgroup rt period"
time="2020-01-07T16:07:48.083931706-05:00" level=warning msg="Your kernel does not support cgroup rt runtime"
time="2020-01-07T16:07:48.084397055-05:00" level=info msg="Loading containers: start."
time="2020-01-07T16:07:48.084582669-05:00" level=debug msg="processing event stream" module=libcontainerd namespace=moby

... hundreds of similar lines after...

time="2020-01-07T16:07:48.110453451-05:00" level=debug msg="Loaded container ffc8dc10f49a4555b9da0479b071e1da55882d73ebb78dea35f601edf8d4db2a, isRunning: false"
time="2020-01-07T16:07:49.562933661-05:00" level=debug msg="restoring container" container=8c74783e68992ed9f6ca0ecb37eec49bcc17c0069c5e7fc73cf10dd78ce57e87 paused=false running=false
time="2020-01-07T16:07:49.562959806-05:00" level=debug msg="restoring container" container=a63d3e14ecaed0de71e3ca874d3946ea5379bf000ac5c97abb0e99be1d057359 paused=false running=false
time="2020-01-07T16:07:49.562967753-05:00" level=debug msg="restoring container" container=a078311c21fbe7fca358acaade5346bf32c642519bbd4b457f5c21d05d87c508 paused=false running=false
time="2020-01-07T16:07:49.563120965-05:00" level=debug msg="restoring container" container=33f9cc2d2b5feabbfdf10888f30d9a63e72fe9d04cc0895635bef974c8f73206 paused=false running=false
time="2020-01-07T16:07:49.581404277-05:00" level=debug msg="restoring container" container=07b7f22f20eeecb9b3ed548aaa931ffaa8ca4a37094b34061d9ac41a35574f74 paused=false running=false



 ... hundreds of similar lines after...

time="2020-01-07T16:07:49.733381464-05:00" level=debug msg="restoring container" container=4f4f97ff30217dc80916eebc9c100c6d799a6a035a73c5c01990062368253ebf paused=false running=false
time="2020-01-07T16:07:49.741439388-05:00" level=debug msg="restoring container" container=39c168eb717931505c00852b38233f4f6bfc3c2b61549e1cabc212a67a729686 paused=false running=false
time="2020-01-07T16:07:49.741561635-05:00" level=debug msg="restoring container" container=b8041f2cf71dd5939ebf91a9a23ba02b4d65915e7e2160bc32391820e320602d paused=false running=false
time="2020-01-07T16:07:49.742282264-05:00" level=debug msg="Option Experimental: false"
time="2020-01-07T16:07:49.742323616-05:00" level=debug msg="Option DefaultDriver: bridge"
time="2020-01-07T16:07:49.742342572-05:00" level=debug msg="Option DefaultNetwork: bridge"
time="2020-01-07T16:07:49.742360955-05:00" level=debug msg="Network Control Plane MTU: 1500"
time="2020-01-07T16:07:49.760484779-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-ISOLATION]"
time="2020-01-07T16:07:49.762934318-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]"
time="2020-01-07T16:07:49.765664342-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]"
time="2020-01-07T16:07:49.768232163-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER]"
time="2020-01-07T16:07:49.770920179-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D PREROUTING]"
time="2020-01-07T16:07:49.773264508-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D OUTPUT]"

... many more lines of iptables/IP routing setup after...

time="2020-01-07T16:07:50.011816215-05:00" level=debug msg="releasing IPv4 pools from network bridge (fbb914a007fc5a6fb39c1478503309bb03dea56479854613649657c528a44720)"
time="2020-01-07T16:07:50.011859810-05:00" level=debug msg="ReleaseAddress(LocalDefault/172.17.0.0/16, 172.17.0.1)"
time="2020-01-07T16:07:50.011894185-05:00" level=debug msg="Received set for ordinal 1, start 0, end 0, any false, release true, serial:false curr:0 \n"
time="2020-01-07T16:07:50.011917547-05:00" level=debug msg="ReleasePool(LocalDefault/172.17.0.0/16)"
time="2020-01-07T16:07:50.013635623-05:00" level=debug msg="cleanupServiceDiscovery for network:fbb914a007fc5a6fb39c1478503309bb03dea56479854613649657c528a44720"
time="2020-01-07T16:07:50.013670741-05:00" level=debug msg="cleanupServiceBindings for fbb914a007fc5a6fb39c1478503309bb03dea56479854613649657c528a44720"
time="2020-01-07T16:07:50.016796612-05:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2020-01-07T16:07:50.016872068-05:00" level=debug msg="Allocating IPv4 pools for network bridge (04075f9c3581691ac86f78e25f4432eaff14e91ecd69eb1ce93240886b78e2fe)"
time="2020-01-07T16:07:50.016902902-05:00" level=debug msg="RequestPool(LocalDefault, 172.17.0.0/16, , map[], false)"
time="2020-01-07T16:07:50.016940532-05:00" level=debug msg="Received set for ordinal 0, start 0, end 0, any false, release false, serial:false curr:0 \n"
time="2020-01-07T16:07:50.016966550-05:00" level=debug msg="Received set for ordinal 65535, start 0, end 0, any false, release false, serial:false curr:0 \n"
time="2020-01-07T16:07:50.017002299-05:00" level=debug msg="RequestAddress(LocalDefault/172.17.0.0/16, 172.17.0.1, map[RequestAddressType:com.docker.network.gateway])"
time="2020-01-07T16:07:50.017035822-05:00" level=debug msg="Received set for ordinal 1, start 0, end 0, any false, release false, serial:false curr:0 \n"
time="2020-01-07T16:07:50.017381142-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE]"
time="2020-01-07T16:07:50.019760527-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -I POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE]"
time="2020-01-07T16:07:50.022422442-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C DOCKER -i docker0 -j RETURN]"
time="2020-01-07T16:07:50.024866360-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -I DOCKER -i docker0 -j RETURN]"
time="2020-01-07T16:07:50.027363703-05:00" level=debug msg="/sbin/iptables, [--wait -D FORWARD -i docker0 -o docker0 -j DROP]"
time="2020-01-07T16:07:50.029951287-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -i docker0 -o docker0 -j ACCEPT]"
time="2020-01-07T16:07:50.032367948-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -i docker0 -o docker0 -j ACCEPT]"
time="2020-01-07T16:07:50.034909387-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -i docker0 ! -o docker0 -j ACCEPT]"
time="2020-01-07T16:07:50.037318217-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -i docker0 ! -o docker0 -j ACCEPT]"
time="2020-01-07T16:07:50.039875811-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]"
time="2020-01-07T16:07:50.042530975-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]"
time="2020-01-07T16:07:50.045230805-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C OUTPUT -m addrtype --dst-type LOCAL -j DOCKER ! --dst 127.0.0.0/8]"
time="2020-01-07T16:07:50.047785924-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C OUTPUT -m addrtype --dst-type LOCAL -j DOCKER ! --dst 127.0.0.0/8]"
time="2020-01-07T16:07:50.050394918-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -o docker0 -j DOCKER]"
time="2020-01-07T16:07:50.052717342-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -o docker0 -j DOCKER]"
time="2020-01-07T16:07:50.062268276-05:00" level=debug msg="/sbin/iptables, [--wait -D FORWARD -j DOCKER-ISOLATION-STAGE-1]"
time="2020-01-07T16:07:50.064657407-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -j DOCKER-ISOLATION-STAGE-1]"
time="2020-01-07T16:07:50.066993453-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2]"
time="2020-01-07T16:07:50.069279455-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2]"
time="2020-01-07T16:07:50.071533312-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP]"
time="2020-01-07T16:07:50.073783603-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP]"
time="2020-01-07T16:07:50.081163012-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -n -L DOCKER-USER]"
time="2020-01-07T16:07:50.083363010-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C DOCKER-USER -j RETURN]"
time="2020-01-07T16:07:50.085669384-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-USER]"
time="2020-01-07T16:07:50.087828138-05:00" level=debug msg="/sbin/iptables, [--wait -D FORWARD -j DOCKER-USER]"
time="2020-01-07T16:07:50.090030604-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -j DOCKER-USER]"
time="2020-01-07T16:07:50.809775582-05:00" level=info msg="Loading containers: done."
time="2020-01-07T16:07:50.955485174-05:00" level=info msg="Docker daemon" commit=9ee9f40 graphdriver(s)=overlay2 version=18.03.1-ce
time="2020-01-07T16:07:50.967234275-05:00" level=debug msg="loaded node credentials" module=node/tls node.id=k55prdk5rxorfnwu5j6xpjpp1 node.role=swarm-manager
time="2020-01-07T16:07:50.974618962-05:00" level=debug msg="(*Agent).run" module=node/agent node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:50.974758665-05:00" level=debug msg="next certificate renewal scheduled for 691h39m9.025246469s from now" module=node/tls node.id=k55prdk5rxorfnwu5j6xpjpp1 node.role=swarm-manager time="2020-02-05 11:47:00.000000137 -0500 EST m=+2489952.067407376"
time="2020-01-07T16:07:50.976189332-05:00" level=info msg="Listening for connections" addr="[::]:2377" module=node node.id=k55prdk5rxorfnwu5j6xpjpp1 proto=tcp
time="2020-01-07T16:07:50.976195878-05:00" level=info msg="Listening for local connections" addr=/var/run/docker/swarm/control.sock module=node node.id=k55prdk5rxorfnwu5j6xpjpp1 proto=unix
2020-01-07 16:07:50.976335 W | wal: ignored file 0000000000000014-00000000000cdd34.wal.broken in wal
time="2020-01-07T16:07:51.046891166-05:00" level=info msg="20ef9e0a7b6041dc became follower at term 117" module=raft node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.046928313-05:00" level=info msg="newRaft 20ef9e0a7b6041dc [peers: [20ef9e0a7b6041dc], term: 117, commit: 2194765, applied: 2193235, lastindex: 2194765, lastterm: 117]" module=raft node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.050413757-05:00" level=info msg="manager selected by agent for new session: {k55prdk5rxorfnwu5j6xpjpp1 192.168.1.148:2377}" module=node/agent node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.050458302-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.050469991-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1



.... many hundreds of similar debug lines after..


time="2020-01-07T16:07:51.067790438-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.067801184-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.067809512-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.089348209-05:00" level=debug msg="(*Agent).UpdateTaskStatus" module=node/agent node.id=k55prdk5rxorfnwu5j6xpjpp1 task.id=0o21qh02tutcemqcxp9a1pnfp
time="2020-01-07T16:07:51.169918243-05:00" level=debug msg="transport: add peer 5db22c04be0bcbfc with address 192.168.1.113:2377"
time="2020-01-07T16:07:51.174166324-05:00" level=debug msg="stop transport"
panic: Copy for *types.BoolValue not implemented

goroutine 499 [running]:
github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy.Copy(0x55a4645b9e20, 0xc422b78ff8, 0x55a4645b9e20, 0xc422b78fd6)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy/copy.go:50 +0x3cb
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*ContainerSpec).CopyFrom(0xc421b2c2c0, 0x55a46457e100, 0xc421b2c160)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/specs.pb.go:1134 +0x957
github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy.Copy(0x55a46457e100, 0xc421b2c2c0, 0x55a46457e100, 0xc421b2c160)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy/copy.go:48 +0x251
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*TaskSpec).CopyFrom(0xc420956058, 0x55a464607460, 0xc420959d18)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/specs.pb.go:1026 +0x277
github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy.Copy(0x55a464607460, 0xc420956058, 0x55a464607460, 0xc420959d18)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy/copy.go:48 +0x251
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*ServiceSpec).CopyFrom(0xc420956028, 0x55a4645c5940, 0xc420959ce8)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/specs.pb.go:905 +0xd4
github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy.Copy(0x55a4645c5940, 0xc420956028, 0x55a4645c5940, 0xc420959ce8)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy/copy.go:48 +0x251
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*Service).CopyFrom(0xc420956000, 0x55a464619560, 0xc420959cc0)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/objects.pb.go:435 +0xcd
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*Service).Copy(0xc420959cc0, 0x19)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/objects.pb.go:426 +0x62
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*Service).CopyStoreObject(0xc420959cc0, 0x55a463bad7c9, 0x7)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/objects.pb.go:2323 +0x2d
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.(*tx).create(0xc421df22d0, 0x55a463bad7c9, 0x7, 0x55a465165ac0, 0xc420959cc0, 0xc422b78fa7, 0x9)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/memory.go:557 +0xca
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.CreateService(0x55a465164020, 0xc421df22d0, 0xc420959cc0, 0x28, 0xc421df22d0)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/services.go:93 +0xfb
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.init.8.func3(0x55a465164020, 0xc421df22d0, 0xc400000001, 0x55a4651583a0, 0xc422596b70, 0x55a46513c2a0, 0xc420400320)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/services.go:73 +0x12a
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.applyStoreAction(0x55a465164020, 0xc421df22d0, 0xc400000001, 0x55a4651583a0, 0xc422596b70, 0xc4221a3350, 0x150)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/memory.go:310 +0xa9
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.(*MemoryStore).ApplyStoreActions(0xc4202b10c0, 0xc421b2e9c0, 0x1, 0x1, 0xc421b2e900, 0x0)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/memory.go:289 +0x165
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft.(*Node).processEntry(0xc420424a00, 0x7f371c8c35c0, 0xc420ff87c0, 0x75, 0x217d4d, 0x0, 0xc421646160, 0x150, 0x150, 0x0, ...)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft/raft.go:1906 +0x131
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft.(*Node).processCommitted(0xc420424a00, 0x7f371c8c35c0, 0xc420ff87c0, 0x75, 0x217d4d, 0x0, 0xc421646160, 0x150, 0x150, 0x0, ...)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft/raft.go:1867 +0x139
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft.(*Node).Run(0xc420424a00, 0x7f371c8c35c0, 0xc420ff87c0, 0x0, 0x0)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft/raft.go:655 +0xfbd
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager.(*Manager).Run.func6(0xc4202a3d40, 0x7f371c8c35c0, 0xc4202b1900)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/manager.go:546 +0x4c
created by github.com/docker/docker/vendor/github.com/docker/swarmkit/manager.(*Manager).Run
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/manager.go:545 +0x167e

Is it possible to recover the swarm on node A? I'm not sure how to proceed.

# docker version
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

1 Answer


I'm afraid your situation is critical.

To understand why this happened, you need to understand how Docker Swarm works, and how Raft consensus works.

Swarm uses Raft to manage all of the cluster's managers. A very quick summary of what that means: every time a manager is added or removed, an election is held among all the managers to elect a leader. The first node to obtain a majority of the votes becomes the leader.
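
The arithmetic behind that majority rule is what matters here. As a small illustration (plain shell, not Docker-specific), the quorum and fault tolerance for N managers work out like this:

# Raft quorum for N managers is floor(N/2) + 1;
# the cluster survives the loss of N - quorum managers.
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "managers=$n quorum=$quorum tolerated_failures=$tolerated"
done

Note that two managers tolerate exactly as many failures as one, namely zero, while now needing both nodes up to form a quorum.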

Now, in your case, you had one manager, which was the leader. You added another, so a new election took place and one of your nodes became the new leader. That left you with exactly two managers, which is a very bad configuration.

You then lost node B from the cluster (probably when you deleted /var/lib/docker), and since a manager had just disappeared, a new election was triggered. Node A nominated itself as a candidate for leader, but the cluster still believes there are two managers, so node A keeps waiting for the other node's vote. As you can guess, that vote will never arrive, because B is down, and as long as there is no leader, the cluster stays down as well.

This is (partly) why you need an odd number of managers, and why having two is a bad idea. In fact, even the Docker documentation indirectly warns you about it:

You can use the --force option on a manager to remove it from the swarm. However, this does not reconfigure the swarm to ensure that there are enough managers to maintain a quorum in the swarm.

Using --force here, or simply losing a manager node, leads to the same situation.
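
For comparison, the safe way to take a manager out of a swarm is to demote it first, so the remaining managers re-establish quorum, and only remove it afterwards. A sketch, assuming the cluster still has a working leader:

# on a healthy manager: demote B back to a worker (quorum is recalculated)
docker node demote <node-b>

# on node B: leave the swarm cleanly
docker swarm leave

# back on the manager: remove the now-down node from the node list
docker node rm <node-b>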

So I honestly don't know whether your problem has a solution, but I suspect it doesn't (I ran into the same issue and had to reinstall the daemon).

I hope this helps you in the future!

answered 2020-01-08 at 10:46