My local k3s playground suddenly decided to stop working. My gut feeling is that something is wrong with the https certificates. I start the cluster with docker compose:
version: '3.2'
services:
  server:
    image: rancher/k3s:latest
    command: server --disable-agent --tls-san 192.168.2.110
    environment:
      - K3S_CLUSTER_SECRET=somethingtotallyrandom
      - K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml
      - K3S_KUBECONFIG_MODE=666
    volumes:
      - k3s-server:/var/lib/rancher/k3s
      # get the kubeconfig file
      - .:/output
      - ./registries.yaml:/etc/rancher/k3s/registries.yaml
    ports:
      - 192.168.2.110:6443:6443
  node:
    image: rancher/k3s:latest
    volumes:
      - ./registries.yaml:/etc/rancher/k3s/registries.yaml
    tmpfs:
      - /run
      - /var/run
    privileged: true
    environment:
      - K3S_URL=https://server:6443
      - K3S_CLUSTER_SECRET=somethingtotallyrandom
    ports:
      - 31000-32000:31000-32000
volumes:
  k3s-server: {}
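For completeness, this is how I bring the cluster up and tail the logs (a minimal sketch, assuming the file above is saved as docker-compose.yml in the project directory):

docker-compose up -d
docker-compose logs -f server node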
Nothing fancy. The registries.yaml can be commented out without making any difference. Its content is:
mirrors:
  "192.168.2.110:5055":
    endpoint:
      - "http://192.168.2.110:5055"
But now I am getting a bunch of weird failures:
server_1 | E0516 22:58:03.264451 1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
server_1 | E0516 22:58:08.265272 1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
node_1 | I0516 22:58:12.695365 1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: bb7ee4b14724692f4497e99716b68c4dc4fe77333b03801909092d42c00ef5a2
node_1 | I0516 22:58:15.006306 1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: bb7ee4b14724692f4497e99716b68c4dc4fe77333b03801909092d42c00ef5a2
node_1 | I0516 22:58:15.006537 1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: fc2e51300f2ec06949abf5242690cb36077adc409f0d7f131a9d4f911063b63c
node_1 | E0516 22:58:15.006757 1 pod_workers.go:191] Error syncing pod e127dc88-e252-4e2e-bbd5-2e93ce5e32ff ("helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"), skipping: failed to "StartContainer" for "helm" with CrashLoopBackOff: "back-off 1m20s restarting failed container=helm pod=helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"
server_1 | E0516 22:58:22.345501 1 resource_quota_controller.go:408] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
node_1 | I0516 22:58:27.695296 1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: fc2e51300f2ec06949abf5242690cb36077adc409f0d7f131a9d4f911063b63c
node_1 | E0516 22:58:27.695989 1 pod_workers.go:191] Error syncing pod e127dc88-e252-4e2e-bbd5-2e93ce5e32ff ("helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"), skipping: failed to "StartContainer" for "helm" with CrashLoopBackOff: "back-off 1m20s restarting failed container=helm pod=helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"
server_1 | I0516 22:58:30.328999 1 request.go:621] Throttling request took 1.047650754s, request: GET:https://127.0.0.1:6444/apis/admissionregistration.k8s.io/v1beta1?timeout=32s
server_1 | W0516 22:58:31.081020 1 garbagecollector.go:644] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
server_1 | E0516 22:58:36.442904 1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
node_1 | I0516 22:58:40.695404 1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: fc2e51300f2ec06949abf5242690cb36077adc409f0d7f131a9d4f911063b63c
node_1 | E0516 22:58:40.696176 1 pod_workers.go:191] Error syncing pod e127dc88-e252-4e2e-bbd5-2e93ce5e32ff ("helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"), skipping: failed to "StartContainer" for "helm" with CrashLoopBackOff: "back-off 1m20s restarting failed container=helm pod=helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"
server_1 | E0516 22:58:41.443295 1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
It looks like my node is no longer really connected to the server:
user@ipc:~/dev/test_mk3s_docker$ docker exec -it $(docker ps |grep "k3s server"|awk -F\ '{print $1}') kubectl cluster-info
Kubernetes master is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
user@ipc:~/dev/test_mk3s_docker$ docker exec -it $(docker ps |grep "k3s agent"|awk -F\ '{print $1}') kubectl cluster-info
error: Missing or incomplete configuration info. Please point to an existing, complete config file:
1. Via the command-line flag --kubeconfig
2. Via the KUBECONFIG environment variable
3. In your home directory as ~/.kube/config
To view or setup config directly use the 'config' command.
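The agent container has no kubeconfig of its own, so any further checking has to go through the server. A sketch of what I would look at next from there (node and pod status as the server sees them):

docker exec -it $(docker ps |grep "k3s server"|awk -F\ '{print $1}') kubectl get nodes -o wide
docker exec -it $(docker ps |grep "k3s server"|awk -F\ '{print $1}') kubectl get pods -A -o wide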
If I run `kubectl get apiservice` I get, among others, the following lines:
v1beta1.storage.k8s.io Local True 20m
v1beta1.scheduling.k8s.io Local True 20m
v1.storage.k8s.io Local True 20m
v1.k3s.cattle.io Local True 20m
v1.helm.cattle.io Local True 20m
v1beta1.metrics.k8s.io kube-system/metrics-server False (FailedDiscoveryCheck) 20m
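Since v1beta1.metrics.k8s.io is the only one failing, the next step would be to look at the metrics-server pod and its service endpoints (a sketch; the k8s-app=metrics-server label is what the stock metrics-server manifest uses, assuming k3s has not renamed it):

docker exec -it $(docker ps |grep "k3s server"|awk -F\ '{print $1}') kubectl -n kube-system get pods -l k8s-app=metrics-server -o wide
docker exec -it $(docker ps |grep "k3s server"|awk -F\ '{print $1}') kubectl -n kube-system get endpoints metrics-server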
I also downgraded k3s to k3s:v1.0.1, which only changes the error messages:
server_1 | E0516 23:46:02.951073 1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.CSINode: no kind "CSINode" is registered for version "storage.k8s.io/v1" in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30"
server_1 | E0516 23:46:03.444519 1 status.go:71] apiserver received an error that is not an metav1.Status: &runtime.notRegisteredErr{schemeName:"k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30", gvk:schema.GroupVersionKind{Group:"storage.k8s.io", Version:"v1", Kind:"CSINode"}, target:runtime.GroupVersioner(nil), t:reflect.Type(nil)}
After running
docker exec -it $(docker ps |grep "k3s server"|awk -F\ '{print $1}') kubectl --namespace kube-system delete apiservice v1beta1.metrics.k8s.io
all I get is
node_1 | W0517 07:03:06.346944 1 info.go:51] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
node_1 | I0517 07:03:21.504932 1 log.go:172] http: TLS handshake error from 10.42.1.15:53888: remote error: tls: bad certificate
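Since my suspicion is the https certificates, and k3s keeps its CA and serving certs under /var/lib/rancher/k3s (the k3s-server volume), the brute-force way to rule that out would be to wipe the volume and let k3s regenerate everything from scratch (a sketch; this throws away all cluster state, and the exact volume name depends on the compose project name, so verify it first):

docker-compose down
docker volume ls                                  # find the actual volume name
docker volume rm test_mk3s_docker_k3s-server      # assumed name, derived from the project directory
docker-compose up -d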