
I am new to Kubernetes and am trying to set up a cluster. After configuring the master with kubeadm, some pods report errors, and as a result the master stays in the NotReady state.

Everything seems to stem from kube-proxy being unable to list endpoints and services, and therefore (as I understand it) being unable to update iptables.

Here is my kubectl version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

Here are the logs from the kube-proxy pod:

$ kubectl logs -n kube-system kube-proxy-xjxck
W0430 12:33:28.887260       1 server_others.go:267] Flag proxy-mode="" unknown, assuming iptables proxy
W0430 12:33:28.913671       1 node.go:113] Failed to retrieve node info: Unauthorized
I0430 12:33:28.915780       1 server_others.go:147] Using iptables Proxier.
W0430 12:33:28.916065       1 proxier.go:314] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0430 12:33:28.916089       1 proxier.go:319] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0430 12:33:28.917555       1 server.go:555] Version: v1.14.1
I0430 12:33:28.959345       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0430 12:33:28.960392       1 config.go:202] Starting service config controller
I0430 12:33:28.960444       1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I0430 12:33:28.960572       1 config.go:102] Starting endpoints config controller
I0430 12:33:28.960609       1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
E0430 12:33:28.970720       1 event.go:191] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"fh-ubuntu01.159a40901fa85264", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"fh-ubuntu01", UID:"fh-ubuntu01", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kube-proxy.", Source:v1.EventSource{Component:"kube-proxy", Host:"fh-ubuntu01"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbf2a2e0639406264, ext:334442672, loc:(*time.Location)(0x2703080)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbf2a2e0639406264, ext:334442672, loc:(*time.Location)(0x2703080)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Unauthorized' (will not retry!)
E0430 12:33:28.970939       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
E0430 12:33:28.971106       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Service: Unauthorized
E0430 12:33:29.977038       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
E0430 12:33:29.979890       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Service: Unauthorized
E0430 12:33:30.980098       1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
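
The repeated reflector errors are easier to see once the log is condensed. The following is a minimal sketch, not part of kube-proxy or kubectl: the helper name `summarize_klog_errors` is hypothetical, and it assumes the klog line layout shown above (severity letter and date, time, thread id, `file:line]`, then the message).

```shell
# Hypothetical helper: condense klog-formatted output to its distinct error messages.
# Assumes each line looks like: E0430 12:33:28.970939       1 reflector.go:126] message
summarize_klog_errors() {
  grep '^E' |                                            # keep error-severity lines only
    sed -E 's/^E[0-9]{4} [0-9:.]+ +[0-9]+ [^]]+\] //' |  # drop timestamp, id and source
    sort | uniq -c | sort -rn                            # count identical messages
}
```

It could be used as `kubectl logs -n kube-system kube-proxy-xjxck | summarize_klog_errors`, which should collapse the log to one line per distinct error, each prefixed with its count.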

I then created a new ClusterRoleBinding like this:

$ kubectl create clusterrolebinding kube-proxy-binding --clusterrole=system:node-proxier --user=system:kube-proxy

If I describe the ClusterRole, I see:

$ kubectl describe clusterrole system:node-proxier
Name:         system:node-proxier
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources  Non-Resource URLs  Resource Names  Verbs
  ---------  -----------------  --------------  -----
  events     []                 []              [create patch update]
  nodes      []                 []              [get]
  endpoints  []                 []              [list watch]
  services   []                 []              [list watch]
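
Whether a given subject is actually granted these rules can be checked with `kubectl auth can-i` and impersonation. This is a sketch under assumptions: the function name `check_proxy_rbac` and the `KUBECTL` variable are hypothetical conveniences, and the second subject, `system:serviceaccount:kube-system:kube-proxy`, assumes kube-proxy runs under a `kube-proxy` service account in `kube-system`, as a default kubeadm setup normally arranges. Impersonation needs credentials that are allowed to impersonate, such as the cluster admin kubeconfig.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # hypothetical override hook, handy for testing

# Ask the API server whether each subject may list the resources kube-proxy watches.
check_proxy_rbac() {
  for subject in system:kube-proxy system:serviceaccount:kube-system:kube-proxy; do
    for resource in endpoints services; do
      printf '%s list %s: ' "$subject" "$resource"
      # can-i prints yes/no and may exit non-zero on "no"; keep going either way.
      "$KUBECTL" auth can-i list "$resource" --as="$subject" || true
    done
  done
}
```

Calling `check_proxy_rbac` prints one yes/no line per subject and resource. A `yes` only covers authorization. The `Unauthorized` messages in the kube-proxy log generally correspond to HTTP 401, an authentication failure, whereas an RBAC denial is reported as `Forbidden` (403), so a new binding alone would not be expected to change them.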

So the user "system:kube-proxy" should be able to list endpoints and services, right? Now, if I print the YAML of the kube-proxy ConfigMap, I get this:

$ kubectl get configmap kube-proxy -n kube-system -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: ""
    configSyncPeriod: 15m0s
    conntrack:
      max: null
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://10.0.1.1:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2019-03-21T10:34:03Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "4458115"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: d8a454fb-4bc4-11e9-b0b4-00155d044109

What confuses me is the "user: default" entry. Which user is it trying to authenticate as? Is there an actual user named "default"?
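
The `tokenFile` entry suggests that `default` is only a local label inside the kubeconfig, and that the identity the API server sees comes from the service account token in that file. As a minimal sketch, assuming the token is a JWT (which is how Kubernetes service account tokens are issued), its payload can be decoded to show the subject. The function name `jwt_payload` is hypothetical, and `base64 -d` assumes GNU coreutils or a compatible implementation.

```shell
# Hypothetical helper: print the JSON payload (second segment) of a JWT.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')   # base64url -> base64
  # base64url drops the trailing '=' padding; restore it before decoding.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | base64 -d
}

# Path taken from the kubeconfig above; only readable inside the pod.
TOKEN_FILE=/var/run/secrets/kubernetes.io/serviceaccount/token
if [ -r "$TOKEN_FILE" ]; then
  jwt_payload "$(cat "$TOKEN_FILE")"
fi
```

For a service account token the `sub` claim has the form `system:serviceaccount:<namespace>:<name>`. If that is not the subject named in the ClusterRoleBinding, the binding does not apply to it.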

Thank you very much!

Output of kubectl get po -n kube-system:

$ kubectl get po -n kube-system
NAME                                  READY   STATUS             RESTARTS   AGE
coredns-fb8b8dccf-27qck               0/1     Pending            0          7d15h
coredns-fb8b8dccf-dd6bh               0/1     Pending            0          7d15h
kube-apiserver-fh-ubuntu01            1/1     Running            1          7d15h
kube-controller-manager-fh-ubuntu01   1/1     Running            0          7d15h
kube-proxy-xjxck                      1/1     Running            0          43h
kube-scheduler-fh-ubuntu01            1/1     Running            1          7d15h
weave-net-psqh5                       1/2     CrashLoopBackOff   2144       7d15h

The cluster components report as healthy:

$ kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-2               Healthy   {"health": "true"}
etcd-3               Healthy   {"health": "true"}
etcd-0               Healthy   {"health": "true"}
etcd-1               Healthy   {"health": "true"}

1 Answer


Run the following command to check cluster health:

kubectl get cs

Then check the status of the control plane pods:

kubectl get po -n kube-system 

The problem appears to be related to the weave-net-psqh5 pod. Find out why it is going into CrashLoopBackOff.

Please share the logs from weave-net-psqh5.
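
The logs requested above can be collected roughly as follows. This is a sketch, not a definitive procedure: the function name `collect_weave_diagnostics` and the `KUBECTL` variable are hypothetical, and the container names `weave` and `weave-npc` assume the standard Weave Net DaemonSet manifest. `--previous` asks for the output of the last terminated container instance, which is usually the informative one for a pod in CrashLoopBackOff.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # hypothetical override hook, handy for testing

collect_weave_diagnostics() {
  pod="${1:-weave-net-psqh5}"   # pod name from the listing in the question
  # Events and per-container state (restart count, last exit code).
  "$KUBECTL" describe pod -n kube-system "$pod"
  for container in weave weave-npc; do
    echo "--- $container (previous instance) ---"
    # --previous fails if the container never restarted; that is fine here.
    "$KUBECTL" logs -n kube-system "$pod" -c "$container" --previous || true
  done
}
```

Running `collect_weave_diagnostics` with no argument targets the pod named in the question; pass another pod name as the first argument if the pod has been recreated.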

Answered 2019-05-01T09:58:09.620