
I deployed a few VMs with Vagrant to test Kubernetes:
master: 4 CPUs, 4 GB RAM
node-1: 4 CPUs, 8 GB RAM
Base image: centos/7.
Networking: bridged.
Host OS: CentOS 7.2

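I deployed Kubernetes with kubeadm, following the kubeadm getting started guide. After adding the node to the cluster and installing Weave Net, I am unfortunately unable to get kube-dns up and running; it is stuck in the ContainerCreating state: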

[vagrant@master ~]$ kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY     STATUS              RESTARTS   AGE
kube-system   etcd-master                      1/1       Running             0          1h
kube-system   kube-apiserver-master            1/1       Running             0          1h
kube-system   kube-controller-manager-master   1/1       Running             0          1h
kube-system   kube-discovery-982812725-0tiiy   1/1       Running             0          1h
kube-system   kube-dns-2247936740-46rcz        0/3       ContainerCreating   0          1h
kube-system   kube-proxy-amd64-4d8s7           1/1       Running             0          1h
kube-system   kube-proxy-amd64-sqea1           1/1       Running             0          1h
kube-system   kube-scheduler-master            1/1       Running             0          1h
kube-system   weave-net-h1om2                  2/2       Running             0          1h
kube-system   weave-net-khebq                  1/2       CrashLoopBackOff    17         1h

I think this problem is somehow related to the weave-net pod on node-1, which is in the CrashLoopBackOff state:

[vagrant@master ~]$ kubectl describe pods --namespace=kube-system weave-net-khebq
Name:       weave-net-khebq
Namespace:  kube-system
Node:       node-1/10.0.2.15
Start Time: Wed, 05 Oct 2016 07:10:39 +0000
Labels:     name=weave-net
Status:     Running
IP:     10.0.2.15
Controllers:    DaemonSet/weave-net
Containers:
  weave:
    Container ID:   docker://4976cd0ec6f971397aaf7fbfd746ca559322ab3d8f4ee217dd6c8bd3f6ed4f76
    Image:      weaveworks/weave-kube:1.7.0
    Image ID:       docker://sha256:1ac5304168bd9dd35c0ecaeb85d77d26c13a7d077aa8629b2a1b4e354cdffa1a
    Port:       
    Command:
      /home/weave/launch.sh
    Requests:
      cpu:      10m
    State:      Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 05 Oct 2016 08:18:51 +0000
      Finished:     Wed, 05 Oct 2016 08:18:51 +0000
    Ready:      False
    Restart Count:  18
    Liveness:       http-get http://127.0.0.1:6784/status delay=30s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /etc from cni-conf (rw)
      /host_home from cni-bin2 (rw)
      /opt from cni-bin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kir36 (ro)
      /weavedb from weavedb (rw)
    Environment Variables:
      WEAVE_VERSION:    1.7.0
  weave-npc:
    Container ID:   docker://feef7e7436d2565182d99c9021958619f65aff591c576a0c240ac0adf9c66a0b
    Image:      weaveworks/weave-npc:1.7.0
    Image ID:       docker://sha256:4d7f0bd7c0e63517a675e352146af7687a206153e66bdb3d8c7caeb54802b16a
    Port:       
    Requests:
      cpu:      10m
    State:      Running
      Started:      Wed, 05 Oct 2016 07:11:04 +0000
    Ready:      True
    Restart Count:  0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kir36 (ro)
    Environment Variables:  <none>
Conditions:
  Type      Status
  Initialized   True 
  Ready     False 
  PodScheduled  True 
Volumes:
  weavedb:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
  cni-bin:
    Type:   HostPath (bare host directory volume)
    Path:   /opt
  cni-bin2:
    Type:   HostPath (bare host directory volume)
    Path:   /home
  cni-conf:
    Type:   HostPath (bare host directory volume)
    Path:   /etc
  default-token-kir36:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-kir36
QoS Class:  Burstable
Tolerations:    dedicated=master:Equal:NoSchedule
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath       Type        Reason      Message
  --------- --------    -----   ----            -------------       --------    ------      -------
  1h        3m      19  {kubelet node-1}    spec.containers{weave}  Normal      Pulling     pulling image "weaveworks/weave-kube:1.7.0"
  1h        3m      19  {kubelet node-1}    spec.containers{weave}  Normal      Pulled      Successfully pulled image "weaveworks/weave-kube:1.7.0"
  55m       3m      11  {kubelet node-1}    spec.containers{weave}  Normal      Created     (events with common reason combined)
  55m       3m      11  {kubelet node-1}    spec.containers{weave}  Normal      Started     (events with common reason combined)
  1h        14s     328 {kubelet node-1}    spec.containers{weave}  Warning     BackOff     Back-off restarting failed docker container
  1h        14s     300 {kubelet node-1}                Warning     FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-khebq_kube-system(d1feb9c1-8aca-11e6-8d4f-525400c583ad)"

Listing the containers running on node-1 gives:

[vagrant@node-1 ~]$ sudo docker ps
CONTAINER ID        IMAGE                                              COMMAND                  CREATED             STATUS              PORTS               NAMES
feef7e7436d2        weaveworks/weave-npc:1.7.0                         "/usr/bin/weave-npc"     About an hour ago   Up About an hour                        k8s_weave-npc.e6299282_weave-net-khebq_kube-system_d1feb9c1-8aca-11e6-8d4f-525400c583ad_0f0517cf
762cd80d491e        gcr.io/google_containers/pause-amd64:3.0           "/pause"                 About an hour ago   Up About an hour                        k8s_POD.d8dbe16c_weave-net-khebq_kube-system_d1feb9c1-8aca-11e6-8d4f-525400c583ad_cda766ac
8c3395959d0e        gcr.io/google_containers/kube-proxy-amd64:v1.4.0   "/usr/local/bin/kube-"   About an hour ago   Up About an hour                        k8s_kube-proxy.64a0bb96_kube-proxy-amd64-4d8s7_kube-system_909e6ae1-8aca-11e6-8d4f-525400c583ad_48e7eb9a
d0fbb716bbf3        gcr.io/google_containers/pause-amd64:3.0           "/pause"                 About an hour ago   Up About an hour                        k8s_POD.d8dbe16c_kube-proxy-amd64-4d8s7_kube-system_909e6ae1-8aca-11e6-8d4f-525400c583ad_d6b232ea

The logs of the first container show some connection errors:

[vagrant@node-1 ~]$ sudo docker logs feef7e7436d2
E1005 08:46:06.368703       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:154: Failed to list *api.Pod: Get https://100.64.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:06.370119       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:155: Failed to list *extensions.NetworkPolicy: Get https://100.64.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:06.473779       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:153: Failed to list *api.Namespace: Get https://100.64.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.370451       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:154: Failed to list *api.Pod: Get https://100.64.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.371308       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:155: Failed to list *extensions.NetworkPolicy: Get https://100.64.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.474991       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:153: Failed to list *api.Namespace: Get https://100.64.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused

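I lack the experience with Kubernetes and container networking to troubleshoot these issues further, so some hints would be much appreciated. Observation: all pods/nodes report their IP as 10.0.2.15, which is the local Vagrant NAT address, not the actual IP address of the VMs.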


2 Answers


Here is the recipe that worked for me (as of March 19, 2017, using Vagrant and VirtualBox). The cluster consists of 3 nodes: 1 master and 2 worker nodes.

1) Make sure you explicitly set the IP of the master node on init

kubeadm init --api-advertise-addresses=10.30.3.41
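
If you would rather not hard-code the address, here is a minimal sketch of how it could be picked automatically. It assumes the unwanted address is the Vagrant NAT one in 10.0.2.x (as in the question) and that the first remaining IPv4 address is the one to advertise; pick_advertise_ip is a hypothetical helper name, not part of kubeadm. Also note that, as far as I know, newer kubeadm releases spell the flag --apiserver-advertise-address, so check kubeadm init --help for your version.

```shell
# Sketch only: reads the output of `ip -4 -o addr show` on stdin and prints
# the first IPv4 address that is neither loopback nor inside the NAT prefix.
# The default prefix 10.0.2. is an assumption based on the Vagrant NAT
# address reported in the question.
pick_advertise_ip() {
  local nat_prefix="${1:-10.0.2.}"
  awk -v nat="$nat_prefix" '
    {
      ip = ""
      # The address follows the literal field "inet", e.g. "inet 10.30.3.41/24".
      for (i = 1; i < NF; i++)
        if ($i == "inet") { split($(i + 1), parts, "/"); ip = parts[1] }
      # Skip lines without an address, loopback, and the NAT range.
      if (ip == "" || ip ~ /^127\./ || index(ip, nat) == 1) next
      print ip
      exit
    }'
}

# Possible usage on the master (not executed here):
#   kubeadm init --api-advertise-addresses="$(ip -4 -o addr show | pick_advertise_ip)"
```

On a VM with more than two interfaces the "first remaining address" rule may pick the wrong one, so it is worth printing the result before passing it to kubeadm.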

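2) Either manually or during provisioning, add to /etc/hosts on each node the exact IP that you configured that node to have. Here is a line you can add to your Vagrantfile (node naming convention I use: k8node-$i):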

config.vm.provision :shell, inline: "sed 's/127\.0\.0\.1.*k8node.*/10.30.3.4#{i} k8node-#{i}/' -i /etc/hosts"

Example:

vagrant@k8node-1:~$ cat /etc/hosts
10.30.3.41 k8node-1
127.0.0.1   localhost
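
For anyone who prefers a provisioning script over the inline sed, the same idea can be sketched as a small shell function. This is a variant, not the exact command above: instead of rewriting the loopback line in place, it drops every non-comment line that names the host and appends the desired mapping, which makes it safe to run repeatedly. fix_hosts_entry is a hypothetical name, and the file path is a parameter so you can try it on a copy first.

```shell
# Sketch only: fix_hosts_entry FILE HOSTNAME IP
# Removes existing (non-comment) entries that mention HOSTNAME, then appends
# "IP HOSTNAME". Writing back with cat keeps the file's owner and permissions.
fix_hosts_entry() {
  local file="$1" host="$2" ip="$3" tmp
  tmp="$(mktemp)" || return 1
  awk -v h="$host" '
    /^[[:space:]]*#/ { print; next }                        # keep comments
    { for (i = 2; i <= NF; i++) if ($i == h) next; print }  # drop lines naming the host
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  printf '%s %s\n' "$ip" "$host" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Possible usage on the first node (not executed here, needs root):
#   fix_hosts_entry /etc/hosts k8node-1 10.30.3.41
```

Note that the new entry ends up at the bottom of the file rather than at the top as in the example output, which should not matter for name resolution once the stale loopback entry is gone.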

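3) Finally, all nodes will try to reach the master through the cluster's "public" IP, that is, the cluster IP of the kubernetes service (not sure why this happens...). Here is the fix for that.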

First, find that IP (the CLUSTER-IP column) by running the following on the master.

kubectl get svc
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.96.0.1    <none>        443/TCP   1h

On each node, make sure that any process using 10.96.0.1 (in my case) is routed to the master at 10.30.3.41.

So on each node (you can skip the master) use route to set up the redirect.

route add 10.96.0.1 gw 10.30.3.41
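
If you script this step, a rough sketch like the following can read the cluster IP from the kubectl get svc table instead of hard-coding it. Both function names are hypothetical, and the second one only prints the route command so you can review it before running it as root. With iproute2 the equivalent should be ip route add 10.96.0.1/32 via 10.30.3.41.

```shell
# Sketch only: reads a `kubectl get svc` table on stdin and prints the
# CLUSTER-IP of the named service (default: kubernetes). The column is
# located via the header row, so extra columns do not break the lookup.
service_cluster_ip() {
  awk -v svc="${1:-kubernetes}" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "CLUSTER-IP") col = i; next }
    $1 == svc && col { print $col; exit }'
}

# Prints (does not run) the route command for a given cluster IP and master IP.
print_route_cmd() {
  local cluster_ip="$1" master_ip="$2"
  printf 'route add %s gw %s\n' "$cluster_ip" "$master_ip"
}

# Possible usage (not executed here); 10.30.3.41 is the master IP from step 1:
#   print_route_cmd "$(kubectl get svc | service_cluster_ip)" 10.30.3.41
```

Keep in mind that a route added this way does not survive a reboot, so it probably belongs in the Vagrant provisioning step as well.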

After that, everything should work:

vagrant@k8node-1:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY     STATUS    RESTARTS   AGE
kube-system   dummy-2088944543-rnl2f             1/1       Running   0          1h
kube-system   etcd-k8node-1                      1/1       Running   0          1h
kube-system   kube-apiserver-k8node-1            1/1       Running   0          1h
kube-system   kube-controller-manager-k8node-1   1/1       Running   0          1h
kube-system   kube-discovery-1769846148-g8g85    1/1       Running   0          1h
kube-system   kube-dns-2924299975-7wwm6          4/4       Running   0          1h
kube-system   kube-proxy-9dxsb                   1/1       Running   0          46m
kube-system   kube-proxy-nx63x                   1/1       Running   0          1h
kube-system   kube-proxy-q0466                   1/1       Running   0          1h
kube-system   kube-scheduler-k8node-1            1/1       Running   0          1h
kube-system   weave-net-2nc8d                    2/2       Running   0          46m
kube-system   weave-net-2tphv                    2/2       Running   0          1h
kube-system   weave-net-mp6s0                    2/2       Running   0          1h


vagrant@k8node-1:~$ kubectl get nodes
NAME       STATUS         AGE
k8node-1   Ready,master   1h
k8node-2   Ready          1h
k8node-3   Ready          48m
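
To check this without eyeballing the table, a small filter along these lines can list the pods that are not fully up. It is only a sketch: not_ready_pods is a hypothetical helper, it assumes the column layout shown above (NAMESPACE, NAME, READY, STATUS, ...), and it will also flag pods that finished normally, since their status is not Running.

```shell
# Sketch only: reads `kubectl get pods --all-namespaces` output on stdin and
# prints "namespace/name READY STATUS" for every pod whose status is not
# Running or whose ready count is below the container count.
not_ready_pods() {
  awk 'NR > 1 && NF >= 4 {
    split($3, ready, "/")
    if ($4 != "Running" || ready[1] != ready[2])
      print $1 "/" $2 " " $3 " " $4
  }'
}

# Possible usage (not executed here):
#   kubectl get pods --all-namespaces | not_ready_pods
```

An empty result corresponds to the healthy listing above; on the cluster from the question it would report the kube-dns pod and the weave-net pod on node-1.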
answered 2017-03-20T05:50:57.617

Please take a look at David Bainbridge's repository.

answered 2016-10-24T14:12:20.077