I followed the instructions at https://coreos.com/kubernetes/docs/latest/deploy-addons.html and installed KubeDNS manually by creating the kube-dns replication controller and service.

The YAML is as follows:

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: ${DNS_SERVICE_IP}
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP

---

apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v11
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v11
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v11
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v11
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: etcd
        image: gcr.io/google_containers/etcd-amd64:2.2.1
        resources:
          limits:
            cpu: 100m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 50Mi
        command:
        - /usr/local/bin/etcd
        - -data-dir
        - /var/etcd/data
        - -listen-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -advertise-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -initial-cluster-token
        - skydns-etcd
        volumeMounts:
        - name: etcd-storage
          mountPath: /var/etcd/data
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.14
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 50Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        args:
        # command = "/kube2sky"
        - --domain=cluster.local
      - name: skydns
        image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/skydns"
        - -machines=http://127.0.0.1:4001
        - -addr=0.0.0.0:53
        - -ns-rotate=false
        - -domain=cluster.local.
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      volumes:
      - name: etcd-storage
        emptyDir: {}
      dnsPolicy: Default

However, after creating the resources, checking the pod status returns:

[root@sc-master-1 pods]# kubectl get pods --namespace=kube-system
NAME                 READY     STATUS             RESTARTS   AGE
kube-dns-v11-fyug1   2/4       CrashLoopBackOff   2          36s

I have narrowed it down to an etcd error:

[root@sc-client-2 jonathan]# docker logs bf0466c30b1d
2016-05-15 08:39:26.124819 I | etcdmain: etcd Version: 2.2.1
2016-05-15 08:39:26.124851 I | etcdmain: Git SHA: 75f8282
2016-05-15 08:39:26.124857 I | etcdmain: Go Version: go1.5.1
2016-05-15 08:39:26.124860 I | etcdmain: Go OS/Arch: linux/amd64
2016-05-15 08:39:26.135982 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2016-05-15 08:39:26.136562 I | etcdmain: listening for peers on http://localhost:2380
2016-05-15 08:39:26.136704 I | etcdmain: listening for peers on http://localhost:7001
2016-05-15 08:39:26.136746 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-05-15 08:39:26.136814 I | etcdmain: listening for client requests on http://127.0.0.1:4001
2016-05-15 08:39:26.136931 I | etcdmain: stopping listening for client requests on http://127.0.0.1:4001
2016-05-15 08:39:26.136943 I | etcdmain: stopping listening for client requests on http://127.0.0.1:2379
2016-05-15 08:39:26.136951 I | etcdmain: stopping listening for peers on http://localhost:7001
2016-05-15 08:39:26.136957 I | etcdmain: stopping listening for peers on http://localhost:2380
2016-05-15 08:39:26.136967 C | etcdmain: mkdir /var/etcd/data/member: permission denied

I don't know why this isn't working, and I can't exec into the container to create the folder manually because it doesn't stay running.
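To pick the fatal entry out of a longer etcd log, something like the following Python sketch could be used. It assumes the `timestamp LEVEL | package: message` layout seen in the output above, where `C` marks a critical entry; `fatal_entries` is a hypothetical helper name, not part of etcd or kubectl.

```python
import re

# Timestamp as printed by etcd 2.x, e.g. "2016-05-15 08:39:26.136967".
# \s+ between date and time tolerates output that was wrapped mid-timestamp.
TS = r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d+"

# One entry: timestamp, level letter, package, then the message up to the
# next timestamp or the end of the text.
ENTRY = re.compile(
    rf"({TS})\s+([A-Z])\s+\|\s+(\w+):\s+(.*?)(?=\s+{TS}\s+[A-Z]\s+\||\s*\Z)",
    re.DOTALL,
)


def fatal_entries(log_text):
    """Return (package, message) for every critical ('C') log entry."""
    return [
        (pkg, " ".join(msg.split()))  # collapse any wrapped whitespace
        for _ts, level, pkg, msg in ENTRY.findall(log_text)
        if level == "C"
    ]
```

Fed the `docker logs` output above, this would leave only the `mkdir /var/etcd/data/member: permission denied` entry.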

Update:

I used:

volumeMounts:
- name: etcd-storage
  mountPath: /var/etcd/data:z

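from "Do not have permission to write to emptyDir" to get past the etcd error, but I still cannot get the DNS service to start. Below are the relevant logs from the containers in the kube-dns pod: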

[root@sc-master-1 pods]# kubectl get pods --namespace=kube-system
NAME                 READY     STATUS    RESTARTS   AGE
kube-dns-v11-0bkop   3/4       Running   17         20m

Logs for skydns:

[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop skydns --namespace=kube-system
2016/05/18 17:01:11 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [3]
2016/05/18 17:01:11 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/05/18 17:01:11 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]

Logs for kube2sky:

[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop kube2sky --namespace=kube-system
I0518 17:02:17.693959       1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0518 17:02:18.697702       1 kube2sky.go:529] Using http://localhost:8080 for kubernetes master
I0518 17:02:18.698071       1 kube2sky.go:530] Using kubernetes API <nil>
I0518 17:02:18.698199       1 kube2sky.go:598] Waiting for service: default/kubernetes
I0518 17:02:18.701663       1 kube2sky.go:604] Ignoring error while waiting for service default/kubernetes: yaml: mapping values are not allowed in this context. Sleeping 1s before retrying.

Logs for healthz:

[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop healthz --namespace=kube-system
2016/05/18 17:01:17 Worker running nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/05/18 17:02:12 Client ip 172.17.0.1:35440 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/05/18 17:03:22 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2016-05-18 17:02:22.691812173 +0000 UTC, error exit status 1
2016/05/18 17:03:22 Client ip 172.17.0.1:35475 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null

Logs for etcd:

[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop etcd --namespace=kube-system
2016-05-18 17:01:02.478791 I | etcdmain: etcd Version: 2.2.1
2016-05-18 17:01:02.478825 I | etcdmain: Git SHA: 75f8282
2016-05-18 17:01:02.478831 I | etcdmain: Go Version: go1.5.1
2016-05-18 17:01:02.478846 I | etcdmain: Go OS/Arch: linux/amd64
2016-05-18 17:01:02.478851 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2016-05-18 17:01:02.485798 I | etcdmain: listening for peers on http://localhost:2380
2016-05-18 17:01:02.485931 I | etcdmain: listening for peers on http://localhost:7001
2016-05-18 17:01:02.485984 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-05-18 17:01:02.486070 I | etcdmain: listening for client requests on http://127.0.0.1:4001
2016-05-18 17:01:02.486300 I | etcdserver: name = default
2016-05-18 17:01:02.486324 I | etcdserver: data dir = /var/etcd/data
2016-05-18 17:01:02.486329 I | etcdserver: member dir = /var/etcd/data/member
2016-05-18 17:01:02.486333 I | etcdserver: heartbeat = 100ms
2016-05-18 17:01:02.486337 I | etcdserver: election = 1000ms
2016-05-18 17:01:02.486341 I | etcdserver: snapshot count = 10000
2016-05-18 17:01:02.486350 I | etcdserver: advertise client URLs = http://127.0.0.1:2379,http://127.0.0.1:4001
2016-05-18 17:01:02.486356 I | etcdserver: initial advertise peer URLs = http://localhost:2380,http://localhost:7001
2016-05-18 17:01:02.486365 I | etcdserver: initial cluster = default=http://localhost:2380,default=http://localhost:7001
2016-05-18 17:01:02.523097 I | etcdserver: starting member 6a5871dbdd12c17c in cluster f68652439e3f8f2a
2016-05-18 17:01:02.523157 I | raft: 6a5871dbdd12c17c became follower at term 0
2016-05-18 17:01:02.523192 I | raft: newRaft 6a5871dbdd12c17c [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016-05-18 17:01:02.523198 I | raft: 6a5871dbdd12c17c became follower at term 1
2016-05-18 17:01:02.523329 I | etcdserver: starting server... [version: 2.2.1, cluster version: to_be_decided]
2016-05-18 17:01:02.524093 N | etcdserver: added local member 6a5871dbdd12c17c [http://localhost:2380 http://localhost:7001] to cluster f68652439e3f8f2a
2016-05-18 17:01:03.323562 I | raft: 6a5871dbdd12c17c is starting a new election at term 1
2016-05-18 17:01:03.323722 I | raft: 6a5871dbdd12c17c became candidate at term 2
2016-05-18 17:01:03.323739 I | raft: 6a5871dbdd12c17c received vote from 6a5871dbdd12c17c at term 2
2016-05-18 17:01:03.323776 I | raft: 6a5871dbdd12c17c became leader at term 2
2016-05-18 17:01:03.323787 I | raft: raft.node: 6a5871dbdd12c17c elected leader 6a5871dbdd12c17c at term 2
2016-05-18 17:01:03.324154 I | etcdserver: setting up the initial cluster version to 2.2
2016-05-18 17:01:03.324251 I | etcdserver: published {Name:default ClientURLs:[http://127.0.0.1:2379 http://127.0.0.1:4001]} to cluster f68652439e3f8f2a
2016-05-18 17:01:03.473271 N | etcdserver: set the initial cluster version to 2.2

Update:

I was able to get a step further by hardcoding the master in the env variable in the dns-addon.yml above.

Now I get:

[root@sc-master-1 pods]# kubectl logs kube-dns-v11-sgb1r -c kube2sky --namespace=kube-system
I0518 18:08:58.837758       1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0518 18:08:59.839548       1 kube2sky.go:529] Using master for kubernetes master
I0518 18:08:59.839565       1 kube2sky.go:530] Using kubernetes API <nil>
I0518 18:08:59.839676       1 kube2sky.go:598] Waiting for service: default/kubernetes

Update:

OK, I was able to get this working by using the FQDN, i.e. http:// :8080 instead of :8080.
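The difference between the two forms is the explicit URL scheme. A minimal Python sketch of a check one might run on the value before putting it into the manifest; `ensure_scheme` is a hypothetical helper and `master.example` a placeholder host, neither comes from kube2sky or from my setup.

```python
def ensure_scheme(master, default_scheme="http"):
    """Return the master address with an explicit scheme.

    Values that already contain "://" are returned unchanged; bare
    "host:port" values get default_scheme prepended.
    """
    if "://" in master:
        return master
    return f"{default_scheme}://{master}"


if __name__ == "__main__":
    # Placeholder host, for illustration only.
    print(ensure_scheme("master.example:8080"))
```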

I can use a busybox pod and run:

[root@sc-master-1 jonathan]# kubectl exec busybox -- nslookup kubernetes.default.svc.cluster.local 10.254.0.2
Server:    10.254.0.2
Address 1: 10.254.0.2

Name:      kubernetes.default.svc.cluster.local
Address 1: 10.254.0.1

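This works, but I noticed some strange behavior. DNS works as long as I query it from a pod; running the same lookup directly on the master does not: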

[root@sc-master-1 jonathan]# nslookup kubernetes.default.svc.cluster.local 10.254.0.2
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached

I get the error above, but running the same command on the node where kube-dns has been scheduled works:

[jonathan@sc-client-2 ~]$ nslookup kubernetes.default.svc.cluster.local 10.254.0.2
Server:     10.254.0.2
Address:    10.254.0.2#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.254.0.1

Testing from another node in the cluster, I get the same error as on the master:

[root@sc-client-1 jonathan]# nslookup kubernetes.default.svc.cluster.local 10.254.0.2
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached

What could be wrong?
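To compare the nodes without depending on which nslookup build each host has, the same probe can be sketched in Python using only the standard library. This is a rough sketch, not a full resolver: it sends one A-record query over UDP and only reports whether any reply arrives before the timeout. `build_query` and `dns_reachable` are hypothetical names.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE=A (1) and QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def dns_reachable(server, name, timeout=2.0):
    """Return True if the server sends any UDP reply within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            sock.recvfrom(512)
            return True
        except OSError:  # covers timeouts and unreachable-network errors
            return False
```

Running `dns_reachable("10.254.0.2", "kubernetes.default.svc.cluster.local")` on each host would show which nodes get any answer from the service IP at all.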
