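I installed KubeDNS manually by creating the kube-dns RC (ReplicationController) and Service, following the instructions at https://coreos.com/kubernetes/docs/latest/deploy-addons.html.
The YAML is as follows: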
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: ${DNS_SERVICE_IP}
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v11
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v11
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v11
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v11
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: etcd
        image: gcr.io/google_containers/etcd-amd64:2.2.1
        resources:
          limits:
            cpu: 100m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 50Mi
        command:
        - /usr/local/bin/etcd
        - -data-dir
        - /var/etcd/data
        - -listen-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -advertise-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -initial-cluster-token
        - skydns-etcd
        volumeMounts:
        - name: etcd-storage
          mountPath: /var/etcd/data
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.14
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 50Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        args:
        # command = "/kube2sky"
        - --domain=cluster.local
      - name: skydns
        image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/skydns"
        - -machines=http://127.0.0.1:4001
        - -addr=0.0.0.0:53
        - -ns-rotate=false
        - -domain=cluster.local.
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      volumes:
      - name: etcd-storage
        emptyDir: {}
      dnsPolicy: Default
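The Service above contains the ${DNS_SERVICE_IP} placeholder from the CoreOS guide. kubectl does not expand variables in manifests, so the value has to be filled in before creating the resources. A minimal Python sketch using string.Template, which understands the same ${NAME} syntax; the script name and the file name dns-addon.yml are assumptions for illustration:

```python
import string
import sys


def render_manifest(text: str, dns_service_ip: str) -> str:
    """Fill in ${DNS_SERVICE_IP}; substitute() raises KeyError if any other
    ${...} placeholder in the text is left without a value."""
    return string.Template(text).substitute(DNS_SERVICE_IP=dns_service_ip)


if __name__ == "__main__":
    # Hypothetical usage: python render_dns_addon.py dns-addon.yml 10.254.0.2
    path, ip = sys.argv[1], sys.argv[2]
    with open(path) as f:
        sys.stdout.write(render_manifest(f.read(), ip))
```

The output can then be piped to `kubectl create -f -`. Using substitute() rather than safe_substitute() means a misspelled placeholder fails loudly instead of reaching the cluster unexpanded.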
However, after creating the resources, checking the status returns:
[root@sc-master-1 pods]# kubectl get pods --namespace=kube-system
NAME                 READY     STATUS             RESTARTS   AGE
kube-dns-v11-fyug1   2/4       CrashLoopBackOff   2          36s
I have narrowed it down to an etcd error:
[root@sc-client-2 jonathan]# docker logs bf0466c30b1d
2016-05-15 08:39:26.124819 I | etcdmain: etcd Version: 2.2.1
2016-05-15 08:39:26.124851 I | etcdmain: Git SHA: 75f8282
2016-05-15 08:39:26.124857 I | etcdmain: Go Version: go1.5.1
2016-05-15 08:39:26.124860 I | etcdmain: Go OS/Arch: linux/amd64
2016-05-15 08:39:26.135982 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 1
2016-05-15 08:39:26.136562 I | etcdmain: listening for peers on http://localhost:2380
2016-05-15 08:39:26.136704 I | etcdmain: listening for peers on http://localhost:7001
2016-05-15 08:39:26.136746 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-05-15 08:39:26.136814 I | etcdmain: listening for client requests on http://127.0.0.1:4001
2016-05-15 08:39:26.136931 I | etcdmain: stopping listening for client requests on http://127.0.0.1:4001
2016-05-15 08:39:26.136943 I | etcdmain: stopping listening for client requests on http://127.0.0.1:2379
2016-05-15 08:39:26.136951 I | etcdmain: stopping listening for peers on http://localhost:7001
2016-05-15 08:39:26.136957 I | etcdmain: stopping listening for peers on http://localhost:2380
2016-05-15 08:39:26.136967 C | etcdmain: mkdir /var/etcd/data/member: permission denied
I don't know why this fails, and I can't exec into the container to create the folder by hand because the container doesn't stay running.
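Since the etcd container exits before it can be inspected, one way to test the volume independently is to attempt the same kind of directory creation that etcd failed on (`mkdir /var/etcd/data/member`). A minimal Python sketch, assuming Python is available wherever the volume is mounted (for example a throwaway debug container with the same emptyDir mount); `probe_data_dir` is a hypothetical helper, not part of etcd:

```python
import os
import sys
import tempfile
from typing import Tuple


def probe_data_dir(data_dir: str) -> Tuple[bool, str]:
    """Create and remove a scratch subdirectory in data_dir, mimicking the
    mkdir etcd performs for <data-dir>/member at startup."""
    try:
        scratch = tempfile.mkdtemp(prefix="probe-", dir=data_dir)
    except OSError as exc:
        # e.g. "Permission denied", as in the etcd log above
        return False, f"{exc.strerror}: {data_dir}"
    os.rmdir(scratch)  # leave the data directory as we found it
    return True, f"writable: {data_dir}"


if __name__ == "__main__":
    ok, msg = probe_data_dir(sys.argv[1] if len(sys.argv) > 1 else "/var/etcd/data")
    print(msg)
    sys.exit(0 if ok else 1)
```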
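Update:
Using
volumeMounts:
- name: etcd-storage
  mountPath: /var/etcd/data:z
(taken from "Do not have permission to write to emptyDir") gets me past the etcd error, but I still can't get the DNS service to start. Below are the relevant logs from the containers in the kube-dns pod: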
[root@sc-master-1 pods]# kubectl get pods --namespace=kube-system
NAME                 READY     STATUS    RESTARTS   AGE
kube-dns-v11-0bkop   3/4       Running   17         20m
skydns logs:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop skydns --namespace=kube-system
2016/05/18 17:01:11 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [3]
2016/05/18 17:01:11 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/05/18 17:01:11 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
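The "Key not found (/skydns)" line suggests nothing had been written under /skydns in the pod's etcd yet, so SkyDNS fell back to its default configuration. Assuming SkyDNS2's convention of storing a name under /skydns with its labels reversed, whether kube2sky has published the kubernetes service can be checked through etcd's v2 keys API. A minimal Python sketch, assuming the pod's etcd port 4001 is reachable from where it runs (for example via `kubectl port-forward`); both helper names are hypothetical:

```python
import json
import urllib.request


def skydns_key(name: str, prefix: str = "skydns") -> str:
    """Map a DNS name to a SkyDNS-style etcd key: labels reversed under /skydns."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return "/" + "/".join([prefix] + labels[::-1])


def fetch_records(name: str, etcd: str = "http://127.0.0.1:4001") -> dict:
    """Read the key through etcd's v2 keys API (recursive, so child keys are
    included). urlopen raises HTTPError 404 if the key does not exist yet."""
    url = f"{etcd}/v2/keys{skydns_key(name)}?recursive=true"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(json.dumps(fetch_records("kubernetes.default.svc.cluster.local"), indent=2))
```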
kube2sky logs:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop kube2sky --namespace=kube-system
I0518 17:02:17.693959 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0518 17:02:18.697702 1 kube2sky.go:529] Using http://localhost:8080 for kubernetes master
I0518 17:02:18.698071 1 kube2sky.go:530] Using kubernetes API <nil>
I0518 17:02:18.698199 1 kube2sky.go:598] Waiting for service: default/kubernetes
I0518 17:02:18.701663 1 kube2sky.go:604] Ignoring error while waiting for service default/kubernetes: yaml: mapping values are not allowed in this context. Sleeping 1s before retrying.
healthz logs:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop healthz --namespace=kube-system
2016/05/18 17:01:17 Worker running nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/05/18 17:02:12 Client ip 172.17.0.1:35440 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/05/18 17:03:22 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
, at 2016-05-18 17:02:22.691812173 +0000 UTC, error exit status 1
2016/05/18 17:03:22 Client ip 172.17.0.1:35475 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
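The healthz sidecar repeatedly runs the command from its -cmd flag and reports the latest result on /healthz at port 8080. In the manifest above, the kube2sky container's livenessProbe polls exactly that endpoint, so while the nslookup fails the probe fails too, which may account for the restart count. A rough Python sketch of this exec-then-serve pattern (not the exechealthz source; the 10 s period, the 503 status and all names are assumptions):

```python
import subprocess
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Same command as the -cmd flag in the manifest above.
CMD = "nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null"
_state = {"ok": False, "output": "no result yet"}
_lock = threading.Lock()


def run_check(cmd: str) -> Tuple[bool, str]:
    """Run cmd through the shell; healthy means exit status 0."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def worker(cmd: str, period: float = 10.0) -> None:
    """Re-run the check forever, keeping only the most recent result."""
    while True:
        ok, out = run_check(cmd)
        with _lock:
            _state.update(ok=ok, output=out)
        time.sleep(period)


class Healthz(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_response(404)
            self.end_headers()
            return
        with _lock:
            ok, out = _state["ok"], _state["output"]
        # The kubelet treats 2xx/3xx as success; 503 counts as a probe failure.
        self.send_response(200 if ok else 503)
        self.end_headers()
        self.wfile.write(out.encode())


if __name__ == "__main__":
    threading.Thread(target=worker, args=(CMD,), daemon=True).start()
    HTTPServer(("0.0.0.0", 8080), Healthz).serve_forever()
```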
etcd logs:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-0bkop etcd --namespace=kube-system
2016-05-18 17:01:02.478791 I | etcdmain: etcd Version: 2.2.1
2016-05-18 17:01:02.478825 I | etcdmain: Git SHA: 75f8282
2016-05-18 17:01:02.478831 I | etcdmain: Go Version: go1.5.1
2016-05-18 17:01:02.478846 I | etcdmain: Go OS/Arch: linux/amd64
2016-05-18 17:01:02.478851 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2016-05-18 17:01:02.485798 I | etcdmain: listening for peers on http://localhost:2380
2016-05-18 17:01:02.485931 I | etcdmain: listening for peers on http://localhost:7001
2016-05-18 17:01:02.485984 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-05-18 17:01:02.486070 I | etcdmain: listening for client requests on http://127.0.0.1:4001
2016-05-18 17:01:02.486300 I | etcdserver: name = default
2016-05-18 17:01:02.486324 I | etcdserver: data dir = /var/etcd/data
2016-05-18 17:01:02.486329 I | etcdserver: member dir = /var/etcd/data/member
2016-05-18 17:01:02.486333 I | etcdserver: heartbeat = 100ms
2016-05-18 17:01:02.486337 I | etcdserver: election = 1000ms
2016-05-18 17:01:02.486341 I | etcdserver: snapshot count = 10000
2016-05-18 17:01:02.486350 I | etcdserver: advertise client URLs = http://127.0.0.1:2379,http://127.0.0.1:4001
2016-05-18 17:01:02.486356 I | etcdserver: initial advertise peer URLs = http://localhost:2380,http://localhost:7001
2016-05-18 17:01:02.486365 I | etcdserver: initial cluster = default=http://localhost:2380,default=http://localhost:7001
2016-05-18 17:01:02.523097 I | etcdserver: starting member 6a5871dbdd12c17c in cluster f68652439e3f8f2a
2016-05-18 17:01:02.523157 I | raft: 6a5871dbdd12c17c became follower at term 0
2016-05-18 17:01:02.523192 I | raft: newRaft 6a5871dbdd12c17c [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016-05-18 17:01:02.523198 I | raft: 6a5871dbdd12c17c became follower at term 1
2016-05-18 17:01:02.523329 I | etcdserver: starting server... [version: 2.2.1, cluster version: to_be_decided]
2016-05-18 17:01:02.524093 N | etcdserver: added local member 6a5871dbdd12c17c [http://localhost:2380 http://localhost:7001] to cluster f68652439e3f8f2a
2016-05-18 17:01:03.323562 I | raft: 6a5871dbdd12c17c is starting a new election at term 1
2016-05-18 17:01:03.323722 I | raft: 6a5871dbdd12c17c became candidate at term 2
2016-05-18 17:01:03.323739 I | raft: 6a5871dbdd12c17c received vote from 6a5871dbdd12c17c at term 2
2016-05-18 17:01:03.323776 I | raft: 6a5871dbdd12c17c became leader at term 2
2016-05-18 17:01:03.323787 I | raft: raft.node: 6a5871dbdd12c17c elected leader 6a5871dbdd12c17c at term 2
2016-05-18 17:01:03.324154 I | etcdserver: setting up the initial cluster version to 2.2
2016-05-18 17:01:03.324251 I | etcdserver: published {Name:default ClientURLs:[http://127.0.0.1:2379 http://127.0.0.1:4001]} to cluster f68652439e3f8f2a
2016-05-18 17:01:03.473271 N | etcdserver: set the initial cluster version to 2.2
Update:
I was able to get a step further by hardcoding the master in the env variables in the dns-addon.yml above.
Now I get:
[root@sc-master-1 pods]# kubectl logs kube-dns-v11-sgb1r -c kube2sky --namespace=kube-system
I0518 18:08:58.837758 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0518 18:08:59.839548 1 kube2sky.go:529] Using master for kubernetes master
I0518 18:08:59.839565 1 kube2sky.go:530] Using kubernetes API <nil>
I0518 18:08:59.839676 1 kube2sky.go:598] Waiting for service: default/kubernetes
Update: OK, I was able to get this working by giving the full URL including the scheme, i.e. http://<master>:8080 instead of <master>:8080.
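The fix in this update comes down to handing kube2sky a master URL that includes its scheme. If that value is templated into the manifest by a script, a guard like the following Python sketch could normalize it first; `ensure_scheme` is a hypothetical helper, and the `http` default is an assumption matching the insecure port 8080 used in this post:

```python
from urllib.parse import urlsplit


def ensure_scheme(master: str, default_scheme: str = "http") -> str:
    """Prefix default_scheme:// when master is a bare host[:port] value,
    then check that the result has a host and a valid port."""
    master = master.strip()
    url = master if "://" in master else f"{default_scheme}://{master}"
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in master URL: {master!r}")
    _ = parts.port  # raises ValueError for a non-numeric or out-of-range port
    return url
```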
I can use a busybox pod and run:
[root@sc-master-1 jonathan]# kubectl exec busybox -- nslookup kubernetes.default.svc.cluster.local 10.254.0.2
Server: 10.254.0.2
Address 1: 10.254.0.2
Name: kubernetes.default.svc.cluster.local
Address 1: 10.254.0.1
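To repeat this lookup from any host, or to script it across nodes, the query can be sent directly. A minimal Python sketch that builds a standard DNS A-record query (RFC 1035 wire format) and sends it over UDP to the service IP 10.254.0.2 with a timeout; the function names are hypothetical:

```python
import socket
import struct
from typing import Tuple


def build_a_query(name: str, txid: int = 0x1234) -> bytes:
    """Header (ID, flags with RD set, 1 question) + QNAME + QTYPE=A + QCLASS=IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def lookup(server: str, name: str, timeout: float = 2.0) -> Tuple[int, int]:
    """Send one query over UDP and return (rcode, answer_count) from the reply.
    Raises socket.timeout if no reply arrives, like ';; connection timed out'."""
    query = build_a_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        reply, _ = sock.recvfrom(512)
    txid, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    if txid != struct.unpack("!H", query[:2])[0]:
        raise ValueError("reply ID does not match query ID")
    return flags & 0x000F, ancount


if __name__ == "__main__":
    print(lookup("10.254.0.2", "kubernetes.default.svc.cluster.local"))
```

An rcode of 0 with one or more answers corresponds to the successful busybox output above.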
This works, but I've noticed some odd behavior: DNS only resolves when the query comes from a pod. Running the same lookup on the master:
[root@sc-master-1 jonathan]# nslookup kubernetes.default.svc.cluster.local 10.254.0.2
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
I get the error above, but the same command works on the node that kube-dns was scheduled onto:
[jonathan@sc-client-2 ~]$ nslookup kubernetes.default.svc.cluster.local 10.254.0.2
Server: 10.254.0.2
Address: 10.254.0.2#53
Name: kubernetes.default.svc.cluster.local
Address: 10.254.0.1
Testing from another node in the cluster gives the same error as on the master:
[root@sc-client-1 jonathan]# nslookup kubernetes.default.svc.cluster.local 10.254.0.2
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
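A timeout, as opposed to an error reply, means no DNS response came back at all. Because the Service also exposes dns-tcp on port 53, a plain TCP connect from each node is a quick second data point alongside the UDP lookup. A minimal Python sketch with a hypothetical helper name:

```python
import socket


def tcp_reachable(host: str, port: int = 53, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


if __name__ == "__main__":
    print("10.254.0.2:53/tcp reachable:", tcp_reachable("10.254.0.2"))
```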
What could be wrong?