I am trying to get Heapster working on my Kubernetes cluster. I am using Kube-DNS for DNS resolution.
My Kube-DNS seems to be set up correctly:
kubectl describe pod kube-dns-v20-z2dd2 -n kube-system
Name: kube-dns-v20-z2dd2
Namespace: kube-system
Node: 172.31.48.201/172.31.48.201
Start Time: Mon, 22 Jan 2018 09:21:49 +0000
Labels: k8s-app=kube-dns
version=v20
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status: Running
IP: 172.17.29.4
Controlled By: ReplicationController/kube-dns-v20
Containers:
kubedns:
Container ID: docker://13f95bdf8dee273ca18a2eee1b99fe00e5fff41279776cdef5d7e567472a39dc
Image: gcr.io/google_containers/kubedns-amd64:1.8
Image ID: docker-pullable://gcr.io/google_containers/kubedns-amd64@sha256:39264fd3c998798acdf4fe91c556a6b44f281b6c5797f464f92c3b561c8c808c
Ports: 10053/UDP, 10053/TCP
Args:
--domain=cluster.local.
--dns-port=10053
State: Running
Started: Mon, 22 Jan 2018 09:22:05 +0000
Ready: True
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
dnsmasq:
Container ID: docker://576ebc30e8f7aae13000a2d06541c165a3302376ad04c604b12803463380d9b5
Image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4
Image ID: docker-pullable://gcr.io/google_containers/kube-dnsmasq-amd64@sha256:a722df15c0cf87779aad8ba2468cf072dd208cb5d7cfcaedd90e66b3da9ea9d2
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
State: Running
Started: Mon, 22 Jan 2018 09:22:20 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
healthz:
Container ID: docker://3367d05fb0e13c892243a4c86c74a170b0a9a2042387a70f6690ed946afda4d2
Image: gcr.io/google_containers/exechealthz-amd64:1.2
Image ID: docker-pullable://gcr.io/google_containers/exechealthz-amd64@sha256:503e158c3f65ed7399f54010571c7c977ade7fe59010695f48d9650d83488c0a
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
State: Running
Started: Mon, 22 Jan 2018 09:22:32 +0000
Ready: True
Restart Count: 0
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-9zxzd:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9zxzd
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned kube-dns-v20-z2dd2 to 172.31.48.201
Normal SuccessfulMountVolume 43m kubelet, 172.31.48.201 MountVolume.SetUp succeeded for volume "default-token-9zxzd"
Normal Pulling 43m kubelet, 172.31.48.201 pulling image "gcr.io/google_containers/kubedns-amd64:1.8"
Normal Pulled 43m kubelet, 172.31.48.201 Successfully pulled image "gcr.io/google_containers/kubedns-amd64:1.8"
Normal Created 43m kubelet, 172.31.48.201 Created container
Normal Started 43m kubelet, 172.31.48.201 Started container
Normal Pulling 43m kubelet, 172.31.48.201 pulling image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4"
Normal Pulled 42m kubelet, 172.31.48.201 Successfully pulled image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4"
Normal Created 42m kubelet, 172.31.48.201 Created container
Normal Started 42m kubelet, 172.31.48.201 Started container
Normal Pulling 42m kubelet, 172.31.48.201 pulling image "gcr.io/google_containers/exechealthz-amd64:1.2"
Normal Pulled 42m kubelet, 172.31.48.201 Successfully pulled image "gcr.io/google_containers/exechealthz-amd64:1.2"
Normal Created 42m kubelet, 172.31.48.201 Created container
Normal Started 42m kubelet, 172.31.48.201 Started container
kubectl describe svc kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=KubeDNS
Annotations: <none>
Selector: k8s-app=kube-dns
Type: ClusterIP
IP: 10.254.0.2
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 172.17.29.4:53
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 172.17.29.4:53
Session Affinity: None
Events: <none>
kubectl describe ep kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=KubeDNS
Annotations: <none>
Subsets:
Addresses: 172.17.29.4
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
dns 53 UDP
dns-tcp 53 TCP
Events: <none>
kubectl exec -it busybox1 -- nslookup kubernetes.default
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.254.0.1 kubernetes.default.svc.cluster.local
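For reference, a busybox test pod like `busybox1` above can be created as follows. This is a sketch, not taken from the question; `busybox:1.28` is pinned deliberately, since `nslookup` in later busybox releases is known to give unreliable results.

```shell
# Hypothetical recreation of the busybox1 DNS-test pod used above.
# busybox:1.28 is pinned because nslookup in newer busybox images
# is known to misbehave.
kubectl run busybox1 --image=busybox:1.28 --restart=Never -- sleep 3600

# Once the pod is Running, verify cluster DNS from inside it:
kubectl exec -it busybox1 -- nslookup kubernetes.default
```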
However, monitoring-influxdb cannot be resolved, neither from the heapster container nor from a busybox container outside the kube-system namespace:
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- nslookup http://monitoring-influxdb
Server: (null)
Address 1: 127.0.0.1 localhost
Address 2: ::1 localhost
nslookup: can't resolve 'http://monitoring-influxdb': Try again
command terminated with exit code 1
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- cat /etc/resolv.conf
nameserver 10.254.0.2
search kube-system.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
kubectl exec -it busybox1 -- nslookup http://monitoring-influxdb
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'http://monitoring-influxdb'
command terminated with exit code 1
kubectl exec -it busybox1 -- cat /etc/resolv.conf
nameserver 10.254.0.2
search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
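The `resolv.conf` above explains part of this: with `options ndots:5`, any name with fewer than five dots is first expanded against the search list. A small sketch of that expansion for busybox1 (in the default namespace) shows that none of the candidates matches a service living in kube-system, which is why the bare name `monitoring-influxdb` cannot resolve from there:

```shell
# Sketch of how the resolver expands the unqualified name
# "monitoring-influxdb" using busybox1's search list (ndots:5 means
# a name with fewer than 5 dots is tried with each search domain
# appended before being tried as an absolute name).
name="monitoring-influxdb"
for domain in default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal; do
  echo "${name}.${domain}"
done
```

None of the four candidates is `monitoring-influxdb.kube-system.svc.cluster.local`, so from the default namespace the service must be addressed as `monitoring-influxdb.kube-system` (or fully qualified).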
Finally, here are the logs from the heapster pod. I could not find any errors in the DNS pod logs:
kubectl logs heapster-v1.2.0-7657f45c77-65w7w heapster -n kube-system
E0122 09:22:46.966896 1 influxdb.go:217] issues while creating an InfluxDB sink: failed to ping InfluxDB server at "monitoring-influxdb:8086" - Get http://monitoring-influxdb:8086/ping: dial tcp: lookup monitoring-influxdb on 10.254.0.2:53: server misbehaving, will retry on use
Any pointers are highly appreciated.
EDIT:
monitoring-influxdb is in the same namespace as heapster (kube-system).
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- nslookup monitoring-influxdb.kube-system
Server: (null)
Address 1: 127.0.0.1 localhost
Address 2: ::1 localhost
nslookup: can't resolve 'monitoring-influxdb.kube-system': Name does not resolve
command terminated with exit code 1
But for whatever reason, busybox is able to resolve the server.
kubectl exec -it busybox1 -- nslookup http://monitoring-influxdb.kube-system
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
Name: monitoring-influxdb.kube-system
Address 1: 10.254.48.109 monitoring-influxdb.kube-system.svc.cluster.local
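To narrow this down further, it may help to query the fully qualified name from the heapster container, which bypasses the search list entirely. A diagnostic sketch, reusing the pod name from the question:

```shell
# The trailing dot makes the name absolute, so the search list in
# /etc/resolv.conf is not consulted at all.
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- \
  nslookup monitoring-influxdb.kube-system.svc.cluster.local.
```

If the FQDN also fails from heapster while it works from busybox, the search-list configuration is not the culprit and the problem sits between that pod and the DNS server.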
kubectl -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
heapster ClusterIP 10.254.193.208 <none> 80/TCP 1h
kube-dns ClusterIP 10.254.0.2 <none> 53/UDP,53/TCP 1h
kubernetes-dashboard NodePort 10.254.89.241 <none> 80:32431/TCP 1h
monitoring-grafana ClusterIP 10.254.176.96 <none> 80/TCP 1h
monitoring-influxdb ClusterIP 10.254.48.109 <none> 8083/TCP,8086/TCP 1h
kubectl -n kube-system get ep
NAME ENDPOINTS AGE
heapster 172.17.29.7:8082 1h
kube-controller-manager <none> 1h
kube-dns 172.17.29.6:53,172.17.29.6:53 1h
kubernetes-dashboard 172.17.29.5:9090 1h
monitoring-grafana 172.17.29.3:3000 1h
monitoring-influxdb 172.17.29.3:8086,172.17.29.3:8083 1h
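One more diagnostic worth trying (a sketch, using the kube-dns endpoint IP 172.17.29.6 from the `get ep` output above): query the kube-dns pod directly instead of the 10.254.0.2 service VIP. If this succeeds while lookups via 10.254.0.2 fail from the same pod, the problem is likely in the service/kube-proxy path on that node rather than in kube-dns itself.

```shell
# Bypass the kube-dns ClusterIP (10.254.0.2) and ask the backing
# pod (172.17.29.6, per the endpoints listing) directly.
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- \
  nslookup monitoring-influxdb.kube-system.svc.cluster.local 172.17.29.6
```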