
I can't get Ingress to work on GKE, owing to health check failures. I've tried all of the debugging steps I can think of, including:

  • Verified I'm not running low on any quotas
  • Verified that my service is accessible from within the cluster
  • Verified that my service works behind a k8s/GKE Load Balancer
  • Verified that healthz checks are passing in Stackdriver logs

... I'd love any advice about how to debug or fix. Details below!


I have set up a service with type LoadBalancer on GKE. Works great via external IP:

apiVersion: v1
kind: Service
metadata:
  name: echoserver
  namespace: es
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: LoadBalancer
  selector:
    app: echoserver

Then I try setting up an Ingress on top of this same service:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: echoserver-ingress
  namespace: es
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "echoserver-global-ip"
spec:
  backend:
    serviceName: echoserver
    servicePort: 80
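
(For reference: extensions/v1beta1 was the current Ingress API when this was written; on clusters running Kubernetes 1.19+, the equivalent manifest would use networking.k8s.io/v1 instead. A sketch, assuming the same service and static-IP names:)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echoserver-ingress
  namespace: es
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "echoserver-global-ip"
spec:
  defaultBackend:
    service:
      name: echoserver
      port:
        number: 80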

The Ingress gets created, but it thinks the backend nodes are unhealthy:

$ kubectl --namespace es describe ingress echoserver-ingress | grep backends
  backends:     {"k8s-be-31102--<snipped>":"UNHEALTHY"}

Inspecting the state of the Ingress backend in the GKE web console, I see the same thing:

0 of 3 healthy
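
(You can also inspect the GCE-side view of this backend directly with gcloud. A sketch, assuming the backend-service name matches the one reported by kubectl above; GKE Ingress backends are global resources:)

$ gcloud compute backend-services list
$ gcloud compute backend-services get-health k8s-be-31102--<snipped> --global

get-health reports per-instance health as the load balancer sees it, which can disagree with what you observe from inside the cluster.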

The health check details appear as expected:

[screenshot: health check details]

... and from within a pod in my cluster I can call the service successfully:

# curl  -vvv echoserver  2>&1 | grep "< HTTP"
< HTTP/1.0 200 OK

# curl  -vvv echoserver/healthz  2>&1 | grep "< HTTP"
< HTTP/1.0 200 OK

And I can address the service by NodePort:

# curl  -vvv 10.0.1.1:31102  2>&1 | grep "< HTTP" 
< HTTP/1.0 200 OK

(As expected, since the LoadBalancer service from step 1 is already serving the site just fine.)

I also see healthz checks passing in Stackdriver logs:

[screenshot: healthz checks passing in Stackdriver logs]

Regarding quotas, I check and see I'm only using 3 of 30 backend services:

$ gcloud compute project-info describe | grep -A 1 -B 1  BACKEND_SERVICES
- limit: 30.0
  metric: BACKEND_SERVICES
  usage: 3.0

3 Answers


I had a similar issue a few weeks ago. What fixed it for me was adding a NodePort to the service spec so that the Google Cloud load balancer can probe that NodePort. The config that worked for me:

apiVersion: v1
kind: Service
metadata: 
  name: some-service
spec: 
  selector: 
    name: some-app
  type: NodePort
  ports: 
    - port: 80
      targetPort: 8080
      nodePort: 32000
      protocol: TCP

It may take some time for the ingress to pick this up. Recreating the ingress can speed things up.
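
(Recreating the ingress can be done with kubectl; the manifest filename here is an assumption:)

$ kubectl --namespace es delete ingress echoserver-ingress
$ kubectl --namespace es apply -f echoserver-ingress.yaml

Deleting the Ingress tears down the GCE forwarding rule, backend services, and health checks, and applying it again recreates them from scratch.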

Answered 2017-09-20T20:24:16.150

You have configured the timeout value to be 1 second. Perhaps increasing it to 5 seconds will solve the issue.
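
(A sketch of how to adjust this with gcloud; HEALTH_CHECK_NAME is a placeholder for the auto-generated check, which you can find with the list command. Note that the GCE ingress controller created this check, so it may revert manual edits:)

$ gcloud compute health-checks list
$ gcloud compute health-checks update http HEALTH_CHECK_NAME --timeout=5s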

Answered 2017-09-20T04:37:12.860

I ran into this problem and eventually came across https://stackoverflow.com/a/50645953/9276, which got me looking at my firewall settings. Sure enough, the last few NodePort services I had added were not enabled in the firewall rule, so the health checks for the ingresses pointing at them all failed. Manually adding the new host ports to the firewall rule fixed it for me.

Unlike the linked answer, though, I was not using an invalid certificate. I suspect there are other errors or odd states that can trigger this behavior, but I have not found out why the rule stopped being managed automatically.
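
(Google's load-balancer health-check probes originate from the documented ranges 130.211.0.0/22 and 35.191.0.0/16. A sketch of allowing them through to the NodePort range; the rule name and node tag are placeholders for your own values:)

$ gcloud compute firewall-rules create allow-gke-lb-health-checks \
    --network=default \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --allow=tcp:30000-32767 \
    --target-tags=<your-node-tag>

If a rule already exists but is missing ports, gcloud compute firewall-rules update <rule-name> --allow=tcp:30000-32767 covers the whole NodePort range instead of individual ports.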

Possibly unrelated: I don't have this problem in our QA environment, only in production, so there may be GCP project-level settings at play.

Answered 2018-08-24T17:43:58.110