
We have set up a GKE cluster using Terraform, with private and shared networking:

Network configuration:

resource "google_compute_subnetwork" "int_kube02" {
  name          = "int-kube02"
  region        = var.region
  project       = "infrastructure"
  network       = "projects/infrastructure/global/networks/net-10-23-0-0-16"
  ip_cidr_range = "10.23.5.0/24"
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.60.0.0/14" # 10.60 - 10.63
  }
  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.56.0.0/16"
  }
}

Cluster configuration:

resource "google_container_cluster" "gke_kube02" {
  name     = "kube02"
  location = var.region

  initial_node_count = var.gke_kube02_num_nodes

  network    = "projects/infrastructure/global/networks/net-10-23-0-0-16"
  subnetwork = "projects/infrastructure/regions/europe-west3/subnetworks/int-kube02"

  master_authorized_networks_config {
    cidr_blocks {
      display_name = "admin vpn"
      cidr_block   = "10.42.255.0/24"
    }
    cidr_blocks {
      display_name = "monitoring server"
      cidr_block   = "10.42.4.33/32"
    }
    cidr_blocks {
      display_name = "cluster nodes"
      cidr_block   = "10.23.5.0/24"
    }
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = true
    master_ipv4_cidr_block  = "192.168.23.0/28"


  }

  node_config {
    machine_type = "e2-highcpu-2"

    tags = ["kube-no-external-ip"]
    metadata = {
      disable-legacy-endpoints = true
    }

    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}

The cluster is up and running fine. If I connect to one of the worker nodes, I can reach the API using curl:

curl -k https://192.168.23.2
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}

Using SSH port forwarding, I also see a healthy cluster:

❯ k get pods --all-namespaces --insecure-skip-tls-verify=true
NAMESPACE     NAME                                               READY   STATUS    RESTARTS   AGE
kube-system   event-exporter-gke-5479fd58c8-mv24r                2/2     Running   0          4h44m
kube-system   fluentbit-gke-ckkwh                                2/2     Running   0          4h44m
kube-system   fluentbit-gke-lblkz                                2/2     Running   0          4h44m
kube-system   fluentbit-gke-zglv2                                2/2     Running   4          4h44m
kube-system   gke-metrics-agent-j72d9                            1/1     Running   0          4h44m
kube-system   gke-metrics-agent-ttrzk                            1/1     Running   0          4h44m
kube-system   gke-metrics-agent-wbqgc                            1/1     Running   0          4h44m
kube-system   kube-dns-697dc8fc8b-rbf5b                          4/4     Running   5          4h44m
kube-system   kube-dns-697dc8fc8b-vnqb4                          4/4     Running   1          4h44m
kube-system   kube-dns-autoscaler-844c9d9448-f6sqw               1/1     Running   0          4h44m
kube-system   kube-proxy-gke-kube02-default-pool-2bf58182-xgp7   1/1     Running   0          4h43m
kube-system   kube-proxy-gke-kube02-default-pool-707f5d51-s4xw   1/1     Running   0          4h43m
kube-system   kube-proxy-gke-kube02-default-pool-bd2c130d-c67h   1/1     Running   0          4h43m
kube-system   l7-default-backend-6654b9bccb-mw6bp                1/1     Running   0          4h44m
kube-system   metrics-server-v0.4.4-857776bc9c-sq9kd             2/2     Running   0          4h43m
kube-system   pdcsi-node-5zlb7                                   2/2     Running   0          4h44m
kube-system   pdcsi-node-kn2zb                                   2/2     Running   0          4h44m
kube-system   pdcsi-node-swhp9                                   2/2     Running   0          4h44m

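So far, so good. I then set up Cloud Router to advertise the 192.168.23.0/28 network. That succeeded, and the route is propagated to our on-premises site via BGP. Running show route 192.168.23.2 there shows that the correct route is advertised and installed.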
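
For readers following along, the Cloud Router advertisement described above might look roughly like the following in Terraform. This is only a sketch: the resource name, router name, and ASN are placeholders and not taken from the original setup; only the network path, project, and the 192.168.23.0/28 range come from the question.

```hcl
# Hypothetical sketch: resource name, router name, and ASN are assumptions.
resource "google_compute_router" "vpn_router" {
  name    = "vpn-router" # assumed name
  region  = var.region
  project = "infrastructure"
  network = "projects/infrastructure/global/networks/net-10-23-0-0-16"

  bgp {
    asn            = 64514 # assumed private ASN
    advertise_mode = "CUSTOM"

    # Keep advertising the VPC subnets alongside the custom range.
    advertised_groups = ["ALL_SUBNETS"]

    # The control plane range from master_ipv4_cidr_block.
    advertised_ip_ranges {
      range       = "192.168.23.0/28"
      description = "kube02 control plane"
    }
  }
}
```

Note that this only covers the direction from on-premises towards the control plane; it says nothing about how the control plane learns the route back.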

When I try to access the API from the monitoring server, 10.42.4.33, I get a timeout. Cloud VPN, Cloud Router, and the Kubernetes cluster all run in europe-west3.

When I ping one of the worker nodes, it works perfectly, so networking in general is fine:

[me@monitoring ~]$ ping 10.23.5.216
PING 10.23.5.216 (10.23.5.216) 56(84) bytes of data.
64 bytes from 10.23.5.216: icmp_seq=1 ttl=63 time=8.21 ms
64 bytes from 10.23.5.216: icmp_seq=2 ttl=63 time=7.70 ms
64 bytes from 10.23.5.216: icmp_seq=3 ttl=63 time=5.41 ms
64 bytes from 10.23.5.216: icmp_seq=4 ttl=63 time=7.98 ms

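The Google documentation gives no hint as to what might be missing. As far as I understand, the cluster API should be reachable at this point.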

Does anyone know what might be missing and why the API is not reachable over the VPN?

Thanks a lot for your help!


1 Answer


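I was missing the peering configuration documented here: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#cp-on-prem-routing

The control plane of a private cluster lives in a Google-managed VPC that is peered with our network. Unless our network exports its custom routes over that peering, the control plane has no route back to the on-premises ranges, so replies never arrive: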

resource "google_compute_network_peering_routes_config" "peer_kube02" {
  peering = google_container_cluster.gke_kube02.private_cluster_config[0].peering_name
  project = "infrastructure"
  network = "net-10-23-0-0-16"

  export_custom_routes = true
  import_custom_routes = false
}

answered 2022-02-10T15:52:24.227