We have set up a GKE cluster with Terraform, using private and shared networking:
Network configuration:
resource "google_compute_subnetwork" "int_kube02" {
  name          = "int-kube02"
  region        = var.region
  project       = "infrastructure"
  network       = "projects/infrastructure/global/networks/net-10-23-0-0-16"
  ip_cidr_range = "10.23.5.0/24"

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.60.0.0/14" # 10.60 - 10.63
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.56.0.0/16"
  }
}
Cluster configuration:
resource "google_container_cluster" "gke_kube02" {
  name               = "kube02"
  location           = var.region
  initial_node_count = var.gke_kube02_num_nodes
  network            = "projects/infrastructure/global/networks/net-10-23-0-0-16"
  subnetwork         = "projects/infrastructure/regions/europe-west3/subnetworks/int-kube02"

  master_authorized_networks_config {
    cidr_blocks {
      display_name = "admin vpn"
      cidr_block   = "10.42.255.0/24"
    }
    cidr_blocks {
      display_name = "monitoring server"
      cidr_block   = "10.42.4.33/32"
    }
    cidr_blocks {
      display_name = "cluster nodes"
      cidr_block   = "10.23.5.0/24"
    }
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = true
    master_ipv4_cidr_block  = "192.168.23.0/28"
  }

  node_config {
    machine_type = "e2-highcpu-2"
    tags         = ["kube-no-external-ip"]
    metadata = {
      disable-legacy-endpoints = true
    }
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}
The cluster is online and running fine. If I connect to one of the worker nodes, I can reach the API using curl:
curl -k https://192.168.23.2
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {
  },
  "code": 403
}
With SSH port forwarding, I also see a healthy cluster:
❯ k get pods --all-namespaces --insecure-skip-tls-verify=true
NAMESPACE     NAME                                               READY   STATUS    RESTARTS   AGE
kube-system   event-exporter-gke-5479fd58c8-mv24r                2/2     Running   0          4h44m
kube-system   fluentbit-gke-ckkwh                                2/2     Running   0          4h44m
kube-system   fluentbit-gke-lblkz                                2/2     Running   0          4h44m
kube-system   fluentbit-gke-zglv2                                2/2     Running   4          4h44m
kube-system   gke-metrics-agent-j72d9                            1/1     Running   0          4h44m
kube-system   gke-metrics-agent-ttrzk                            1/1     Running   0          4h44m
kube-system   gke-metrics-agent-wbqgc                            1/1     Running   0          4h44m
kube-system   kube-dns-697dc8fc8b-rbf5b                          4/4     Running   5          4h44m
kube-system   kube-dns-697dc8fc8b-vnqb4                          4/4     Running   1          4h44m
kube-system   kube-dns-autoscaler-844c9d9448-f6sqw               1/1     Running   0          4h44m
kube-system   kube-proxy-gke-kube02-default-pool-2bf58182-xgp7   1/1     Running   0          4h43m
kube-system   kube-proxy-gke-kube02-default-pool-707f5d51-s4xw   1/1     Running   0          4h43m
kube-system   kube-proxy-gke-kube02-default-pool-bd2c130d-c67h   1/1     Running   0          4h43m
kube-system   l7-default-backend-6654b9bccb-mw6bp                1/1     Running   0          4h44m
kube-system   metrics-server-v0.4.4-857776bc9c-sq9kd             2/2     Running   0          4h43m
kube-system   pdcsi-node-5zlb7                                   2/2     Running   0          4h44m
kube-system   pdcsi-node-kn2zb                                   2/2     Running   0          4h44m
kube-system   pdcsi-node-swhp9                                   2/2     Running   0          4h44m
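So far, so good. I then configured the Cloud Router to advertise the 192.168.23.0/28 network. That worked, and the route is propagated to our on-premises site via BGP. Running show route 192.168.23.2 on our side shows that the correct route is advertised and installed.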
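For reference, the custom advertisement on the Cloud Router can be expressed in Terraform roughly as follows. This is only a sketch of that step, not our exact configuration: the resource name, router name, ASN, and the choice to keep advertising all subnets are assumptions made for illustration.

```hcl
# Hypothetical sketch: resource name, router name and ASN are assumptions.
resource "google_compute_router" "vpn_router" {
  name    = "vpn-router" # assumed name
  region  = var.region
  project = "infrastructure"
  network = "projects/infrastructure/global/networks/net-10-23-0-0-16"

  bgp {
    asn               = 64514 # assumed private ASN
    advertise_mode    = "CUSTOM"
    advertised_groups = ["ALL_SUBNETS"] # keep advertising the VPC subnets

    # Additionally advertise the GKE control plane range to the on-premises peer.
    advertised_ip_ranges {
      range       = "192.168.23.0/28"
      description = "kube02 control plane"
    }
  }
}
```

Note that this only covers the direction from Google Cloud to on-premises: it tells our routers where 192.168.23.0/28 lives, which matches what show route reports.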
When I try to reach the API from the monitoring server, 10.42.4.33, the requests time out. Cloud VPN, Cloud Router, and the Kubernetes cluster all run in europe-west3.
When I ping one of the worker nodes, it works perfectly fine, so networking in general works:
[me@monitoring ~]$ ping 10.23.5.216
PING 10.23.5.216 (10.23.5.216) 56(84) bytes of data.
64 bytes from 10.23.5.216: icmp_seq=1 ttl=63 time=8.21 ms
64 bytes from 10.23.5.216: icmp_seq=2 ttl=63 time=7.70 ms
64 bytes from 10.23.5.216: icmp_seq=3 ttl=63 time=5.41 ms
64 bytes from 10.23.5.216: icmp_seq=4 ttl=63 time=7.98 ms
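The Google documentation gives no hint as to what could be missing. As far as I understand, the cluster API should be reachable at this point.
Does anyone know what might be missing and why the API is not reachable through the VPN?
Thanks a lot for your help!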