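Problem
Applying the Terraform GCP google_service_account and google_project_iam_binding resources for roles/editor
also removed the Google APIs Service Agent and the Compute Engine default service account from the IAM principals. Because of that removal, GKE clusters can no longer be deleted or created, even though the account still exists under IAM service accounts.
The issue is that the account disappeared from the IAM principals (this is what I mean by "deleted"), and the Compute Engine default service account is broken as a result: it can no longer manage Compute Engine, including GKE clusters and nodes.
Question
I believe this is a Terraform bug, but please help me understand whether I missed something that would have prevented it.
Please also advise whether there is a way to restore the Compute Engine default service account to the IAM principals with the Editor role.
Environment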
$ terraform version
Terraform v1.0.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/google v4.6.0
.terraform.lock.hcl
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.
provider "registry.terraform.io/hashicorp/google" {
version = "4.6.0"
hashes = [
"h1:QbO4yjDrnoSpiYKSHrICNL1ZuWsl5J2rVRFj2kNg7xA=",
"zh:005a28a2c79f6b29680b0f57260c69c85d8a992688007b6e5645149bd379951f",
"zh:2604d825de72cf99b4899d7880837adeb19d371f48e419666e32c4c3cf6a72e9",
"zh:290da4eb18e44469480cf299bebce89f54e4d301f856cdffe2837b498878c7ec",
"zh:3e5ba1a55d38fa17533a18fc14a612e781ded76c6309734d3dc0a937be27eec1",
"zh:4a85de3cdb33c092d8ccfced3d7302934de0dd4f72bbcebd79d45afe0a0b6f85",
"zh:5fb1a79800833ae922aaba594a8b2bc83be1d254052e12e0ce8330ca0d8933d9",
"zh:679b9f50c6fe0476e74d37935f7598d46d6e9612f75b26a8ef1ca3c13144d06a",
"zh:893216e32378839668c51ef135af1676cd887d63e2edb6625cf9adad7bfa346f",
"zh:ad8f2fd19adbe4c10281ba9b3c8d5100877a9c541d3580bbbe9357714aa77619",
"zh:bff5d6fd15e98c12ee9ed98b0338761dc4a9ba671a37834926daeabf73c71783",
"zh:debdf15fbed8d63e397cd004bf65586bd2b93ce04e47ca51a7c70c1fe9168b87",
]
}
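Steps to reproduce
Tested twice in different GCP projects; the problem reproduced the same way both times.
Start
The GCP project starts with Compute Engine not enabled, so there is no Compute Engine default service account.
Enable the Compute Engine API.
The Compute Engine default service account is created and appears in both IAM principals and IAM service accounts.
Terraform apply
Apply the Terraform script to create a service account with IAM bindings.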
variable "PROJECT_ID" {
  type        = string
  description = "GCP Project ID"
  default     = "test-tf-sa"
}

variable "REGION" {
  type        = string
  description = "GCP Region"
  default     = "us-central1"
}

variable "roles_to_grant_to_service_account" {
  description = "IAM roles to grant to the service account"
  type        = list(string)
  default = [
    "roles/editor",
    "roles/iam.serviceAccountAdmin",
    "roles/resourcemanager.projectIamAdmin"
  ]
}

provider "google" {
  project = var.PROJECT_ID
  region  = var.REGION
}

resource "google_service_account" "terraform" {
  account_id   = "terraform"
  display_name = "terraform service account"
}

resource "google_project_iam_binding" "terraform" {
  # One binding per role in the list
  for_each = toset(var.roles_to_grant_to_service_account)

  project = var.PROJECT_ID
  role    = each.value

  #--------------------------------------------------------------------------------
  # Grant the service account to have the roles
  #--------------------------------------------------------------------------------
  members = [
    "serviceAccount:${google_service_account.terraform.email}"
  ]
}
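For comparison, google_project_iam_binding is authoritative for the role it manages: the members list becomes the complete list for that role, so principals that are not listed are dropped on apply, and the whole binding is removed on destroy. A minimal sketch of the non-authoritative alternative, google_project_iam_member, is shown below. It reuses the variables and the service account defined above and is meant to replace the google_project_iam_binding block, not to sit next to it. Whether it fits this setup is an assumption; it has not been run against this project.

```hcl
# Non-authoritative grant: adds one member to each role and leaves
# any other members of that role (for example Google-managed accounts) untouched.
resource "google_project_iam_member" "terraform" {
  for_each = toset(var.roles_to_grant_to_service_account)

  project = var.PROJECT_ID
  role    = each.value
  member  = "serviceAccount:${google_service_account.terraform.email}"
}
```

With this form, terraform destroy would only remove the terraform service account from each role rather than deleting the role's entire member list.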
$ terraform apply --auto-approve
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# google_project_iam_binding.terraform["roles/editor"] will be created
+ resource "google_project_iam_binding" "terraform" {
+ etag = (known after apply)
+ id = (known after apply)
+ members = (known after apply)
+ project = "test-tf-sa"
+ role = "roles/editor"
}
# google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"] will be created
+ resource "google_project_iam_binding" "terraform" {
+ etag = (known after apply)
+ id = (known after apply)
+ members = (known after apply)
+ project = "test-tf-sa"
+ role = "roles/iam.serviceAccountAdmin"
}
# google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"] will be created
+ resource "google_project_iam_binding" "terraform" {
+ etag = (known after apply)
+ id = (known after apply)
+ members = (known after apply)
+ project = "test-tf-sa"
+ role = "roles/resourcemanager.projectIamAdmin"
}
# google_service_account.terraform will be created
+ resource "google_service_account" "terraform" {
+ account_id = "terraform"
+ disabled = false
+ display_name = "terraform service account"
+ email = (known after apply)
+ id = (known after apply)
+ name = (known after apply)
+ project = (known after apply)
+ unique_id = (known after apply)
}
Plan: 4 to add, 0 to change, 0 to destroy.
google_service_account.terraform: Creating...
google_service_account.terraform: Creation complete after 2s [id=projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Creating...
google_project_iam_binding.terraform["roles/editor"]: Creating...
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Creating...
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Creation complete after 9s [id=test-tf-sa/roles/iam.serviceAccountAdmin]
google_project_iam_binding.terraform["roles/editor"]: Creation complete after 9s [id=test-tf-sa/roles/editor]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Still creating... [10s elapsed]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Creation complete after 10s [id=test-tf-sa/roles/resourcemanager.projectIamAdmin]
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
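Terraform removed the Compute Engine default service account from the IAM principals
Immediately after applying Terraform, I checked the IAM principals, and the Compute Engine default service account was no longer shown in the IAM principals view.
As @JohnHanley suggested, I ticked "Include Google-provided role grants" to unhide the Google-managed service accounts. The original Compute Engine default service account 1079157603081-compute@developer.gserviceaccount.com is gone from the IAM principals view.
The gcloud projects get-iam-policy command
does not show the Compute Engine default service account 1079157603081-compute@developer.gserviceaccount.com either.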
$ GCP_PROJECT_ID=test-tf-sa
$ gcloud projects get-iam-policy $GCP_PROJECT_ID
bindings:
- members:
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/compute.admin
- members:
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/compute.instanceAdmin
- members:
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/compute.serviceAgent
- members:
- serviceAccount:service-1079157603081@container-engine-robot.iam.gserviceaccount.com
role: roles/container.serviceAgent
- members:
- serviceAccount:service-1079157603081@containerregistry.iam.gserviceaccount.com
role: roles/containerregistry.ServiceAgent
- members:
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/editor
- members:
- user:****@gmail.com
role: roles/owner
- members:
- serviceAccount:service-1079157603081@gcp-sa-pubsub.iam.gserviceaccount.com
role: roles/pubsub.serviceAgent
etag: BwXVf2S5fCQ=
version: 1
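The service account still remains in the IAM service accounts menu.
Create GKE
Enable the Kubernetes Engine API and create a GKE cluster. At this point, the missing Compute Engine default service account did not yet prevent the GKE cluster from being created, possibly because of eventual consistency.
Terraform destroy
Run terraform destroy.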
$ terraform destroy --auto-approve
google_service_account.terraform: Refreshing state... [id=projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com]
google_project_iam_binding.terraform["roles/editor"]: Refreshing state... [id=test-tf-sa/roles/editor]
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Refreshing state... [id=test-tf-sa/roles/iam.serviceAccountAdmin]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Refreshing state... [id=test-tf-sa/roles/resourcemanager.projectIamAdmin]
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the last "terraform apply":
# google_project_iam_binding.terraform["roles/editor"] has been changed
~ resource "google_project_iam_binding" "terraform" {
~ etag = "BwXVe+z+aCU=" -> "BwXVfBieTDw="
id = "test-tf-sa/roles/editor"
~ members = [
+ "serviceAccount:1079157603081@cloudservices.gserviceaccount.com",
# (1 unchanged element hidden)
]
# (2 unchanged attributes hidden)
}
# google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"] has been changed
~ resource "google_project_iam_binding" "terraform" {
~ etag = "BwXVe+z+aCU=" -> "BwXVfBieTDw="
id = "test-tf-sa/roles/iam.serviceAccountAdmin"
# (3 unchanged attributes hidden)
}
# google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"] has been changed
~ resource "google_project_iam_binding" "terraform" {
~ etag = "BwXVe+z+aCU=" -> "BwXVfBieTDw="
id = "test-tf-sa/roles/resourcemanager.projectIamAdmin"
# (3 unchanged attributes hidden)
}
Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to
undo or respond to these changes.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
- destroy
Terraform will perform the following actions:
# google_project_iam_binding.terraform["roles/editor"] will be destroyed
- resource "google_project_iam_binding" "terraform" {
- etag = "BwXVfBieTDw=" -> null
- id = "test-tf-sa/roles/editor" -> null
- members = [
- "serviceAccount:1079157603081@cloudservices.gserviceaccount.com",
- "serviceAccount:terraform@test-tf-sa.iam.gserviceaccount.com",
] -> null
- project = "test-tf-sa" -> null
- role = "roles/editor" -> null
}
# google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"] will be destroyed
- resource "google_project_iam_binding" "terraform" {
- etag = "BwXVfBieTDw=" -> null
- id = "test-tf-sa/roles/iam.serviceAccountAdmin" -> null
- members = [
- "serviceAccount:terraform@test-tf-sa.iam.gserviceaccount.com",
] -> null
- project = "test-tf-sa" -> null
- role = "roles/iam.serviceAccountAdmin" -> null
}
# google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"] will be destroyed
- resource "google_project_iam_binding" "terraform" {
- etag = "BwXVfBieTDw=" -> null
- id = "test-tf-sa/roles/resourcemanager.projectIamAdmin" -> null
- members = [
- "serviceAccount:terraform@test-tf-sa.iam.gserviceaccount.com",
] -> null
- project = "test-tf-sa" -> null
- role = "roles/resourcemanager.projectIamAdmin" -> null
}
# google_service_account.terraform will be destroyed
- resource "google_service_account" "terraform" {
- account_id = "terraform" -> null
- disabled = false -> null
- display_name = "terraform service account" -> null
- email = "terraform@test-tf-sa.iam.gserviceaccount.com" -> null
- id = "projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com" -> null
- name = "projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com" -> null
- project = "test-tf-sa" -> null
- unique_id = "107173424725895843752" -> null
}
Plan: 0 to add, 0 to change, 4 to destroy.
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Destroying... [id=test-tf-sa/roles/resourcemanager.projectIamAdmin]
google_project_iam_binding.terraform["roles/editor"]: Destroying... [id=test-tf-sa/roles/editor]
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Destroying... [id=test-tf-sa/roles/iam.serviceAccountAdmin]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Destruction complete after 10s
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Destruction complete after 10s
google_project_iam_binding.terraform["roles/editor"]: Still destroying... [id=test-tf-sa/roles/editor, 10s elapsed]
google_project_iam_binding.terraform["roles/editor"]: Destruction complete after 11s
google_service_account.terraform: Destroying... [id=projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com]
google_service_account.terraform: Destruction complete after 1s
Destroy complete! Resources: 4 destroyed.
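Issues
Cannot delete GKE
The effect of the Compute Engine default service account being removed from the IAM principals now kicks in.
The GKE cluster cannot be deleted and the request fails with an error.
Google Compute Engine: Required 'compute.instanceGroups.update' permission for 'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp'.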
$ gcloud container clusters delete cluster-1 --zone=us-central1-c
The following clusters will be deleted.
- [cluster-1] in [us-central1-c]
Do you want to continue (Y/n)? Y
Deleting cluster cluster-1...done.
ERROR: (gcloud.container.clusters.delete) Some requests did not succeed:
- args: ['Operation [<Operation\n clusterConditions: [<StatusCondition\n canonicalCode: CanonicalCodeValueValuesEnum(PERMISSION_DENIED, 7)\n message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">]\n detail: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n endTime: \'2022-01-14T00:20:54.190004708Z\'\n error: <Status\n code: 7\n details: []\n message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">\n name: \'operation-1642119632548-20038ec5\'\n nodepoolConditions: []\n operationType: OperationTypeValueValuesEnum(DELETE_CLUSTER, 2)\n selfLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/operations/operation-1642119632548-20038ec5\'\n startTime: \'2022-01-14T00:20:32.548792723Z\'\n status: StatusValueValuesEnum(DONE, 3)\n statusMessage: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n targetLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/clusters/cluster-1\'\n zone: \'us-central1-c\'>] finished with error: Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.']
exit_code: 1
Cannot create GKE
Tried to create another GKE cluster.
GKE clusters can no longer be created. This is the original problem, "GCP GKE - Google Compute Engine: Not all instances running in IGM", which led to this troubleshooting.
cluster-2
Google Compute Engine: Not all instances running in IGM after 18.798524988s. Expected 3, running 0, transitioning 3. Current errors: [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.instances.create' permission for 'projects/1079157603081/zones/us-central1-c/instances/gke-cluster-2-default-pool-36522bb7-0vkl' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.disks.create' permission for 'projects/1079157603081/zones/us-central1-c/disks/gke-cluster-2-default-pool-36522bb7-0vkl' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.disks.setLabels' permission for 'projects/1079157603081/zones/us-central1-c/disks/gke-cluster-2-default-pool-36522bb7-0vkl' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.subnetworks.use' permission for 'projects/1079157603081/regions/us-central1/subnetworks/default' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.subnetworks.useExternalIp' permission for 'projects/1079157603081/regions/us-central1/subnetworks/default'
Attempted fixes
Tried the following measures, without luck.
Re-assign roles/editor to the service account
GCP_PROJECT_ID=test-tf-sa
GCP_SVC_ACC="serviceAccount:1079157603081-compute@developer.gserviceaccount.com"
gcloud projects add-iam-policy-binding ${GCP_PROJECT_ID} \
--member=serviceAccount:${GCP_SVC_ACC} \
--role=roles/Editor
-----
ERROR: Policy modification failed. For a binding with condition, run "gcloud alpha iam policies lint-condition" to identify issues in condition.
ERROR: (gcloud.projects.add-iam-policy-binding) INVALID_ARGUMENT: Role roles/Editor is not supported for this resource.
Undelete the service account
$ gcloud beta iam service-accounts undelete 109558708367309276392
restoredAccount:
email: 1079157603081-compute@developer.gserviceaccount.com
etag: MDEwMjE5MjA=
name: projects/test-tf-sa/serviceAccounts/1079157603081-compute@developer.gserviceaccount.com
oauth2ClientId: '109558708367309276392'
projectId: test-tf-sa
uniqueId: '109558708367309276392'
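Neither of these brought the Compute Engine default service account back into the IAM principals.
Disable the Compute Engine API
Tried to disable the Compute Engine API, but it cannot be disabled because the GKE nodes cannot be deleted.
Manually add the service account
Manually added the Compute Engine account 1079157603081-compute@developer.gserviceaccount.com with the roles/editor IAM role. It now appears in the output of the gcloud projects get-iam-policy command,
but the GKE cluster still cannot be deleted.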
$ gcloud projects get-iam-policy $GCP_PROJECT_ID
bindings:
...
- members:
- serviceAccount:1079157603081-compute@developer.gserviceaccount.com <-----
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/editor
...
etag: BwXVf9cVnaU=
version: 1
$ gcloud container clusters delete cluster-1 --zone=us-central1-c
The following clusters will be deleted.
- [cluster-1] in [us-central1-c]
Do you want to continue (Y/n)? Y
Deleting cluster cluster-1...done.
ERROR: (gcloud.container.clusters.delete) Some requests did not succeed:
- args: ['Operation [<Operation\n clusterConditions: [<StatusCondition\n canonicalCode: CanonicalCodeValueValuesEnum(PERMISSION_DENIED, 7)\n
message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">]\n
detail: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n
endTime: \'2022-01-14T00:33:38.746564953Z\'\n error: <Status\n code: 7\n details: []\n
message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">\n
name: \'operation-1642120382096-034b0eb7\'\n nodepoolConditions: []
\n operationType: OperationTypeValueValuesEnum(DELETE_CLUSTER, 2)\n
selfLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/operations/operation-1642120382096-034b0eb7\'\n
startTime: \'2022-01-14T00:33:02.096736326Z\'\n status: StatusValueValuesEnum(DONE, 3)\n
statusMessage: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n
targetLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/clusters/cluster-1\'\n
zone: \'us-central1-c\'>] finished with error: Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.']
exit_code: 1
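Another service account for GKE
Created another service account with the compute.admin role and used it to create/delete GKE clusters. However, once the Compute Engine default service account is broken, the "GCP GKE - Google Compute Engine: Not all instances running in IGM" problem persists.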
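One avenue not covered by the attempts above: the permission errors are raised "when acting as '1079157603081@cloudservices.gserviceaccount.com'", and the destroy plan lists that same account among the roles/editor members being removed. That suggests the Google APIs Service Agent, not only the Compute Engine default service account, lost roles/editor. The sketch below re-grants roles/editor to both accounts non-authoritatively. The data source name `current`, the local `restored_editor_members`, and the resource name `restore_editor` are illustrative, and it is an assumption (not verified here) that restoring these two grants is sufficient to unblock GKE.

```hcl
# Look up the project number so the Google-managed account emails
# can be built without hard-coding 1079157603081.
data "google_project" "current" {
  project_id = var.PROJECT_ID
}

locals {
  # Static map keys keep for_each resolvable at plan time.
  restored_editor_members = {
    compute_default = "serviceAccount:${data.google_project.current.number}-compute@developer.gserviceaccount.com"
    google_apis     = "serviceAccount:${data.google_project.current.number}@cloudservices.gserviceaccount.com"
  }
}

# Non-authoritative: adds each member to roles/editor without
# replacing the other members of that role.
resource "google_project_iam_member" "restore_editor" {
  for_each = local.restored_editor_members

  project = var.PROJECT_ID
  role    = "roles/editor"
  member  = each.value
}
```

Because these are google_project_iam_member resources, destroying them later would remove these two grants again, so they may be better kept in a separate configuration from the throwaway test resources.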
Goal
Get the Compute Engine default service account back into the IAM principals, as shown in the snapshot below, so that it can manage Compute Engine and GKE nodes again.