我有一个自定义操作符,它监听我在 Kubernetes 集群中定义的 CRD 中的变化。
每当定义的自定义资源发生变化时,自定义运算符将协调并幂等地创建一个秘密(将由自定义资源拥有)。
我期望操作员仅在自定义资源或它拥有的秘密发生更改时才进行协调。
我观察到的是,由于某种原因,该函数会以奇怪的间隔Reconcile
触发集群上的每个 CR,而对相关实体没有可观察到的变化。我已经尝试专注于 CR 的特定实例,并遵循Reconcile
需要它的时间。这些调用的间隔很奇怪。电话似乎在两个系列之间交替 - 一个从 10 小时开始,一次减少 7 分钟。另一个从 7 分钟开始,每次增长 7 分钟。
为了演示,在这些时间进行协调triggered
(给或花几秒钟):
00:00
09:53 (10 hours - 1*7 minute interval)
10:00 (0 hours + 1*7 minute interval)
19:46 (10 hours - 2*7 minute interval)
20:00 (0 hours + 2*7 minute interval)
29:39 (10 hours - 3*7 minute interval)
30:00 (0 hours + 3*7 minute interval)
每当递减间隔小于 7 小时时,它会重置为 10 小时间隔。与不断增长的系列相同 - 一旦间隔高于 3 小时,它就会重置回 7 分钟。
我的主要问题是如何调查 Reconcile 被触发的原因?
我在此处附上 CRD 的清单、操作员和 CR 的示例清单:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.4.1
creationTimestamp: "2021-10-13T11:04:42Z"
generation: 1
name: databaseservices.operators.talon.one
resourceVersion: "245688703"
uid: 477f8d3e-c19b-43d7-ab59-65198b3c0108
spec:
conversion:
strategy: None
group: operators.talon.one
names:
kind: DatabaseService
listKind: DatabaseServiceList
plural: databaseservices
singular: databaseservice
scope: Namespaced
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: DatabaseService is the Schema for the databaseservices API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
spec:
description: DatabaseServiceSpec defines the desired state of DatabaseService
properties:
cloud:
type: string
databaseName:
description: Foo is an example field of DatabaseService. Edit databaseservice_types.go
to remove/update
type: string
serviceName:
type: string
servicePlan:
type: string
required:
- cloud
- databaseName
- serviceName
- servicePlan
type: object
status:
description: DatabaseServiceStatus defines the observed state of DatabaseService
type: object
type: object
served: true
storage: true
subresources:
status: {}
status:
acceptedNames:
kind: DatabaseService
listKind: DatabaseServiceList
plural: databaseservices
singular: databaseservice
conditions:
- lastTransitionTime: "2021-10-13T11:04:42Z"
message: no conflicts found
reason: NoConflicts
status: "True"
type: NamesAccepted
- lastTransitionTime: "2021-10-13T11:04:42Z"
message: the initial names have been accepted
reason: InitialNamesAccepted
status: "True"
type: Established
storedVersions:
- v1alpha1
----
apiVersion: operators.talon.one/v1alpha1
kind: DatabaseService
metadata:
creationTimestamp: "2021-10-13T11:14:08Z"
generation: 1
labels:
app: talon
company: amber
repo: talon-service
name: db-service-secret
namespace: amber
resourceVersion: "245692590"
uid: cc369297-6825-4fbf-aa0b-58c24be427b0
spec:
cloud: google-australia-southeast1
databaseName: amber
serviceName: pg-amber
servicePlan: business-4
----
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "75"
secret.reloader.stakater.com/reload: db-credentials
simpledeployer.talon.one/image: <path_to_image>/production:latest
creationTimestamp: "2020-06-22T09:20:06Z"
generation: 77
labels:
simpledeployer.talon.one/enabled: "true"
name: db-operator
namespace: db-operator
resourceVersion: "245688814"
uid: 900424cd-b469-11ea-b661-4201ac100014
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
name: db-operator
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
name: db-operator
spec:
containers:
- command:
- app/db-operator
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: OPERATOR_NAME
value: db-operator
- name: AIVEN_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: db-credentials
- name: AIVEN_PROJECT
valueFrom:
secretKeyRef:
key: projectname
name: db-credentials
- name: AIVEN_USERNAME
valueFrom:
secretKeyRef:
key: username
name: db-credentials
- name: SENTRY_URL
valueFrom:
secretKeyRef:
key: sentry_url
name: db-credentials
- name: ROTATION_INTERVAL
value: monthly
image: <path_to_image>/production@sha256:<some_sha>
imagePullPolicy: Always
name: db-operator
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: db-operator
serviceAccountName: db-operator
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2020-06-22T09:20:06Z"
lastUpdateTime: "2021-09-07T11:56:07Z"
message: ReplicaSet "db-operator-cb6556b76" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2021-09-12T03:56:19Z"
lastUpdateTime: "2021-09-12T03:56:19Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 77
readyReplicas: 1
replicas: 1
updatedReplicas: 1
笔记:
- Reconcile 完成后,我返回:
return ctrl.Result{Requeue: false, RequeueAfter: 0}
所以这不应该是重复触发的原因。
- 我要补充一点,我最近将 Kubernetes 集群版本更新为 v1.20.8-gke.2101。