I created a Kubernetes cluster on GCP (GKE) and am trying to install Kafka on it (following this guide - https://snourian.com/kafka-kubernetes-strimzi-part-1-creating-deploying-strimzi-kafka/).
When I deploy the Kafka cluster, ZooKeeper does not start:
karan@cloudshell:~/strimzi-0.26.0 (versa-kafka-poc)$ kubectl get pv,pvc,pods -n kafka
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE
persistentvolume/pvc-96957b25-f49b-4598-869c-a73b32325bc7   2Gi        RWO            Delete           Bound    kafka/data-my-cluster-zookeeper-0   standard                6m17s

NAME                                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/data-my-cluster-zookeeper-0   Bound    pvc-96957b25-f49b-4598-869c-a73b32325bc7   2Gi        RWO            standard       6m20s

NAME                                         READY   STATUS    RESTARTS   AGE
pod/my-cluster-zookeeper-0                   0/1     Pending   0          6m18s
pod/strimzi-cluster-operator-85bb4c6-cfl4p   1/1     Running   0          8m29s
karan@cloudshell:~/strimzi-0.26.0 (versa-kafka-poc)$ kc describe pod my-cluster-zookeeper-0 -n kafka
Name:          my-cluster-zookeeper-0
Namespace:     kafka
Priority:      0
Node:          <none>
Labels:        app.kubernetes.io/instance=my-cluster
               app.kubernetes.io/managed-by=strimzi-cluster-operator
               app.kubernetes.io/name=zookeeper
               app.kubernetes.io/part-of=strimzi-my-cluster
               controller-revision-hash=my-cluster-zookeeper-867c478fc4
               statefulset.kubernetes.io/pod-name=my-cluster-zookeeper-0
               strimzi.io/cluster=my-cluster
               strimzi.io/kind=Kafka
               strimzi.io/name=my-cluster-zookeeper
Annotations:   strimzi.io/cluster-ca-cert-generation: 0
               strimzi.io/generation: 0
               strimzi.io/logging-hash: 0f057cb0003c78f02978b83e4fabad5bd508680c
Status:        Pending
IP:
IPs:           <none>
Controlled By: StatefulSet/my-cluster-zookeeper
Containers:
  zookeeper:
    Image:       quay.io/strimzi/kafka:0.26.0-kafka-3.0.0
    Ports:       2888/TCP, 3888/TCP, 2181/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      /opt/kafka/zookeeper_run.sh
    Limits:
      cpu:     1500m
      memory:  2Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Liveness:   exec [/opt/kafka/zookeeper_healthcheck.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Readiness:  exec [/opt/kafka/zookeeper_healthcheck.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment:
      ZOOKEEPER_METRICS_ENABLED:         false
      ZOOKEEPER_SNAPSHOT_CHECK_ENABLED:  true
      STRIMZI_KAFKA_GC_LOG_ENABLED:      false
      DYNAMIC_HEAP_FRACTION:             0.75
      DYNAMIC_HEAP_MAX:                  2147483648
      ZOOKEEPER_CONFIGURATION:           tickTime=2000
                                         initLimit=5
                                         syncLimit=2
                                         autopurge.purgeInterval=1
    Mounts:
      /opt/kafka/cluster-ca-certs/ from cluster-ca-certs (rw)
      /opt/kafka/custom-config/ from zookeeper-metrics-and-logging (rw)
      /opt/kafka/zookeeper-node-certs/ from zookeeper-nodes (rw)
      /tmp from strimzi-tmp (rw)
      /var/lib/zookeeper from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cgm22 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-my-cluster-zookeeper-0
    ReadOnly:   false
  strimzi-tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Mi
  zookeeper-metrics-and-logging:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-cluster-zookeeper-config
    Optional:  false
  zookeeper-nodes:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  my-cluster-zookeeper-nodes
    Optional:    false
  cluster-ca-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  my-cluster-cluster-ca-cert
    Optional:    false
  kube-api-access-cgm22:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                  From                 Message
  ----     ------             ----                 ----                 -------
  Warning  FailedScheduling   10m                  default-scheduler    0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling   40s (x10 over 10m)   default-scheduler    0/3 nodes are available: 3 Insufficient cpu.
  Normal   NotTriggerScaleUp  37s (x61 over 10m)   cluster-autoscaler   pod didn't trigger scale-up:
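
The first FailedScheduling (unbound PVC) cleared once the volume was provisioned; the event that sticks is "3 Insufficient cpu". To see how much allocatable CPU each node actually has left, these standard kubectl commands help (the node name is a placeholder, substitute your own):

kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu
kubectl describe node <node-name> | grep -A 8 'Allocated resources'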
Here is the YAML file used to create the cluster:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster #1
spec:
  kafka:
    version: 3.0.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "3.0"
      inter.broker.protocol.version: "3.0"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 2Gi
          deleteClaim: false
    logging: #9
      type: inline
      loggers:
        kafka.root.logger.level: "INFO"
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: false
    resources:
      requests:
        memory: 1Gi
        cpu: "1"
      limits:
        memory: 2Gi
        cpu: "1.5"
    logging:
      type: inline
      loggers:
        zookeeper.root.logger: "INFO"
  entityOperator: #11
    topicOperator: {}
    userOperator: {}
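
The part of this spec the scheduler is weighing is the zookeeper resources block: the pod asks for a full CPU (requests.cpu: "1"). For a single-replica PoC a smaller request is one obvious knob; a sketch, where the exact numbers are assumptions to tune (they just need to fit the free CPU on a node):

    resources:
      requests:
        memory: 1Gi
        cpu: "400m"   # assumption: small enough to fit a node with ~450m CPU free
      limits:
        memory: 2Gi
        cpu: "1"

After editing, re-applying the file (kubectl apply -f <your-kafka-cr>.yaml -n kafka) lets the operator roll the pod.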
The PersistentVolume shows as bound to the PersistentVolumeClaim, but ZooKeeper does not start, with the scheduler reporting insufficient CPU on the nodes.
Any pointers on what needs to be done?
Two of the three nodes show a CPU limit of 0%:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                483m (51%)   0 (0%)
  memory             410Mi (14%)  890Mi (31%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Third node:

  Resource   Requests          Limits
  --------   --------          ------
  cpu        511m (54%)        1143m (121%)
  memory     868783744 (29%)   1419Mi (50%)
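
Backing the node size out of those percentages (a rough calculation, assuming the percentages are exact):

  allocatable CPU ≈ 511m / 0.54 ≈ 940m per node
  free CPU        ≈ 940m - 483m ≈ 457m at best

Every node therefore has well under the 1000m the ZooKeeper pod requests, which matches the "0/3 nodes are available: 3 Insufficient cpu" the scheduler keeps printing.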
karan@cloudshell:~ (versa-kafka-poc)$ kc describe pod my-cluster-zookeeper-0 -n kafka
(The output is identical to the describe above, apart from the Events section:)
Events:
  Type     Reason             Age                     From                 Message
  ----     ------             ----                    ----                 -------
  Warning  FailedScheduling   5h27m                   default-scheduler    0/3 nodes are available: 3 Insufficient cpu.
  Normal   NotTriggerScaleUp  28m (x1771 over 5h26m)  cluster-autoscaler   pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Normal   NotTriggerScaleUp  4m17s (x91 over 19m)    cluster-autoscaler   pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 max node group size reached
  Warning  FailedScheduling   80s (x19 over 20m)      default-scheduler    0/3 nodes are available: 3 Insufficient cpu.
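
Two details in these events narrow the options: "it wouldn't fit if a new node is added" means another node of the same machine type would still lack 1000m of allocatable CPU, and "max node group size reached" means the autoscaler cannot grow the pool anyway. So besides shrinking the requests as sketched above, the other route is a node pool with larger machines. A sketch; the pool name, machine type, cluster name, and zone are assumptions to replace with your own:

gcloud container node-pools create bigger-pool \
  --cluster <your-cluster> \
  --machine-type e2-standard-2 \
  --num-nodes 1 \
  --zone <your-zone>

With at least one node that has a full CPU to spare, the pod should schedule without any change to the Kafka CR.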