I created an RKE cluster on MS Azure using 3 virtual machines: 1 master and 2 workers. I am using it as a PoC environment for deploying the ELK stack with Helm charts.

helm ls -A
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
elasticsearch   default         1               2022-01-20 15:49:49.694361821 +0000 UTC deployed        elasticsearch-7.16.2    7.16.2     
filebeat        default         1               2022-01-20 15:55:24.357918192 +0000 UTC deployed        filebeat-7.16.2         7.16.2     
kibana          default         1               2022-01-20 16:23:35.669614848 +0000 UTC deployed        kibana-7.16.2           7.16.2     
logstash        default         1               2022-01-20 15:52:04.003006413 +0000 UTC deployed        logstash-7.16.2         7.16.2     
metricbeat      default         1               2022-01-20 16:04:58.674983026 +0000 UTC deployed        metricbeat-7.16.2       7.16.2   
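
For completeness, this is roughly how the charts were installed (a sketch from memory; the repo name, flags and values file names are my assumptions, and the values files themselves are the ones pasted at the end of this post):

$ helm repo add elastic https://helm.elastic.co
$ helm repo update
$ helm install elasticsearch elastic/elasticsearch --version 7.16.2 -f elasticsearch-values.yaml
$ helm install logstash elastic/logstash --version 7.16.2 -f logstash-values.yaml
$ helm install filebeat elastic/filebeat --version 7.16.2 -f filebeat-values.yaml
$ helm install kibana elastic/kibana --version 7.16.2 -f kibana-values.yaml
$ helm install metricbeat elastic/metricbeat --version 7.16.2 -f metricbeat-values.yaml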

Strangely, when I deployed metricbeat, one of the cluster nodes (say node 2) crashed and its status changed to NotReady. I tried recreating the node, and then other nodes started crashing.

NAME                         STATUS     ROLES               AGE     VERSION
devops-sandbox-rke-master1   Ready      controlplane,etcd   8d      v1.21.8
devops-sandbox-rke-worker1   Ready      worker              7d15h   v1.21.8
devops-sandbox-rke-worker2   NotReady   worker              8d      v1.21.8
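
For reference, the obvious first checks on the NotReady node look roughly like this (a sketch; the node name and the rke SSH user come from my cluster.yml, and on RKE the Kubernetes components, including the kubelet, run as Docker containers):

# Node conditions and recent messages reported by the control plane
$ kubectl describe node devops-sandbox-rke-worker2

# On the node itself (RKE runs the kubelet as a Docker container named "kubelet")
$ ssh rke@devops-sandbox-rke-worker2
$ systemctl status docker
$ docker ps --filter name=kubelet
$ docker logs --tail 100 kubelet

# Quick resource-pressure checks
$ free -m
$ df -h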

I use the Rancher local-path provisioner for dynamic storage provisioning. I have tried deploying in both the default and the kube-system namespaces.
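
The storage side can be checked with the usual commands, e.g.:

$ kubectl get storageclass
$ kubectl get pvc,pv -o wide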

  • All machines were created from an image I built. This image contains the common components for cluster nodes (Docker, ...).

  • I set **hostname_override** to the same value as the address in the RKE configuration file cluster.yml (see the sanity check after this list).

  • I have tried deploying in both the kube-system and the default namespaces.

  • RKE version v1.3.4 and Kubernetes version v1.21.8 (v1.21.8-rancher1-1).
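
Because the same hostname is used both as address and as hostname_override, a quick sanity check is that the Kubernetes node names, the OS hostnames and DNS all agree (a sketch, assuming SSH access as the rke user; the addresses in cluster.yml are hostnames, not IPs, so they must resolve between the nodes and from the machine running rke):

$ kubectl get nodes -o wide

# On each VM:
$ hostnamectl --static
$ getent hosts devops-sandbox-rke-worker2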

Right now I see the following errors on the metricbeat containers, and the situation on the cluster is unstable.

$ kubectl get pods -owide
NAME                                             READY   STATUS             RESTARTS   AGE   IP            NODE                         NOMINATED NODE   READINESS GATES
elasticsearch-master-0                           0/1     Running            0          14h   10.42.3.32    devops-sandbox-rke-worker1   <none>           <none>
elasticsearch-master-1                           1/1     Terminating        0          19h   10.42.1.102   devops-sandbox-rke-worker2   <none>           <none>
kibana-kibana-6678489c4f-nq4zt                   0/1     Running            0          14h   10.42.3.33    devops-sandbox-rke-worker1   <none>           <none>
logstash-logstash-0                              1/1     Terminating        4          19h   10.42.1.103   devops-sandbox-rke-worker2   <none>           <none>
metricbeat-kube-state-metrics-75c5fc65d9-26rvb   1/1     Running            0          14h   10.42.3.36    devops-sandbox-rke-worker1   <none>           <none>
metricbeat-kube-state-metrics-75c5fc65d9-s6qmv   1/1     Terminating        0          14h   10.42.1.107   devops-sandbox-rke-worker2   <none>           <none>
metricbeat-metricbeat-c4dh5                      0/1     CrashLoopBackOff   64         18h   10.42.1.105   devops-sandbox-rke-worker2   <none>           <none>
metricbeat-metricbeat-drkj6                      0/1     Running            158        18h   10.42.3.27    devops-sandbox-rke-worker1   <none>           <none>
metricbeat-metricbeat-metrics-cb95c45fb-5nc5b    0/1     Terminating        0          14h   10.42.1.106   devops-sandbox-rke-worker2   <none>           <none>
metricbeat-metricbeat-metrics-cb95c45fb-bhwgq    0/1     Running            129        14h   10.42.3.35    devops-sandbox-rke-worker1   <none>           <none>

If I try to get logs from a pod on the failing node (worker2), I now get the error below; logs from pods on worker1 are retrieved successfully.

$ kubectl logs filebeat-filebeat-ns64p
Error from server: Get "https://10.1.0.6:10250/containerLogs/default/filebeat-filebeat-ns64p/filebeat": net/http: TLS handshake timeout
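
Since kubectl logs is just the API server proxying to the kubelet on port 10250, a TLS handshake timeout here suggests the kubelet on worker2 is unreachable or wedged. A sketch of how to confirm that (10.1.0.6 is worker2's internal IP from the error above):

# From the master: any HTTP response, even 401, means the kubelet answers; a timeout matches the error above
$ curl -k --max-time 10 https://10.1.0.6:10250/healthz

# On worker2 itself: is the kubelet container still running, and what is it logging?
$ ssh rke@devops-sandbox-rke-worker2 "docker ps --filter name=kubelet; docker logs --tail 50 kubelet"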

Note that describing the pod or the statefulset does not show any events.

Restarting the node from the Azure portal brings the worker node back and restores access to its pods, but it does not fix the underlying error.

$ kubectl get pods -owide | grep -v file
NAME                                             READY   STATUS             RESTARTS   AGE     IP            NODE                         NOMINATED NODE   READINESS GATES
elasticsearch-master-0                           1/1     Running            0          6m15s   10.42.3.37    devops-sandbox-rke-worker1   <none>           <none>
elasticsearch-master-1                           1/1     Running            0          7m18s   10.42.1.119   devops-sandbox-rke-worker2   <none>           <none>
kibana-kibana-6678489c4f-nq4zt                   1/1     Running            0          15h     10.42.3.33    devops-sandbox-rke-worker1   <none>           <none>
logstash-logstash-0                              1/1     Running            0          17m     10.42.1.117   devops-sandbox-rke-worker2   <none>           <none>
metricbeat-kube-state-metrics-75c5fc65d9-26rvb   1/1     Running            0          15h     10.42.3.36    devops-sandbox-rke-worker1   <none>           <none>
metricbeat-metricbeat-c4dh5                      0/1     CrashLoopBackOff   71         20h     10.42.1.111   devops-sandbox-rke-worker2   <none>           <none>
metricbeat-metricbeat-drkj6                      0/1     Running            174        20h     10.42.3.27    devops-sandbox-rke-worker1   <none>           <none>
metricbeat-metricbeat-metrics-cb95c45fb-bhwgq    0/1     Running            143        15h     10.42.3.35    devops-sandbox-rke-worker1   <none>           <none>

I found the following errors in the metricbeat logs (a trimmed-down version below); at that point the worker2 node was still Ready and running.

2022-01-20T16:33:52.516Z        WARN    [cfgwarn]       memory/memory.go:60     DEPRECATED: linux-only memory stats, such as hugepages, and page_stats, will be moved to the linux module Will be removed in version: 8.0
2022-01-20T16:33:55.888Z        WARN    [tls]   tlscommon/tls_config.go:98      SSL/TLS verifications disabled.
2022-01-20T16:50:37.834Z        ERROR   [logstash]      logstash/async.go:280   Failed to publish events caused by: write tcp 10.42.3.27:58804->10.43.133.117:5044: write: connection reset by peer
2022-01-20T16:50:39.070Z        ERROR   [publisher_pipeline_output]     pipeline/output.go:180  failed to publish events: write tcp 10.42.3.27:58804->10.43.133.117:5044: write: connection reset by peer
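
The connection reset by peer errors are on the beats → logstash path (10.43.133.117:5044 looks like the logstash-logstash ClusterIP), which can be tested separately from the node problem, e.g.:

# One-off pod to test TCP connectivity to the beats port of the logstash service
$ kubectl run nettest --rm -it --restart=Never --image=busybox -- nc -vz -w 5 logstash-logstash 5044

# Ask metricbeat itself whether its configured output is reachable (same command the readiness probe uses)
$ kubectl exec metricbeat-metricbeat-drkj6 -- metricbeat test output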

Sorry if this is a lot of information to digest, but that is what I have to work with.

The main points I need help with are:

  • RKE node failure
    • If there is no direct answer, what is the correct troubleshooting approach?
  • Metricbeat failure
    • Could this be because I set **hostname_override** to the same value as the address in the RKE configuration file cluster.yml? (See the sketch below.)
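
For that second point: the metricbeat daemonset config below scrapes the kubelet at https://${NODE_NAME}:10250, and as far as I can tell the chart fills NODE_NAME from spec.nodeName via the downward API, so the overridden node name must be resolvable and reachable from inside the pod. A sketch of the check (pod name taken from the listing above; curl should be present in the image since the chart's own liveness probe uses it):

$ kubectl exec metricbeat-metricbeat-drkj6 -- sh -c 'echo "$NODE_NAME"; getent hosts "$NODE_NAME"'
# Any HTTP status (even 401) means the name resolves and the kubelet answers;
# a resolution error or timeout would point at hostname_override / DNS
$ kubectl exec metricbeat-metricbeat-drkj6 -- sh -c 'curl -sk -o /dev/null -w "%{http_code}\n" "https://$NODE_NAME:10250/stats/summary"'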

The same Helm charts were used to deploy this stack on GKE, and it works fine there.

Below are the configurations I used.

RKE cluster.yml

$ cat cluster.yml
nodes:
- address: devops-sandbox-rke-master1
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: devops-sandbox-rke-master1
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: devops-sandbox-rke-worker1
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: devops-sandbox-rke-worker1
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: devops-sandbox-rke-worker2
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: devops-sandbox-rke-worker2
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
network:
  plugin: canal
  options: {}
  mtu: 0
  node_selector: {}
  update_strategy: null
  tolerations: []
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/mirrored-coreos-etcd:v3.4.16-rancher1
  alpine: rancher/rke-tools:v0.1.78
  nginx_proxy: rancher/rke-tools:v0.1.78
  cert_downloader: rancher/rke-tools:v0.1.78
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.78
  kubedns: rancher/mirrored-k8s-dns-kube-dns:1.17.4
  dnsmasq: rancher/mirrored-k8s-dns-dnsmasq-nanny:1.17.4
  kubedns_sidecar: rancher/mirrored-k8s-dns-sidecar:1.17.4
  kubedns_autoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.3
  coredns: rancher/mirrored-coredns-coredns:1.8.4
  coredns_autoscaler: rancher/mirrored-cluster-proportional-autoscaler:1.8.3
  nodelocal: rancher/mirrored-k8s-dns-node-cache:1.18.0
  kubernetes: rancher/hyperkube:v1.21.8-rancher1
  flannel: rancher/mirrored-coreos-flannel:v0.15.1
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher6
  calico_node: rancher/mirrored-calico-node:v3.19.2
  calico_cni: rancher/mirrored-calico-cni:v3.19.2
  calico_controllers: rancher/mirrored-calico-kube-controllers:v3.19.2
  calico_ctl: rancher/mirrored-calico-ctl:v3.19.2
  calico_flexvol: rancher/mirrored-calico-pod2daemon-flexvol:v3.19.2
  canal_node: rancher/mirrored-calico-node:v3.19.2
  canal_cni: rancher/mirrored-calico-cni:v3.19.2
  canal_controllers: rancher/mirrored-calico-kube-controllers:v3.19.2
  canal_flannel: rancher/mirrored-coreos-flannel:v0.15.1
  canal_flexvol: rancher/mirrored-calico-pod2daemon-flexvol:v3.19.2
  weave_node: weaveworks/weave-kube:2.8.1
  weave_cni: weaveworks/weave-npc:2.8.1
  pod_infra_container: rancher/mirrored-pause:3.4.1
  ingress: rancher/nginx-ingress-controller:nginx-0.49.3-rancher1
  ingress_backend: rancher/mirrored-nginx-ingress-controller-defaultbackend:1.5-rancher1
  ingress_webhook: rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
  metrics_server: rancher/mirrored-metrics-server:v0.5.0
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.6
  aci_cni_deploy_container: noiro/cnideploy:5.1.1.0.1ae238a
  aci_host_container: noiro/aci-containers-host:5.1.1.0.1ae238a
  aci_opflex_container: noiro/opflex:5.1.1.0.1ae238a
  aci_mcast_container: noiro/opflex:5.1.1.0.1ae238a
  aci_ovs_container: noiro/openvswitch:5.1.1.0.1ae238a
  aci_controller_container: noiro/aci-containers-controller:5.1.1.0.1ae238a
  aci_gbp_server_container: noiro/gbp-server:5.1.1.0.1ae238a
  aci_opflex_server_container: noiro/opflex-server:5.1.1.0.1ae238a
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: null
enable_cri_dockerd: null
kubernetes_version: ""
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
  update_strategy: null
  http_port: 0
  https_port: 0
  network_mode: ""
  tolerations: []
  default_backend: null
  default_http_backend_priority_class_name: ""
  nginx_ingress_controller_priority_class_name: ""
  default_ingress_class: null
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
win_prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
  ignore_proxy_env_vars: false
monitoring:
  provider: ""
  options: {}
  node_selector: {}
  update_strategy: null
  replicas: null
  tolerations: []
  metrics_server_priority_class_name: ""
restore:
  restore: false
  snapshot_name: ""
rotate_encryption_key: false
dns: null

Metricbeat Helm chart values.yaml

---
daemonset:
  
  annotations: {}
  
  labels: {}
  affinity: {}
  
  enabled: true
  
  envFrom: []
  
  
  extraEnvs: []
  
  
  extraVolumes: []
  
  
  extraVolumeMounts: []
  
  
  
  hostAliases: []
  
  
  
  hostNetworking: false
  
  
  metricbeatConfig:
    metricbeat.yml: |
      metricbeat.modules:
      - module: kubernetes
        metricsets:
          - container
          - node
          - pod
          - system
          - volume
        period: 10s
        host: "${NODE_NAME}"
        hosts: ["https://${NODE_NAME}:10250"]
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        ssl.verification_mode: "none"
        processors:
        - add_kubernetes_metadata: ~
      - module: kubernetes
        enabled: true
        metricsets:
          - event
      - module: system
        period: 10s
        metricsets:
          - cpu
          - load
          - memory
          - network
          - process
          - process_summary
        processes: ['.*']
        process.include_top_n:
          by_cpu: 5
          by_memory: 5
      - module: system
        period: 1m
        metricsets:
          - filesystem
          - fsstat
        processors:
        - drop_event.when.regexp:
            system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib)($|/)'
      output.logstash:
        hosts: '${LOGSTASH_HOSTS:logstash-logstash:5044}'
  nodeSelector: {}
  secretMounts: []
  securityContext:
    runAsUser: 0
    privileged: false
  resources:
    requests:
      cpu: "100m"
      memory: "100Mi"
    limits:
      cpu: "1000m"
      memory: "200Mi"
  tolerations: []

deployment:
  
  annotations: {}
  
  labels: {}
  affinity: {}
  
  enabled: true
  
  envFrom: []
  
  
  extraEnvs: []
  
  extraVolumes: []
  
  
  extraVolumeMounts: []

  hostAliases: []

  metricbeatConfig:
    metricbeat.yml: |
      metricbeat.modules:
      - module: kubernetes
        enabled: true
        metricsets:
          - state_node
          - state_deployment
          - state_replicaset
          - state_pod
          - state_container
        period: 10s
        hosts: ["${KUBE_STATE_METRICS_HOSTS}"]
      output.elasticsearch:
        hosts: '${ELASTICSEARCH_HOSTS:elasticsearch-master:9200}'
  nodeSelector: {}
  
  
  secretMounts: []
  
  
  
  securityContext:
    runAsUser: 0
    privileged: false
  resources:
    requests:
      cpu: "100m"
      memory: "100Mi"
    limits:
      cpu: "1000m"
      memory: "200Mi"
  tolerations: []


replicas: 1

extraContainers: ""

extraInitContainers: ""

hostPathRoot: /var/lib

image: "docker.elastic.co/beats/metricbeat"
imageTag: "7.16.2"
imagePullPolicy: "IfNotPresent"
imagePullSecrets: []

livenessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        #!/usr/bin/env bash -e
        curl --fail 127.0.0.1:5066
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        #!/usr/bin/env bash -e
        metricbeat test output
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5


managedServiceAccount: true

clusterRoleRules:
  - apiGroups: [""]
    resources:
      - nodes
      - namespaces
      - events
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions"]
    resources:
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources:
      - statefulsets
      - deployments
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - nodes/stats
    verbs: ["get"]

podAnnotations:
  {}

serviceAccount: ""


serviceAccountAnnotations:
  {}

terminationGracePeriod: 30

priorityClassName: ""

updateStrategy: RollingUpdate



nameOverride: ""
fullnameOverride: ""

kube_state_metrics:
  enabled: true
  
  host: ""


secrets: []



affinity: {}
envFrom: []
extraEnvs: []
extraVolumes: []
extraVolumeMounts: []


metricbeatConfig: {}
nodeSelector: {}
podSecurityContext: {}
resources: {}
secretMounts: []
tolerations: []
labels: {}

Elasticsearch Helm chart values.yaml

---
clusterName: "elasticsearch"
nodeGroup: "master"

masterService: ""

roles:
  master: "true"
  ingest: "true"
  data: "true"
  remote_cluster_client: "true"
  ml: "true"

replicas: 3
minimumMasterNodes: 2

esMajorVersion: ""

clusterDeprecationIndexing: "false"

esConfig: {}

extraEnvs: []

envFrom: []

secretMounts: []


hostAliases: []

image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.16.2"
imagePullPolicy: "IfNotPresent"

podAnnotations:
  {}

labels: {}

esJavaOpts: "" 

resources:
  requests:
    cpu: "500m"
    memory: "500Mi"
  limits:
    cpu: "1000m"
    memory: "2Gi"

initResources:
  {}

networkHost: "0.0.0.0"

volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 30Gi

rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""
  automountToken: true

podSecurityPolicy:
  create: false
  name: ""
  spec:
    privileged: true
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeClaim
      - emptyDir

persistence:
  enabled: true
  labels:
    
    enabled: false
  annotations: {}

extraVolumes:
  []

extraVolumeMounts:
  []
 
extraContainers:
  []
 
extraInitContainers:
  []

priorityClassName: ""
antiAffinityTopologyKey: "kubernetes.io/hostname"

antiAffinity: "hard"

nodeAffinity: {}
podManagementPolicy: "Parallel"

enableServiceLinks: true

protocol: http
httpPort: 9200
transportPort: 9300

service:
  enabled: true
  labels: {}
  labelsHeadless: {}
  type: ClusterIP
  nodePort: ""
  annotations: {}
  httpPortName: http
  transportPortName: transport
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  externalTrafficPolicy: ""

updateStrategy: RollingUpdate

maxUnavailable: 1

podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000

securityContext:
  capabilities:
    drop:
      - ALL
  
  runAsNonRoot: true
  runAsUser: 1000


terminationGracePeriod: 120

sysctlVmMaxMapCount: 262144

readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 3
  timeoutSeconds: 5


clusterHealthCheckParams: "wait_for_status=green&timeout=1s"

schedulerName: ""

imagePullSecrets: []
nodeSelector: {}
tolerations: []



ingress:
  enabled: false
  annotations: {}
  
  
  className: "nginx"
  pathtype: ImplementationSpecific
  hosts:
    - host: chart-example.local
      paths:
        - path: /
  tls: []

nameOverride: ""
fullnameOverride: ""
healthNameOverride: ""

lifecycle:
  {}

sysctlInitContainer:
  enabled: true

keystore: []

networkPolicy:

  http:
    enabled: false

  transport:
    enabled: false
 
tests:
  enabled: true



fsGroup: ""

Logstash Helm chart values.yaml

---
replicas: 1

logstashConfig: {}

logstashPipeline:
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }

    output {
      elasticsearch {
        hosts => ["http://elasticsearch-master:9200"]
        index => "%{[@metadata][beat]}-%{[@metadata][version]}"
      }
    }

logstashPatternDir: "/usr/share/logstash/patterns/"
logstashPattern: {}

extraEnvs: []

envFrom: []

secrets: []

secretMounts: []

hostAliases: []

image: "docker.elastic.co/logstash/logstash"
imageTag: "7.16.2"
imagePullPolicy: "IfNotPresent"
imagePullSecrets: []

podAnnotations: {}


labels: {}

logstashJavaOpts: "-Xmx1g -Xms1g"

resources:
  requests:
    cpu: "100m"
    memory: "1536Mi"
  limits:
    cpu: "1000m"
    memory: "1536Mi"

volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi

rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""
  annotations:
    {}

podSecurityPolicy:
  create: false
  name: ""
  spec:
    privileged: false
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeClaim

persistence:
  enabled: false
  annotations: {}

extraVolumes:
  ""

extraVolumeMounts:
  ""
extraContainers:
  ""

extraInitContainers:
  ""
priorityClassName: ""

antiAffinityTopologyKey: "kubernetes.io/hostname"

antiAffinity: "hard"

nodeAffinity: {}

podAffinity: {}

podManagementPolicy: "Parallel"

httpPort: 9600


extraPorts:
  []
  
updateStrategy: RollingUpdate

maxUnavailable: 1

podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000

securityContext:
  capabilities:
    drop:
      - ALL
  
  runAsNonRoot: true
  runAsUser: 1000

terminationGracePeriod: 120


livenessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 300
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1

readinessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 3

schedulerName: ""

nodeSelector: {}
tolerations: []

nameOverride: ""
fullnameOverride: ""

lifecycle:
  {}


service: 
 annotations: {}
 type: ClusterIP
 loadBalancerIP: ""
 ports:
   - name: beats
     port: 5044
     protocol: TCP
     targetPort: 5044
   - name: http
     port: 8080
     protocol: TCP
     targetPort: 8080

ingress:
  enabled: false
  className: "nginx"
  pathtype: ImplementationSpecific
  hosts:
    - host: logstash-example.local
      paths:
        - path: /
  tls: []

Filebeat Helm chart values.yaml

---
daemonset:
  
  annotations: {}
  
  labels: {}
  affinity: {}
  
  enabled: true
  
  envFrom: []
  
  
  extraEnvs: []
  
  
  extraVolumes:
    []

  extraVolumeMounts:
    []

  hostNetworking: false

  filebeatConfig:
    filebeat.yml: |
      filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        processors:
        - add_kubernetes_metadata:
            host: ${NODE_NAME}
            matchers:
            - logs_path:
                logs_path: "/var/log/containers/"

      output.logstash:
        hosts: ["logstash-logstash:5044"]
  
  maxUnavailable: 1
  nodeSelector: {}
  
  
  secretMounts: []
  
  securityContext:
    runAsUser: 0
    privileged: false
  resources:
    requests:
      cpu: "100m"
      memory: "100Mi"
    limits:
      cpu: "1000m"
      memory: "200Mi"
  tolerations: []

deployment:
  
  annotations: {}
  
  labels: {}
  affinity: {}
  
  enabled: false
  
  envFrom: []

  extraEnvs: []

  extraVolumes: []

  extraVolumeMounts: []

  filebeatConfig:
    filebeat.yml: |
      filebeat.inputs:
      - type: tcp
        max_message_size: 10MiB
        host: "localhost:9000"

      output.elasticsearch:
        host: '${NODE_NAME}'
        hosts: '${ELASTICSEARCH_HOSTS:elasticsearch-master:9200}'
  nodeSelector: {}
  
  
  secretMounts: []

  securityContext:
    runAsUser: 0
    privileged: false
  resources:
    requests:
      cpu: "100m"
      memory: "100Mi"
    limits:
      cpu: "1000m"
      memory: "200Mi"
  tolerations: []


replicas: 1

extraContainers: ""

extraInitContainers: []

hostPathRoot: /var/lib

dnsConfig: {}

hostAliases: []

image: "docker.elastic.co/beats/filebeat"
imageTag: "7.16.2"
imagePullPolicy: "IfNotPresent"
imagePullSecrets: []

livenessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        
        curl --fail 127.0.0.1:5066
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        
        filebeat test output
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5


managedServiceAccount: true

clusterRoleRules:
  - apiGroups:
      - ""
    resources:
      - namespaces
      - nodes
      - pods
    verbs:
      - get
      - list
      - watch

podAnnotations:
  {}
  
serviceAccount: ""


serviceAccountAnnotations:
  {}

terminationGracePeriod: 30


priorityClassName: ""

updateStrategy: RollingUpdate

nameOverride: ""
fullnameOverride: ""


affinity: {}
envFrom: []
extraEnvs: []
extraVolumes: []
extraVolumeMounts: []

filebeatConfig: {}
nodeSelector: {}
podSecurityContext: {}
resources: {}
secretMounts: []
tolerations: []
labels: {}

Thanks!
