
I am trying to deploy a Prometheus node-exporter DaemonSet in my AWS EKS Kubernetes cluster.

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: prometheus
    chart: prometheus-11.12.1
    component: node-exporter
    heritage: Helm
    release: prometheus
  name: prometheus-node-exporter
  namespace: operations-tools-test
spec:
  selector:
    matchLabels:
      app: prometheus
      component: node-exporter
      release: prometheus
  template:
    metadata:
      labels:
        app: prometheus
        chart: prometheus-11.12.1
        component: node-exporter
        heritage: Helm
        release: prometheus
    spec:
      containers:
      - args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --web.listen-address=:9100
        image: prom/node-exporter:v1.0.1
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
          readOnly: true
        - mountPath: /host/sys
          name: sys
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: prometheus-node-exporter
      serviceAccountName: prometheus-node-exporter
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys
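
One thing worth flagging in the manifest above (an editorial note, not part of the original question): it declares apiVersion: extensions/v1beta1, which was removed for DaemonSet in Kubernetes 1.16. On clusters at or above that version the same spec has to be served from apps/v1; only the header changes:

```
# Sketch: the rest of the DaemonSet spec stays the same.
apiVersion: apps/v1
kind: DaemonSet
```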

However, after deploying it, the pod does not get scheduled onto one of the nodes.

The pod.yml generated for it looks like this:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  generateName: prometheus-node-exporter-
  labels:
    app: prometheus
    chart: prometheus-11.12.1
    component: node-exporter
    heritage: Helm
    pod-template-generation: "1"
    release: prometheus
  name: prometheus-node-exporter-xxxxx
  namespace: operations-tools-test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: prometheus-node-exporter
  resourceVersion: "51496903"
  selfLink: /api/v1/namespaces/namespace-x/pods/prometheus-node-exporter-xxxxx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - ip-xxx-xx-xxx-xxx.ec2.internal
  containers:
  - args:
    - --path.procfs=/host/proc
    - --path.sysfs=/host/sys
    - --web.listen-address=:9100
    image: prom/node-exporter:v1.0.1
    imagePullPolicy: IfNotPresent
    name: prometheus-node-exporter
    ports:
    - containerPort: 9100
      hostPort: 9100
      name: metrics
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host/proc
      name: proc
      readOnly: true
    - mountPath: /host/sys
      name: sys
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: prometheus-node-exporter-token-xxxx
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  hostPID: true
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: prometheus-node-exporter
  serviceAccountName: prometheus-node-exporter
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /proc
      type: ""
    name: proc
  - hostPath:
      path: /sys
      type: ""
    name: sys
  - name: prometheus-node-exporter-token-xxxxx
    secret:
      defaultMode: 420
      secretName: prometheus-node-exporter-token-xxxxx
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-11-06T23:56:47Z"
    message: '0/4 nodes are available: 2 node(s) didn''t have free ports for the requested
      pod ports, 3 Insufficient pods, 3 node(s) didn''t match node selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

As shown above, the pod's nodeAffinity looks up metadata.name, which exactly matches what I have as a label on the node.

But when I run the following command,

 kubectl describe po prometheus-node-exporter-xxxxx

I get the following events:

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  60m                   default-scheduler  0/4 nodes are available: 1 Insufficient pods, 3 node(s) didn't match node selector.
  Warning  FailedScheduling  4m46s (x37 over 58m)  default-scheduler  0/4 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 3 Insufficient pods, 3 node(s) didn't match node selector.

I also checked the CloudWatch logs for the scheduler, but I didn't see any logs for the failed pod.

The nodes have plenty of resources left:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests    Limits
  --------                    --------    ------
  cpu                         520m (26%)  210m (10%)
  memory                      386Mi (4%)  486Mi (6%)

I can't see any reason why it shouldn't schedule the pod. Can anyone help me with this?

TIA


1 Answer


As posted in the comments:

Please add to the question the steps you followed (editing any values in the Helm chart, etc.). Also, please check if the nodes haven't exceeded the limit of Pods that can be scheduled on them. Here you can find more reference links: LINK

No process is occupying port 9100 on the given nodes. @DawidKruk the POD limit was reached. Thanks! I expected them to give me some error rather than the vague "node selector property doesn't match" message.


Not entirely sure why the following messages were displayed:

  • node(s) didn't have free ports for the requested pod ports
  • node(s) didn't match node selector

The issue with Pods that could not be scheduled on the nodes (Pending state) was connected with the Insufficient pods message from the $ kubectl get events command.

The message above is displayed when a node reaches its maximum Pod capacity (for example: node1 can schedule a maximum of 30 Pods).


More on Insufficient Pods can be found in this GitHub issue comment:

That's true. That's because of the CNI implementation on EKS. The max number of pods is limited by the number of network interfaces attached to the instance, multiplied by the number of IPs per ENI - which depends on the size of the instance. It's apparent that for small instances this number can be quite low.

Docs.aws.amazon.com: AWS EC2: User Guide: Using ENIs: IPs per ENI

-- Github.com: Kubernetes: Autoscaler: Issue 1576: Comment 454100551
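
That limit can be sketched with the standard aws-vpc-cni max-pods formula (the t3.medium figures below are illustrative, not taken from the question):

```shell
# EKS max-pods formula used by the aws-vpc-cni plugin:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# (one address per ENI is its primary IP; the +2 is the conventional
# allowance for the host-networked aws-node and kube-proxy pods)
# Example figures for a t3.medium: 3 ENIs, 6 IPv4 addresses per ENI.
ENIS=3
IPS_PER_ENI=6
MAX_PODS=$(( ENIS * (IPS_PER_ENI - 1) + 2 ))
echo "max pods: $MAX_PODS"   # prints "max pods: 17"
```

So a t3.medium tops out at 17 pods per node, and system DaemonSets (aws-node, kube-proxy, etc.) already consume a few of those slots on every node.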


Additional resources:

Answered on 2020-11-10T12:03:32.440