I have a Kubernetes 1.16.2 cluster. When I deployed all of the services with 1 replica each, everything worked fine. Then I scaled every service up to 2 replicas and checked again: some services are running normally, but others are stuck in Pending.
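For reference, scaling a service to 2 replicas looks roughly like this (the deployment name is one example from my runsdata namespace; the exact command I used may have differed slightly):

kubectl -n runsdata scale deployment society-resident-service-v3-0 --replicas=2

When I kubectl describe one of the pending pods, I get the following output: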
[root@runsdata-bj-01 society-training-service-v1-0]# kcd society-resident-service-v3-0-788446c49b-rzjsx
Name:           society-resident-service-v3-0-788446c49b-rzjsx
Namespace:      runsdata
Priority:       0
Node:           <none>
Labels:         app=society-resident-service-v3-0
                pod-template-hash=788446c49b
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/society-resident-service-v3-0-788446c49b
Containers:
  society-resident-service-v3-0:
    Image:      docker.ssiid.com/society-resident-service:3.0.33
    Port:       8231/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     1
      memory:  4Gi
    Requests:
      cpu:     200m
      memory:  2Gi
    Liveness:   http-get http://:8231/actuator/health delay=600s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8231/actuator/health delay=30s timeout=5s period=10s #success=1 #failure=3
    Environment:
      spring_profiles_active:  production
      TZ:                      Asia/Hong_Kong
      JAVA_OPTS:               -Djgroups.use.jdk_logger=true -Xmx4000M -Xms4000M -Xmn600M -XX:PermSize=500M -XX:MaxPermSize=500M -Xss384K -XX:+DisableExplicitGC -XX:SurvivorRatio=1 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+PrintClassHistogram -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:log/gc.log
    Mounts:
      /data/storage from nfs-data-storage (rw)
      /opt/security from security (rw)
      /var/log/runsdata from log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from application-token-vgcvb (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  log:
    Type:          HostPath (bare host directory volume)
    Path:          /log/runsdata
    HostPathType:
  security:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-security-claim
    ReadOnly:   false
  nfs-data-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-storage-claim
    ReadOnly:   false
  application-token-vgcvb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  application-token-vgcvb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/4 nodes are available: 4 Insufficient memory.
As you can see below, each of my machines still has more than 2 GB of memory available.
[root@runsdata-bj-01 society-training-service-v1-0]# kcp |grep Pending
society-insurance-foundation-service-v2-0-7697b9bd5b-7btq6 0/1 Pending 0 60m
society-notice-service-v1-0-548b8d5946-c5gzm 0/1 Pending 0 60m
society-online-business-service-v2-1-7f897f564-phqjs 0/1 Pending 0 60m
society-operation-gateway-7cf86b77bd-lmswm 0/1 Pending 0 60m
society-operation-user-service-v1-1-755dcff964-dr9mj 0/1 Pending 0 60m
society-resident-service-v3-0-788446c49b-rzjsx 0/1 Pending 0 60m
society-training-service-v1-0-774f8c5d98-tl7vq 0/1 Pending 0 60m
society-user-service-v3-0-74865dd9d7-t9fwz 0/1 Pending 0 60m
traefik-ingress-controller-8688cccf79-5gkjg 0/1 Pending 0 60m
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
192.168.0.94 384m 9% 11482Mi 73%
192.168.0.95 399m 9% 11833Mi 76%
192.168.0.96 399m 9% 11023Mi 71%
192.168.0.97 457m 11% 10782Mi 69%
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
192.168.0.94 Ready <none> 8d v1.16.2
192.168.0.95 Ready <none> 8d v1.16.2
192.168.0.96 Ready <none> 8d v1.16.2
192.168.0.97 Ready <none> 8d v1.16.2
[root@runsdata-bj-01 society-training-service-v1-0]#
Here is the describe output for all 4 nodes (only the Allocated resources section is shown):
[root@runsdata-bj-01 frontend]#kubectl describe node 192.168.0.94
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1930m (48%) 7600m (190%)
memory 9846Mi (63%) 32901376Ki (207%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]#kubectl describe node 192.168.0.95
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1670m (41%) 6600m (165%)
memory 7196Mi (46%) 21380Mi (137%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.96
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2610m (65%) 7 (175%)
memory 9612Mi (61%) 19960Mi (128%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.97
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2250m (56%) 508200m (12705%)
memory 10940Mi (70%) 28092672Ki (176%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
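To make it easier to compare these numbers with what the waiting pods are asking for, the memory requests of all Pending pods can be listed with something like the following command (the custom-columns expression here is my own sketch, not one of the commands shown above):

kubectl -n runsdata get pods --field-selector=status.phase=Pending \
  -o custom-columns=NAME:.metadata.name,MEM_REQUEST:.spec.containers[*].resources.requests.memory

For the pod described above this should simply print 2Gi, matching the Requests section of its container.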
And here is the memory on all 4 nodes:
[root@runsdata-bj-00 ~]# free -h
total used free shared buff/cache available
Mem: 15G 2.8G 6.7G 2.1M 5.7G 11G
Swap: 0B 0B 0B
[root@runsdata-bj-01 frontend]# free -h
total used free shared buff/cache available
Mem: 15G 7.9G 3.7G 2.4M 3.6G 6.8G
Swap: 0B 0B 0B
[root@runsdata-bj-02 ~]# free -h
total used free shared buff/cache available
Mem: 15G 5.0G 2.9G 3.9M 7.4G 9.5G
Swap: 0B 0B 0B
[root@runsdata-bj-03 ~]# free -h
total used free shared buff/cache available
Mem: 15G 6.5G 2.2G 2.3M 6.6G 8.2G
Swap: 0B 0B 0B
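Note that free -h shows memory from the operating system's point of view; Kubernetes also reports an allocatable memory figure per node, which can be listed with something like this (again only a sketch):

kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLE_MEMORY:.status.allocatable.memory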
Here is the kube-scheduler log:
[root@runsdata-bj-01 log]# cat messages|tail -n 5000|grep kube-scheduler
Apr 17 14:31:24 runsdata-bj-01 kube-scheduler: E0417 14:31:24.404442 12740 factory.go:585] pod is already present in the activeQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.490310 12740 factory.go:585] pod is already present in the backoffQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.873292 12740 factory.go:585] pod is already present in the backoffQ
Apr 18 21:44:18 runsdata-bj-01 etcd: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " with result "range_response_count:1 size:440" took too long (100.521269ms) to execute
Apr 18 21:59:40 runsdata-bj-01 kube-scheduler: E0418 21:59:40.050852 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.069465 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.950254 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:08 runsdata-bj-01 kube-scheduler: E0418 22:03:08.567290 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.152812 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.344902 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:04:32 runsdata-bj-01 kube-scheduler: E0418 22:04:32.969606 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:09:51 runsdata-bj-01 kube-scheduler: E0418 22:09:51.366877 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.430976 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.441182 12740 factory.go:585] pod is already present in the activeQ
I have searched Google and Stack Overflow but could not find a solution. Can anyone help me?