linux - 100% 内核 CPU 终止与服务器的连接

Question

我的服务器运行 Centos 6.9 (64gb Ram) 和 nginx，问题是每 10 分钟在 htop 中有随机的 100% 内核 cpu 峰值，由“events/10”和“ksoftirqd/10”生成。我不知道如何找出产生这个问题的确切过程。

这是我的/proc/interrupts

$ cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15
   0:      77897          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      timer
   1:          2          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      i8042
   8:          1          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
   9:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
  12:          4          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC-edge      i8042
  56:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      aerdrv
  63:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
  64:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
  65:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
  66: 1426061273          0          0          0          0          0          0          0          0          0          0    1914508          0          0          0          0   PCI-MSI-edge      ahci
  67:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      ahci
  68: 3084636512          0          0          0          0          0          0          0          0          0   10149560          0          0          0          0          0   PCI-MSI-edge      eth0
 NMI:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Non-maskable interrupts
 LOC: 1972128636  528409367 3519065090 2616991376 2762882221 3577269786 2407615998 2889069038 1939478243 2270996522 1940319131 2244314760 2033706214 2339089941 2303043400 2629954396   Local timer interrupts
 SPU:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Performance monitoring interrupts
 IWI:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IRQ work interrupts
 RES: 1349612979 1979915818 1044674069  463586597 1410841781  641984863 3396971132 3062175502 2189512469 2034852778 1686264346 1571882114 1410891335 1330892006 1273321645 1195645068   Rescheduling interrupts
 CAL:    1771384    1771300    1775694    1780259    1778017    1782331    1761855    1755630    1758801    1759472    1770034    1773352    1775468    1779579    1778401    1778652   Function call interrupts
 TLB: 1295395722  623515438  528231713  457109575  438669843  412327240  413878597  392015004  399091958  373918339  391267007  362582716  383312220  348908971  376811042  337426419   TLB shootdowns
 TRM:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Machine check exceptions
 MCP:      77730      77730      77730      77730      77730      77730      77730      77730      77730      77730      77730      77730      77730      77730      77730      77730   Machine check polls
 ERR:          0
 MIS:          0

这是我的/proc/cpuinfo

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 1
model name      : AMD Ryzen 7 PRO 1700X Eight-Core Processor
stepping        : 1
cpu MHz         : 2200.000
cache size      : 512 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb perfctr_l2 arat xsaveopt npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold fsgsbase bmi1 avx2 smep bmi2 rdseed adx
bogomips        : 6786.47
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro [13] [14]

希望你能帮助我，这些尖峰使服务器非常不稳定（即使在 nginx 和大多数软件关闭的情况下）我也已经尝试安装 irqbalance 但它只是将运行 100% 的 cpu 从第一个切换到第十一个。我还让主机将我的驱动器切换到相同架构的另一台机器上，但这也不起作用。

linux - 100% 内核 CPU 终止与服务器的连接

0 回答 0

Related

Reference