我的服务器运行 Centos 6.9 (64gb Ram) 和 nginx,问题是每 10 分钟在 htop 中有随机的 100% 内核 cpu 峰值,由“events/10”和“ksoftirqd/10”生成。我不知道如何找出产生这个问题的确切过程。
这是我的/proc/interrupts
$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
0: 77897 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-edge timer
1: 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-edge i8042
8: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-edge rtc0
9: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
12: 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-edge i8042
56: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge aerdrv
63: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
64: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
65: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
66: 1426061273 0 0 0 0 0 0 0 0 0 0 1914508 0 0 0 0 PCI-MSI-edge ahci
67: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge ahci
68: 3084636512 0 0 0 0 0 0 0 0 0 10149560 0 0 0 0 0 PCI-MSI-edge eth0
NMI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Non-maskable interrupts
LOC: 1972128636 528409367 3519065090 2616991376 2762882221 3577269786 2407615998 2889069038 1939478243 2270996522 1940319131 2244314760 2033706214 2339089941 2303043400 2629954396 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IRQ work interrupts
RES: 1349612979 1979915818 1044674069 463586597 1410841781 641984863 3396971132 3062175502 2189512469 2034852778 1686264346 1571882114 1410891335 1330892006 1273321645 1195645068 Rescheduling interrupts
CAL: 1771384 1771300 1775694 1780259 1778017 1782331 1761855 1755630 1758801 1759472 1770034 1773352 1775468 1779579 1778401 1778652 Function call interrupts
TLB: 1295395722 623515438 528231713 457109575 438669843 412327240 413878597 392015004 399091958 373918339 391267007 362582716 383312220 348908971 376811042 337426419 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 77730 77730 77730 77730 77730 77730 77730 77730 77730 77730 77730 77730 77730 77730 77730 77730 Machine check polls
ERR: 0
MIS: 0
这是我的/proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen 7 PRO 1700X Eight-Core Processor
stepping : 1
cpu MHz : 2200.000
cache size : 512 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb perfctr_l2 arat xsaveopt npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold fsgsbase bmi1 avx2 smep bmi2 rdseed adx
bogomips : 6786.47
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro [13] [14]
希望你能帮助我,这些尖峰使服务器非常不稳定(即使在 nginx 和大多数软件关闭的情况下)我也已经尝试安装 irqbalance 但它只是将运行 100% 的 cpu 从第一个切换到第十一个。我还让主机将我的驱动器切换到相同架构的另一台机器上,但这也不起作用。