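I'm trying to solve a performance problem in a multithreaded application that uses tcmalloc. Each thread creates a large number of objects. My analysis is that the thread caches in tcmalloc run out of memory and frequently have to fetch more from the central page heap. Here is the output of my application with MALLOCSTATS=2, running with 4 threads: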
```
Total size of freelists for per-thread caches, transfer cache, and central cache, by size class
------------------------------------------------
class 1 [ 8 bytes ] : 2046 objs; 0.0 MiB; 0.0 cum MiB
class 2 [ 16 bytes ] : 1023 objs; 0.0 MiB; 0.0 cum MiB
class 3 [ 32 bytes ] : 507 objs; 0.0 MiB; 0.0 cum MiB
class 5 [ 64 bytes ] : 511 objs; 0.0 MiB; 0.1 cum MiB
class 6 [ 80 bytes ] : 204 objs; 0.0 MiB; 0.1 cum MiB
class 9 [ 128 bytes ] : 128 objs; 0.0 MiB; 0.1 cum MiB
class 15 [ 224 bytes ] : 73 objs; 0.0 MiB; 0.1 cum MiB
class 16 [ 240 bytes ] : 68 objs; 0.0 MiB; 0.1 cum MiB
class 17 [ 256 bytes ] : 64 objs; 0.0 MiB; 0.2 cum MiB
class 19 [ 320 bytes ] : 47 objs; 0.0 MiB; 0.2 cum MiB
class 25 [ 512 bytes ] : 352 objs; 0.2 MiB; 0.3 cum MiB
class 26 [ 576 bytes ] : 28 objs; 0.0 MiB; 0.4 cum MiB
class 33 [ 1024 bytes ] : 1072 objs; 1.0 MiB; 1.4 cum MiB
class 39 [ 2048 bytes ] : 832 objs; 1.6 MiB; 3.0 cum MiB
class 45 [ 4096 bytes ] : 276 objs; 1.1 MiB; 4.1 cum MiB
class 50 [ 8192 bytes ] : 2 objs; 0.0 MiB; 4.1 cum MiB
------------------------------------------------
PageHeap: 16 sizes; 713.5 MiB free; 0.0 MiB unmapped
------------------------------------------------
2 pages * 39 spans ~ 0.6 MiB; 0.6 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
4 pages * 19 spans ~ 0.6 MiB; 1.2 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
6 pages * 17 spans ~ 0.8 MiB; 2.0 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
8 pages * 6 spans ~ 0.4 MiB; 2.4 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
10 pages * 4 spans ~ 0.3 MiB; 2.7 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
12 pages * 2 spans ~ 0.2 MiB; 2.9 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
14 pages * 2 spans ~ 0.2 MiB; 3.1 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
16 pages * 2 spans ~ 0.2 MiB; 3.3 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
20 pages * 1 spans ~ 0.2 MiB; 3.5 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
28 pages * 1 spans ~ 0.2 MiB; 3.7 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
30 pages * 2 spans ~ 0.5 MiB; 4.2 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
34 pages * 1 spans ~ 0.3 MiB; 4.5 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
44 pages * 2 spans ~ 0.7 MiB; 5.1 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
76 pages * 1 spans ~ 0.6 MiB; 5.7 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
78 pages * 1 spans ~ 0.6 MiB; 6.3 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
108 pages * 1 spans ~ 0.8 MiB; 7.2 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
255 large * 15 spans ~ 706.3 MiB; 713.5 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
```
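Now, I really can't tell whether this output shows which thread caches are running out. My conclusion that the thread caches are running out is based on watching the program run under GDB and reading the tcmalloc code paths that end up making futex system calls.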
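One way to read the dump is to total its two pools separately. The per-size-class section is a single aggregate over the per-thread caches, the transfer cache, and the central cache (as its header says), so it cannot by itself show which individual thread cache is running dry; the PageHeap section is free memory held in spans. The sketch below is a hypothetical helper, not part of gperftools: `summarize`, the regexes, and the embedded `SAMPLE` excerpt are my own, written against the line format shown above, and may need adjusting for other tcmalloc versions.

```python
import re
import sys

# Regexes matched to the MALLOCSTATS line format shown above (an assumption;
# other gperftools versions may pad or word these lines differently).
CLASS_RE = re.compile(r"class\s+(\d+)\s+\[\s*(\d+)\s+bytes\s*\]\s*:\s*(\d+)\s+objs")
PAGEHEAP_RE = re.compile(
    r"PageHeap:\s*\d+\s+sizes;\s*([\d.]+)\s+MiB free;\s*([\d.]+)\s+MiB unmapped")
LARGE_RE = re.compile(r"large\s*\*\s*(\d+)\s+spans\s*~\s*([\d.]+)\s+MiB")

MIB = 1024 * 1024

# Excerpt of the dump above, used when no file is given on the command line.
SAMPLE = """\
class 33 [ 1024 bytes ] : 1072 objs; 1.0 MiB; 1.4 cum MiB
class 39 [ 2048 bytes ] : 832 objs; 1.6 MiB; 3.0 cum MiB
class 45 [ 4096 bytes ] : 276 objs; 1.1 MiB; 4.1 cum MiB
PageHeap: 16 sizes; 713.5 MiB free; 0.0 MiB unmapped
255 large * 15 spans ~ 706.3 MiB; 713.5 MiB cum; unmapped: 0.0 MiB; 0.0 MiB cum
"""


def summarize(dump: str) -> dict:
    """Split a MALLOCSTATS dump into cached small objects vs. free page-heap memory."""
    per_class = {}
    for cls, size, objs in CLASS_RE.findall(dump):
        # Bytes parked in freelists for this size class, summed over
        # thread caches + transfer cache + central cache.
        per_class[int(cls)] = int(size) * int(objs)
    ph = PAGEHEAP_RE.search(dump)
    large = LARGE_RE.search(dump)
    return {
        "freelist_mib": sum(per_class.values()) / MIB,
        "pageheap_free_mib": float(ph.group(1)) if ph else 0.0,
        "pageheap_unmapped_mib": float(ph.group(2)) if ph else 0.0,
        "large_spans": int(large.group(1)) if large else 0,
        "large_free_mib": float(large.group(2)) if large else 0.0,
        "per_class_bytes": per_class,
    }


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    s = summarize(text)
    print(f"size-class freelists (all caches combined): {s['freelist_mib']:.2f} MiB")
    print(f"page heap free: {s['pageheap_free_mib']:.1f} MiB, "
          f"{s['large_spans']} large spans hold {s['large_free_mib']:.1f} MiB")
```

Run against the full dump above, the size-class freelists add up to about 4.1 MiB (matching the last `cum MiB` value), while the page heap holds 713.5 MiB free, 706.3 MiB of it in 15 large spans.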
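Update: I also noticed that the per-thread caches don't change when I increase or decrease the number of threads. It is the page heap that grows.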