
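I am trying to track down a performance problem in a multithreaded application that uses tcmalloc. Each thread creates a large number of objects, and my analysis is that the thread caches in tcmalloc cannot satisfy the allocations and frequently fall back to fetching memory from the central page heap. Below is the output of my application with MALLOCSTATS=2, running 4 threads.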

Total size of freelists for per-thread caches,
transfer cache, and central cache, by size class
------------------------------------------------
class   1 [        8 bytes ] :     2046 objs;   0.0 MiB;   0.0 cum MiB
class   2 [       16 bytes ] :     1023 objs;   0.0 MiB;   0.0 cum MiB
class   3 [       32 bytes ] :      507 objs;   0.0 MiB;   0.0 cum MiB
class   5 [       64 bytes ] :      511 objs;   0.0 MiB;   0.1 cum MiB
class   6 [       80 bytes ] :      204 objs;   0.0 MiB;   0.1 cum MiB
class   9 [      128 bytes ] :      128 objs;   0.0 MiB;   0.1 cum MiB
class  15 [      224 bytes ] :       73 objs;   0.0 MiB;   0.1 cum MiB
class  16 [      240 bytes ] :       68 objs;   0.0 MiB;   0.1 cum MiB
class  17 [      256 bytes ] :       64 objs;   0.0 MiB;   0.2 cum MiB
class  19 [      320 bytes ] :       47 objs;   0.0 MiB;   0.2 cum MiB
class  25 [      512 bytes ] :      352 objs;   0.2 MiB;   0.3 cum MiB
class  26 [      576 bytes ] :       28 objs;   0.0 MiB;   0.4 cum MiB
class  33 [     1024 bytes ] :     1072 objs;   1.0 MiB;   1.4 cum MiB
class  39 [     2048 bytes ] :      832 objs;   1.6 MiB;   3.0 cum MiB
class  45 [     4096 bytes ] :      276 objs;   1.1 MiB;   4.1 cum MiB
class  50 [     8192 bytes ] :        2 objs;   0.0 MiB;   4.1 cum MiB
------------------------------------------------
PageHeap: 16 sizes;  713.5 MiB free;    0.0 MiB unmapped
------------------------------------------------
     2 pages *     39 spans ~    0.6 MiB;    0.6 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
     4 pages *     19 spans ~    0.6 MiB;    1.2 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
     6 pages *     17 spans ~    0.8 MiB;    2.0 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
     8 pages *      6 spans ~    0.4 MiB;    2.4 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    10 pages *      4 spans ~    0.3 MiB;    2.7 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    12 pages *      2 spans ~    0.2 MiB;    2.9 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    14 pages *      2 spans ~    0.2 MiB;    3.1 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    16 pages *      2 spans ~    0.2 MiB;    3.3 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    20 pages *      1 spans ~    0.2 MiB;    3.5 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    28 pages *      1 spans ~    0.2 MiB;    3.7 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    30 pages *      2 spans ~    0.5 MiB;    4.2 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    34 pages *      1 spans ~    0.3 MiB;    4.5 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    44 pages *      2 spans ~    0.7 MiB;    5.1 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    76 pages *      1 spans ~    0.6 MiB;    5.7 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
    78 pages *      1 spans ~    0.6 MiB;    6.3 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum
   108 pages *      1 spans ~    0.8 MiB;    7.2 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum

>255   large *     15 spans ~  706.3 MiB;  713.5 MiB cum; unmapped:    0.0 MiB;    0.0 MiB cum

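Now I really can't tell whether this output indicates which thread caches are being exhausted. My conclusion that the thread caches are running out is based on watching the program under GDB and interpreting the tcmalloc code that ends up calling the futex system call.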

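Update: I also noticed that the per-thread caches do not change when the number of threads is increased or decreased. It is the page heap that grows.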

