我已经阅读了有关此参数的文档,但差异确实很大!启用时,一个简单程序(见下文)的内存使用量约为7 GB,禁用时,报告的使用量约为160 KB。
top
还显示大约 7 GB,这有点证实了结果pages-as-heap=yes
。
(我有一个理论,但我不相信它会解释如此巨大的差异,所以 - 寻求帮助)。
尤其困扰我的是,报告的大部分内存使用量都被 使用std::string
,而what?
从未打印(这意味着 - 实际容量非常小)。
我确实需要pages-as-heap=yes
在分析我的应用程序时使用,我只是想知道如何避免“误报”
代码片段:
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
void run()
{
while (true)
{
std::string s;
s += "aaaaa";
s += "aaaaaaaaaaaaaaa";
s += "bbbbbbbbbb";
s += "cccccccccccccccccccccccccccccccccccccccccccccccccc";
if (s.capacity() > 1024) std::cout << "what?" << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(1));
}
}
int main()
{
std::vector<std::thread> workers;
for( unsigned i = 0; i < 192; ++i ) workers.push_back(std::thread(&run));
workers.back().join();
}
编译:g++ --std=c++11 -fno-inline -g3 -pthread
与pages-as-heap=yes
:
100.00% (7,257,714,688B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->99.75% (7,239,757,824B) 0x54E0679: mmap (mmap.c:34)
| ->53.63% (3,892,314,112B) 0x545C3CF: new_heap (arena.c:438)
| | ->53.63% (3,892,314,112B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
| | ->53.63% (3,892,314,112B) 0x5463248: malloc (malloc.c:2911)
| | ->53.63% (3,892,314,112B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x4CF8E37: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x4CF9C69: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x4CF9D22: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x4CF9FB1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x401252: run() (test.cpp:11)
| | ->53.63% (3,892,314,112B) 0x403929: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1700)
| | ->53.63% (3,892,314,112B) 0x403864: std::_Bind_simple<void (*())()>::operator()() (functional:1688)
| | ->53.63% (3,892,314,112B) 0x4037D2: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:115)
| | ->53.63% (3,892,314,112B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x51C96B8: start_thread (pthread_create.c:333)
| | ->53.63% (3,892,314,112B) 0x54E63DB: clone (clone.S:109)
| |
| ->35.14% (2,550,136,832B) 0x545C35B: new_heap (arena.c:427)
| | ->35.14% (2,550,136,832B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
| | ->35.14% (2,550,136,832B) 0x5463248: malloc (malloc.c:2911)
| | ->35.14% (2,550,136,832B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x4CF8E37: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x4CF9C69: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x4CF9D22: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x4CF9FB1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x401252: run() (test.cpp:11)
| | ->35.14% (2,550,136,832B) 0x403929: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1700)
| | ->35.14% (2,550,136,832B) 0x403864: std::_Bind_simple<void (*())()>::operator()() (functional:1688)
| | ->35.14% (2,550,136,832B) 0x4037D2: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:115)
| | ->35.14% (2,550,136,832B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x51C96B8: start_thread (pthread_create.c:333)
| | ->35.14% (2,550,136,832B) 0x54E63DB: clone (clone.S:109)
| |
| ->10.99% (797,306,880B) 0x51CA1D4: pthread_create@@GLIBC_2.2.5 (allocatestack.c:513)
| ->10.99% (797,306,880B) 0x4CE2DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->10.99% (797,306,880B) 0x4CE2ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->10.99% (797,306,880B) 0x401BEA: std::thread::thread<void (*)()>(void (*&&)()) (thread:138)
| ->10.99% (797,306,880B) 0x401353: main (test.cpp:24)
|
->00.25% (17,956,864B) in 1+ places, all below ms_print's threshold (01.00%)
同时pages-as-heap=no
:
96.38% (159,289B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->43.99% (72,704B) 0x4EBAEFE: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->43.99% (72,704B) 0x40106B8: call_init.part.0 (dl-init.c:72)
| ->43.99% (72,704B) 0x40107C9: _dl_init (dl-init.c:30)
| ->43.99% (72,704B) 0x4000C68: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
|
->33.46% (55,296B) 0x40138A3: _dl_allocate_tls (dl-tls.c:322)
| ->33.46% (55,296B) 0x53D126D: pthread_create@@GLIBC_2.2.5 (allocatestack.c:588)
| ->33.46% (55,296B) 0x4EE9DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->33.46% (55,296B) 0x4EE9ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->33.46% (55,296B) 0x401BEA: std::thread::thread<void (*)()>(void (*&&)()) (thread:138)
| ->33.46% (55,296B) 0x401353: main (test.cpp:24)
|
->12.12% (20,025B) 0x4EFFE37: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->12.12% (20,025B) 0x4F00C69: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->12.12% (20,025B) 0x4F00D22: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->12.12% (20,025B) 0x4F00FB1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->12.07% (19,950B) 0x401285: run() (test.cpp:14)
| | ->12.07% (19,950B) 0x403929: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1700)
| | ->12.07% (19,950B) 0x403864: std::_Bind_simple<void (*())()>::operator()() (functional:1688)
| | ->12.07% (19,950B) 0x4037D2: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:115)
| | ->12.07% (19,950B) 0x4EE9C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->12.07% (19,950B) 0x53D06B8: start_thread (pthread_create.c:333)
| | ->12.07% (19,950B) 0x56ED3DB: clone (clone.S:109)
| |
| ->00.05% (75B) in 1+ places, all below ms_print's threshold (01.00%)
|
->05.58% (9,216B) 0x40315B: __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*) (new_allocator.h:104)
| ->05.58% (9,216B) 0x402FC2: std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)2> > >::allocate(std::allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)2> >&, unsigned long) (alloc_traits.h:488)
| ->05.58% (9,216B) 0x402D4B: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::_Sp_make_shared_tag, std::thread::_Impl<std::_Bind_simple<void (*())()> >*, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr_base.h:616)
| ->05.58% (9,216B) 0x402BDE: std::__shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> >, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::_Sp_make_shared_tag, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr_base.h:1090)
| ->05.58% (9,216B) 0x402A76: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > >::shared_ptr<std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::_Sp_make_shared_tag, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr.h:316)
| ->05.58% (9,216B) 0x402771: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > > std::allocate_shared<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr.h:594)
| ->05.58% (9,216B) 0x402325: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > > std::make_shared<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::_Bind_simple<void (*())()> >(std::_Bind_simple<void (*())()>&&) (shared_ptr.h:610)
| ->05.58% (9,216B) 0x401F9C: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > > std::thread::_M_make_routine<std::_Bind_simple<void (*())()> >(std::_Bind_simple<void (*())()>&&) (thread:196)
| ->05.58% (9,216B) 0x401BC4: std::thread::thread<void (*)()>(void (*&&)()) (thread:138)
| ->05.58% (9,216B) 0x401353: main (test.cpp:24)
|
->01.24% (2,048B) 0x402C9A: __gnu_cxx::new_allocator<std::thread>::allocate(unsigned long, void const*) (new_allocator.h:104)
->01.24% (2,048B) 0x402AF5: std::allocator_traits<std::allocator<std::thread> >::allocate(std::allocator<std::thread>&, unsigned long) (alloc_traits.h:488)
->01.24% (2,048B) 0x402928: std::_Vector_base<std::thread, std::allocator<std::thread> >::_M_allocate(unsigned long) (stl_vector.h:170)
->01.24% (2,048B) 0x40244E: void std::vector<std::thread, std::allocator<std::thread> >::_M_emplace_back_aux<std::thread>(std::thread&&) (vector.tcc:412)
->01.24% (2,048B) 0x40206D: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<std::thread>(std::thread&&) (vector.tcc:101)
->01.24% (2,048B) 0x401C82: std::vector<std::thread, std::allocator<std::thread> >::push_back(std::thread&&) (stl_vector.h:932)
->01.24% (2,048B) 0x401366: main (test.cpp:24)
请忽略线程的糟糕处理,这只是一个非常简短的示例。
更新
看来,这根本不相关std::string
。正如@Lawrence 所建议的,这可以通过简单地在堆上分配一个int
(带有new
)来重现。我相信@Lawrence 非常接近这里的真实答案,引用了他的评论(对于更多读者来说更容易):
劳伦斯:
@KirilKirov 字符串分配实际上并没有占用那么多空间......每个线程都获得它的初始堆栈,然后堆访问映射了一些被不准确反映的大量空间(大约 70m)。您可以通过仅声明 1 个字符串然后进行自旋循环来测量它......显示了相同的虚拟内存使用情况 – Lawrence Sep 28 at 14:51
我:
@Lawrence - 你说得对!好的,所以,你是说(看起来是这样的),在每个线程上,在第一个堆分配上,内存管理器(或操作系统,或其他)为线程的堆分配大量内存需要?并且这个块稍后会被重用(或缩小,如果需要)?– 基里尔基洛夫 9 月 28 日 15:45
劳伦斯:
@KirilKirov 类似的东西......确切的分配可能取决于 malloc 的实现等等——劳伦斯 2 天前