我编写了一些结构如下的 C++ 代码:
double kernel(params)
{
//code
}
void optimize(params)
{
//some code
double x = kernel();
//some more code
}
int main()
{
//some code
optimize();
//some more code
}
我尝试使用以下命令使用 callgrind 对其进行分析:
g++ -O3 -g sgd.cpp
valgrind --tool=callgrind ./a.out commandline_args
callgrind_annotate callgrind.out.XXXX
我得到以下输出:
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
12,916,968,785 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
5,862,783,191 /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/memcpy-ssse3.S:__memmove_ssse3 [/lib/i386-linux-gnu/libc-2.15.so]
2,847,653,393 /build/buildd/eglibc-2.15/malloc/malloc.c:_int_malloc [/lib/i386-linux-gnu/libc-2.15.so]
1,327,109,692 /build/buildd/eglibc-2.15/malloc/malloc.c:_int_free [/lib/i386-linux-gnu/libc-2.15.so]
847,560,182 sgd.cpp:main [a.out]
503,022,767 /build/buildd/eglibc-2.15/malloc/malloc.c:malloc [/lib/i386-linux-gnu/libc-2.15.so]
235,458,068 /build/buildd/eglibc-2.15/malloc/malloc.c:free [/lib/i386-linux-gnu/libc-2.15.so]
213,580,120 /build/buildd/eglibc-2.15/math/../sysdeps/i386/fpu/e_exp.S:__ieee754_exp [/lib/i386-linux-gnu/libm-2.15.so]
203,349,602 ???:operator new(unsigned int) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
192,222,108 /build/buildd/eglibc-2.15/math/../sysdeps/ieee754/dbl-64/w_exp.c:exp [/lib/i386-linux-gnu/libm-2.15.so]
128,438,068 /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/strcat.S:0x0012ac73 [/lib/i386-linux-gnu/libc-2.15.so]
128,431,176 ???:operator delete(void*) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
128,358,564 /usr/include/c++/4.6/ext/new_allocator.h:main
117,645,255 /usr/include/c++/4.6/bits/stl_vector.h:main
112,167,083 /usr/include/c++/4.6/bits/stl_algobase.h:main
除了 main(),它不显示源代码的哪些部分占用了大部分时间。我知道大部分时间都花在了 optimize() 函数中,而大部分时间都花在了 kernel() 函数中,但我从输出中看不到这一点。如何获取详细信息以便加快代码速度?
如果有帮助,我将在代码中广泛使用 std::vectors。前段时间我使用数组实现了一个类似的代码,那时 callgrind 似乎工作得很好。这可能是一个问题吗?
如果我禁用 O3 标志,我会得到以下输出:
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
19,026,610,083 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
5,233,252,577 /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/memcpy-ssse3.S:__memmove_ssse3 [/lib/i386-linux-gnu/libc-2.15.so]
2,542,000,057 /build/buildd/eglibc-2.15/malloc/malloc.c:_int_malloc [/lib/i386-linux-gnu/libc-2.15.so]
1,184,626,252 /build/buildd/eglibc-2.15/malloc/malloc.c:_int_free [/lib/i386-linux-gnu/libc-2.15.so]
983,472,430 sgd.cpp:optimize(std::vector<double, std::allocator<double> >, std::vector<int, std::allocator<int> >, std::vector<double, std::allocator<double> >) [a.out]
781,018,740 ???:std::vector<double, std::allocator<double> >::operator[](unsigned int) [a.out]
772,117,839 sgd.cpp:kernel(std::vector<double, std::allocator<double> >, int, int, double) [a.out]
476,616,742 ???:std::vector<double, std::allocator<double> >::vector(std::vector<double, std::allocator<double> > const&) [a.out]
449,016,969 /build/buildd/eglibc-2.15/malloc/malloc.c:malloc [/lib/i386-linux-gnu/libc-2.15.so]
324,200,916 ???:std::vector<double, std::allocator<double> >::size() const [a.out]
305,705,504 ???:std::_Vector_base<double, std::allocator<double> >::_Vector_base(unsigned int, std::allocator<double> const&) [a.out]
267,492,204 ???:std::_Vector_base<double, std::allocator<double> >::~_Vector_base() [a.out]
238,309,873 /usr/include/c++/4.6/bits/stl_algobase.h:double* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<double>(double const*, double const*, double*) [a.out]
238,308,370 /usr/include/c++/4.6/bits/stl_algobase.h:double* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
228,776,040 /usr/include/c++/4.6/bits/stl_algobase.h:std::_Miter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
228,776,038 /usr/include/c++/4.6/bits/stl_algobase.h:double* std::copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
210,178,748 /build/buildd/eglibc-2.15/malloc/malloc.c:free [/lib/i386-linux-gnu/libc-2.15.so]
210,172,446 ???:std::vector<double, std::allocator<double> >::~vector() [a.out]
209,711,018 sgd.cpp:square(double) [a.out]
190,646,380 /build/buildd/eglibc-2.15/math/../sysdeps/i386/fpu/e_exp.S:__ieee754_exp [/lib/i386-linux-gnu/libm-2.15.so]
181,517,469 ???:operator new(unsigned int) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
171,582,030 /usr/include/c++/4.6/bits/stl_iterator_base_types.h:std::_Iter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, true>::_S_base(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
171,581,742 /build/buildd/eglibc-2.15/math/../sysdeps/ieee754/dbl-64/w_exp.c:exp [/lib/i386-linux-gnu/libm-2.15.so]
152,853,344 ???:__gnu_cxx::new_allocator<double>::allocate(unsigned int, void const*) [a.out]
152,852,752 ???:std::_Vector_base<double, std::allocator<double> >::_Vector_impl::_Vector_impl(std::allocator<double> const&) [a.out]
152,517,360 /usr/include/c++/4.6/bits/stl_algobase.h:std::_Niter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
152,517,360 /usr/include/c++/4.6/bits/stl_iterator_base_types.h:std::_Iter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, false>::_S_base(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
152,517,360 /usr/include/c++/4.6/bits/stl_iterator.h:__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >::__normal_iterator(double const* const&) [a.out]
133,746,571 ???:std::_Vector_base<double, std::allocator<double> >::_M_deallocate(double*, unsigned int) [a.out]
133,452,690 ???:std::vector<double, std::allocator<double> >::end() const [a.out]
133,452,690 ???:std::vector<double, std::allocator<double> >::begin() const [a.out]
131,134,604 sgd.cpp:sign(double) [a.out]
123,920,353 /usr/include/c++/4.6/bits/stl_algobase.h:double* std::__copy_move_a<false, double const*, double*>(double const*, double const*, double*) [a.out]
121,192,848 ???:std::vector<int, std::allocator<int> >::operator[](unsigned int) [a.out]
114,649,360 /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/strcat.S:0x0012ac73 [/lib/i386-linux-gnu/libc-2.15.so]
114,642,456 ???:operator delete(void*) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
114,388,018 /usr/include/c++/4.6/bits/stl_uninitialized.h:double* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
114,388,018 /usr/include/c++/4.6/bits/stl_uninitialized.h:double* std::uninitialized_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
114,388,018 /usr/include/c++/4.6/bits/stl_uninitialized.h:double* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*, double>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*, std::allocator<double>&) [a.out]
105,086,674 /usr/include/c++/4.6/bits/stl_vector.h:std::_Vector_base<double, std::allocator<double> >::_M_allocate(unsigned int) [a.out]
95,533,505 ???:std::_Vector_base<double, std::allocator<double> >::_M_get_Tp_allocator() [a.out]
95,533,300 /usr/include/c++/4.6/bits/stl_construct.h:void std::_Destroy<double*>(double*, double*) [a.out]
95,533,300 /usr/include/c++/4.6/bits/stl_construct.h:void std::_Destroy<double*, double>(double*, double*, std::allocator<double>&) [a.out]
95,532,970 /usr/include/c++/4.6/bits/allocator.h:std::allocator<double>::allocator(std::allocator<double> const&) [a.out]
95,323,350 /usr/include/c++/4.6/bits/stl_iterator.h:__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >::base() const [a.out]
76,594,040 /usr/include/c++/4.6/bits/allocator.h:std::allocator<double>::~allocator() [a.out]
76,428,152 /usr/include/c++/4.6/bits/stl_algobase.h:std::_Niter_base<double*>::iterator_type std::__niter_base<double*>(double*) [a.out]
76,426,584 /usr/include/c++/4.6/ext/new_allocator.h:__gnu_cxx::new_allocator<double>::deallocate(double*, unsigned int) [a.out]
76,426,344 ???:std::_Vector_base<double, std::allocator<double> >::_Vector_impl::~_Vector_impl() [a.out]
75,798,592 /usr/include/c++/4.6/bits/stl_algobase.h:__gnu_cxx::__enable_if<std::__is_scalar<double>::__value, double*>::__type std::__fill_n_a<double*, unsigned int, double>(double*, unsigned int, double const&) [a.out]
47,768,335 /usr/include/c++/4.6/bits/stl_iterator_base_types.h:std::_Iter_base<double*, false>::_S_base(double*) [a.out]
47,767,040 ???:__gnu_cxx::new_allocator<double>::max_size() const [a.out]
47,662,045 ???:std::_Vector_base<double, std::allocator<double> >::_M_get_Tp_allocator() const [a.out]
38,297,020 /usr/include/c++/4.6/ext/new_allocator.h:__gnu_cxx::new_allocator<double>::~new_allocator() [a.out]
这比之前的输出有更多的信息,但仍然存在两个问题:第一,未优化代码上的输出并不能帮助我使优化后的代码更快。第二,大部分时间(~50%)被 libc 函数占用,我没有直接在我的代码中使用这些函数。我如何知道代码的哪些部分映射到这些调用?