I have written some C++ code with the following structure:

double kernel(params)
{
  //code
}

void optimize(params)
{
  //some code
  double x = kernel();
  //some more code
}

int main()
{
  //some code
  optimize();
  //some more code
}

I tried profiling it with callgrind using the following commands:

g++ -O3 -g sgd.cpp
valgrind --tool=callgrind ./a.out commandline_args
callgrind_annotate callgrind.out.XXXX

I get the following output:

--------------------------------------------------------------------------------
            Ir 
--------------------------------------------------------------------------------
12,916,968,785  PROGRAM TOTALS

--------------------------------------------------------------------------------
           Ir  file:function
--------------------------------------------------------------------------------
5,862,783,191  /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/memcpy-ssse3.S:__memmove_ssse3 [/lib/i386-linux-gnu/libc-2.15.so]
2,847,653,393  /build/buildd/eglibc-2.15/malloc/malloc.c:_int_malloc [/lib/i386-linux-gnu/libc-2.15.so]
1,327,109,692  /build/buildd/eglibc-2.15/malloc/malloc.c:_int_free [/lib/i386-linux-gnu/libc-2.15.so]
  847,560,182  sgd.cpp:main [a.out]
  503,022,767  /build/buildd/eglibc-2.15/malloc/malloc.c:malloc [/lib/i386-linux-gnu/libc-2.15.so]
  235,458,068  /build/buildd/eglibc-2.15/malloc/malloc.c:free [/lib/i386-linux-gnu/libc-2.15.so]
  213,580,120  /build/buildd/eglibc-2.15/math/../sysdeps/i386/fpu/e_exp.S:__ieee754_exp [/lib/i386-linux-gnu/libm-2.15.so]
  203,349,602  ???:operator new(unsigned int) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
  192,222,108  /build/buildd/eglibc-2.15/math/../sysdeps/ieee754/dbl-64/w_exp.c:exp [/lib/i386-linux-gnu/libm-2.15.so]
  128,438,068  /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/strcat.S:0x0012ac73 [/lib/i386-linux-gnu/libc-2.15.so]
  128,431,176  ???:operator delete(void*) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
  128,358,564  /usr/include/c++/4.6/ext/new_allocator.h:main
  117,645,255  /usr/include/c++/4.6/bits/stl_vector.h:main
  112,167,083  /usr/include/c++/4.6/bits/stl_algobase.h:main

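Apart from main(), the output does not show which parts of my source code take most of the time. I know that most of the time is spent in optimize(), and within it mostly in kernel(), but I cannot see this from the output. How can I get more detailed information so that I can speed up the code?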

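In case it helps, I use std::vector extensively in the code. Some time ago I implemented similar code using plain arrays, and callgrind seemed to work fine then. Could this be the problem?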

If I drop the -O3 flag, I get the following output:

--------------------------------------------------------------------------------
            Ir 
--------------------------------------------------------------------------------
19,026,610,083  PROGRAM TOTALS

--------------------------------------------------------------------------------
           Ir  file:function
--------------------------------------------------------------------------------
5,233,252,577  /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/memcpy-ssse3.S:__memmove_ssse3 [/lib/i386-linux-gnu/libc-2.15.so]
2,542,000,057  /build/buildd/eglibc-2.15/malloc/malloc.c:_int_malloc [/lib/i386-linux-gnu/libc-2.15.so]
1,184,626,252  /build/buildd/eglibc-2.15/malloc/malloc.c:_int_free [/lib/i386-linux-gnu/libc-2.15.so]
  983,472,430  sgd.cpp:optimize(std::vector<double, std::allocator<double> >, std::vector<int, std::allocator<int> >, std::vector<double, std::allocator<double> >) [a.out]
  781,018,740  ???:std::vector<double, std::allocator<double> >::operator[](unsigned int) [a.out]
  772,117,839  sgd.cpp:kernel(std::vector<double, std::allocator<double> >, int, int, double) [a.out]
  476,616,742  ???:std::vector<double, std::allocator<double> >::vector(std::vector<double, std::allocator<double> > const&) [a.out]
  449,016,969  /build/buildd/eglibc-2.15/malloc/malloc.c:malloc [/lib/i386-linux-gnu/libc-2.15.so]
  324,200,916  ???:std::vector<double, std::allocator<double> >::size() const [a.out]
  305,705,504  ???:std::_Vector_base<double, std::allocator<double> >::_Vector_base(unsigned int, std::allocator<double> const&) [a.out]
  267,492,204  ???:std::_Vector_base<double, std::allocator<double> >::~_Vector_base() [a.out]
  238,309,873  /usr/include/c++/4.6/bits/stl_algobase.h:double* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<double>(double const*, double const*, double*) [a.out]
  238,308,370  /usr/include/c++/4.6/bits/stl_algobase.h:double* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
  228,776,040  /usr/include/c++/4.6/bits/stl_algobase.h:std::_Miter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
  228,776,038  /usr/include/c++/4.6/bits/stl_algobase.h:double* std::copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
  210,178,748  /build/buildd/eglibc-2.15/malloc/malloc.c:free [/lib/i386-linux-gnu/libc-2.15.so]
  210,172,446  ???:std::vector<double, std::allocator<double> >::~vector() [a.out]
  209,711,018  sgd.cpp:square(double) [a.out]
  190,646,380  /build/buildd/eglibc-2.15/math/../sysdeps/i386/fpu/e_exp.S:__ieee754_exp [/lib/i386-linux-gnu/libm-2.15.so]
  181,517,469  ???:operator new(unsigned int) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
  171,582,030  /usr/include/c++/4.6/bits/stl_iterator_base_types.h:std::_Iter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, true>::_S_base(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
  171,581,742  /build/buildd/eglibc-2.15/math/../sysdeps/ieee754/dbl-64/w_exp.c:exp [/lib/i386-linux-gnu/libm-2.15.so]
  152,853,344  ???:__gnu_cxx::new_allocator<double>::allocate(unsigned int, void const*) [a.out]
  152,852,752  ???:std::_Vector_base<double, std::allocator<double> >::_Vector_impl::_Vector_impl(std::allocator<double> const&) [a.out]
  152,517,360  /usr/include/c++/4.6/bits/stl_algobase.h:std::_Niter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
  152,517,360  /usr/include/c++/4.6/bits/stl_iterator_base_types.h:std::_Iter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, false>::_S_base(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
  152,517,360  /usr/include/c++/4.6/bits/stl_iterator.h:__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >::__normal_iterator(double const* const&) [a.out]
  133,746,571  ???:std::_Vector_base<double, std::allocator<double> >::_M_deallocate(double*, unsigned int) [a.out]
  133,452,690  ???:std::vector<double, std::allocator<double> >::end() const [a.out]
  133,452,690  ???:std::vector<double, std::allocator<double> >::begin() const [a.out]
  131,134,604  sgd.cpp:sign(double) [a.out]
  123,920,353  /usr/include/c++/4.6/bits/stl_algobase.h:double* std::__copy_move_a<false, double const*, double*>(double const*, double const*, double*) [a.out]
  121,192,848  ???:std::vector<int, std::allocator<int> >::operator[](unsigned int) [a.out]
  114,649,360  /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/strcat.S:0x0012ac73 [/lib/i386-linux-gnu/libc-2.15.so]
  114,642,456  ???:operator delete(void*) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
  114,388,018  /usr/include/c++/4.6/bits/stl_uninitialized.h:double* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
  114,388,018  /usr/include/c++/4.6/bits/stl_uninitialized.h:double* std::uninitialized_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
  114,388,018  /usr/include/c++/4.6/bits/stl_uninitialized.h:double* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*, double>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*, std::allocator<double>&) [a.out]
  105,086,674  /usr/include/c++/4.6/bits/stl_vector.h:std::_Vector_base<double, std::allocator<double> >::_M_allocate(unsigned int) [a.out]
   95,533,505  ???:std::_Vector_base<double, std::allocator<double> >::_M_get_Tp_allocator() [a.out]
   95,533,300  /usr/include/c++/4.6/bits/stl_construct.h:void std::_Destroy<double*>(double*, double*) [a.out]
   95,533,300  /usr/include/c++/4.6/bits/stl_construct.h:void std::_Destroy<double*, double>(double*, double*, std::allocator<double>&) [a.out]
   95,532,970  /usr/include/c++/4.6/bits/allocator.h:std::allocator<double>::allocator(std::allocator<double> const&) [a.out]
   95,323,350  /usr/include/c++/4.6/bits/stl_iterator.h:__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >::base() const [a.out]
   76,594,040  /usr/include/c++/4.6/bits/allocator.h:std::allocator<double>::~allocator() [a.out]
   76,428,152  /usr/include/c++/4.6/bits/stl_algobase.h:std::_Niter_base<double*>::iterator_type std::__niter_base<double*>(double*) [a.out]
   76,426,584  /usr/include/c++/4.6/ext/new_allocator.h:__gnu_cxx::new_allocator<double>::deallocate(double*, unsigned int) [a.out]
   76,426,344  ???:std::_Vector_base<double, std::allocator<double> >::_Vector_impl::~_Vector_impl() [a.out]
   75,798,592  /usr/include/c++/4.6/bits/stl_algobase.h:__gnu_cxx::__enable_if<std::__is_scalar<double>::__value, double*>::__type std::__fill_n_a<double*, unsigned int, double>(double*, unsigned int, double const&) [a.out]
   47,768,335  /usr/include/c++/4.6/bits/stl_iterator_base_types.h:std::_Iter_base<double*, false>::_S_base(double*) [a.out]
   47,767,040  ???:__gnu_cxx::new_allocator<double>::max_size() const [a.out]
   47,662,045  ???:std::_Vector_base<double, std::allocator<double> >::_M_get_Tp_allocator() const [a.out]
   38,297,020  /usr/include/c++/4.6/ext/new_allocator.h:__gnu_cxx::new_allocator<double>::~new_allocator() [a.out]

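This has more information than the previous output, but two problems remain. First, the output from unoptimized code does not help me make the optimized code faster. Second, most of the time (~50%) is spent in libc functions that I do not call directly in my code. How can I find out which parts of my code these calls map to?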
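
One thing I do notice in the unoptimized output: the signatures of optimize() and kernel() show their std::vector arguments being passed by value, and the std::vector copy constructor appears in the list. I am only guessing that this is where the malloc/free/memmove traffic comes from. Here is a minimal sketch of that effect. The function names are hypothetical and are not taken from sgd.cpp:

```cpp
#include <vector>

// Hypothetical stand-in for a function that takes its vector BY VALUE,
// as the signatures in the unoptimized output suggest. Each call
// copy-constructs `w`: one allocation, one copy of the elements, and one
// deallocation when `w` goes out of scope.
bool by_value_shares_buffer(std::vector<double> w, const double* caller_buf)
{
    // `w` owns a fresh copy, so its storage differs from the caller's.
    return w.data() == caller_buf;
}

// The same function taking a const reference: no copy and no allocation.
bool by_ref_shares_buffer(const std::vector<double>& w, const double* caller_buf)
{
    // `w` is the caller's vector, so the storage is identical.
    return w.data() == caller_buf;
}
```

Calling both with a non-empty vector `v` and `v.data()` should give false for the by-value version and true for the by-reference version. If this is what is happening in my code, changing the parameters to const references might remove a large share of the libc cost, but I have not confirmed that.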
