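My code has a loop that iterates 100 million times (required for 100 million replications of a simulation model).
In each of the 100 million iterations, I retrieve a value from an array (myarray) by indexing on an integer variable named age. Because of the array's length, indexing myarray[age] is only valid for age = 0,...,99. However, the actual domain of age is 0,...,inf.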
So, I have the following function:
int tidx(const int& a) {
    return std::min(a,99);
}
which allows indexing by myarray[tidx(age)].
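To make the setup concrete, here is a minimal self-contained sketch of the clamped lookup described above. The element type (double), the use of std::array, and the names kMaxAge and lookup are assumptions made for illustration only; the real myarray may be declared differently.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Assumption: the largest valid index is 99, as stated in the post.
constexpr int kMaxAge = 99;

// Same clamp as tidx above: ages beyond the table map to the last entry.
inline int tidx(const int& a) {
    return std::min(a, kMaxAge);
}

// Hypothetical helper: element type and container are assumed, not known.
inline double lookup(const std::array<double, kMaxAge + 1>& myarray, int age) {
    assert(age >= 0);  // the stated domain of age is 0,...,inf
    return myarray[tidx(age)];
}
```

With this sketch, lookup(myarray, 150) reads the same element as lookup(myarray, 99).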
How can I do this more efficiently?
[PERF OUTPUT BELOW]
An example of building a source file, which shows the compiler flags I am using:
Building file: ../SAR.cpp
Invoking: GCC C++ Compiler
g++ -O3 -Wall -c -fmessage-length=0 -Wno-sign-compare -fopenmp -MMD -MP -MF"SAR.d" -MT"SAR.d" -o"SAR.o" "../SAR.cpp"
Finished building: ../SAR.cpp
From perf record followed by perf report:
Samples: 280 of event 'cycles', Event count (approx.): 179855989
24.78% pc2 libc-2.17.so [.] __GI_____strtod_l_internal
11.35% pc2 pc2 [.] samplePSA(int, double, int, NRRan&)
6.67% pc2 libc-2.17.so [.] str_to_mpn.isra.0
6.15% pc2 pc2 [.] simulate4_NEJMdisutilities(Policy&, bool)
5.68% pc2 pc2 [.] (anonymous namespace)::stateTransition(double const&, int const&, int&, double const&, bool const&, bool&, bo
5.25% pc2 pc2 [.] HistogramAges::add(double const&)
3.73% pc2 libstdc++.so.6.0.17 [.] std::istream::getline(char*, long, char)
3.02% pc2 libstdc++.so.6.0.17 [.] std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char> >(std::basic_
2.49% pc2 [kernel.kallsyms] [k] 0xffffffff81043e6a
2.29% pc2 libc-2.17.so [.] __strlen_sse2
2.00% pc2 libc-2.17.so [.] __mpn_lshift
1.72% pc2 libstdc++.so.6.0.17 [.] __cxxabiv1::__vmi_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::
1.71% pc2 libc-2.17.so [.] __memcpy_ssse3_back
1.67% pc2 libstdc++.so.6.0.17 [.] std::locale::~locale()
1.65% pc2 libc-2.17.so [.] __mpn_construct_double
1.38% pc2 libc-2.17.so [.] memchr
1.29% pc2 pc2 [.] (anonymous namespace)::readTransitionMatrix(double*, std::string)
1.27% pc2 libstdc++.so.6.0.17 [.] std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
1.15% pc2 libc-2.17.so [.] round_and_return
1.02% pc2 libc-2.17.so [.] __mpn_mul
1.01% pc2 libstdc++.so.6.0.17 [.] std::istream::sentry::sentry(std::istream&, bool)
1.00% pc2 libc-2.17.so [.] __memcpy_sse2
0.85% pc2 libstdc++.so.6.0.17 [.] std::locale::locale(std::locale const&)
0.85% pc2 libstdc++.so.6.0.17 [.] std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long)
0.83% pc2 libstdc++.so.6.0.17 [.] std::locale::locale()
0.73% pc2 libc-2.17.so [.] __mpn_mul_1
From perf stat:
Performance counter stats for './release/pc2':
62.449034 task-clock # 0.988 CPUs utilized
49 context-switches # 0.785 K/sec
3 cpu-migrations # 0.048 K/sec
861 page-faults # 0.014 M/sec
179,240,478 cycles # 2.870 GHz
58,909,298 stalled-cycles-frontend # 32.87% frontend cycles idle
<not supported> stalled-cycles-backend
320,437,960 instructions # 1.79 insns per cycle
# 0.18 stalled cycles per insn
70,932,710 branches # 1135.850 M/sec
697,468 branch-misses # 0.98% of all branches
0.063228446 seconds time elapsed
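As a first step in reading these numbers, the annotations after each "#" appear to be simple ratios of the raw counters. The sketch below recomputes them from the values posted above; the struct and function names (PerfCounters, DerivedMetrics, derive, postedCounters) are hypothetical and are not part of perf.

```cpp
#include <cassert>

// Raw counters copied from the perf stat output above.
struct PerfCounters {
    double task_clock_ms;
    double cycles;
    double stalled_frontend;
    double instructions;
    double branches;
    double branch_misses;
};

// Ratios that perf stat prints after the '#' on each line.
struct DerivedMetrics {
    double ipc;                 // instructions per cycle
    double ghz;                 // cycles per nanosecond of task-clock
    double frontend_stall_pct;  // stalled frontend cycles / cycles
    double stalled_per_insn;    // stalled frontend cycles / instructions
    double branch_miss_pct;     // branch misses / branches
};

inline DerivedMetrics derive(const PerfCounters& c) {
    assert(c.cycles > 0 && c.instructions > 0 && c.branches > 0);
    DerivedMetrics m;
    m.ipc = c.instructions / c.cycles;
    m.ghz = c.cycles / (c.task_clock_ms * 1e6);  // ms -> ns
    m.frontend_stall_pct = 100.0 * c.stalled_frontend / c.cycles;
    m.stalled_per_insn = c.stalled_frontend / c.instructions;
    m.branch_miss_pct = 100.0 * c.branch_misses / c.branches;
    return m;
}

// The values from this particular run.
inline PerfCounters postedCounters() {
    return {62.449034, 179240478.0, 58909298.0,
            320437960.0, 70932710.0, 697468.0};
}
```

Calling derive(postedCounters()) should give roughly 1.79 instructions per cycle, 2.87 GHz, 32.87% frontend stalls, 0.18 stalled cycles per instruction, and 0.98% branch misses, matching the annotations in the output.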
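I would appreciate any comments. I need to learn how to interpret/read this information, so any tips that might help me get started would be much appreciated.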