Questions tagged [intel-vtune]

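156 questions

0 votes

0 answers

95 views

windows - Intel VTune sampling at fixed time intervals

I am new to VTune and have been experimenting with it. One thing I can't figure out is how to collect multiple event samples, one every 20 seconds, and save them to a text file.

For example, run an application under VTune and return the General Exploration results every 20 seconds for 2 minutes. That means I should end up with 6 event samples.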
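One possible workaround can be sketched as follows. It is only a sketch under stated assumptions: it builds six separate 20-second `amplxe-cl` collections rather than slicing one continuous 2-minute run, so the application is launched afresh for each interval. The application name `my_app.exe` and the result directory names are hypothetical; the `-collect general-exploration` and duration options are the ones quoted in the next question on this page.

```python
from typing import List


def build_interval_commands(app: List[str],
                            total_s: int = 120,
                            interval_s: int = 20,
                            amplxe: str = "amplxe-cl") -> List[List[str]]:
    """Return one amplxe-cl command line per sampling interval.

    Each command is a separate collection, so the target application
    is started again for every interval (an assumption of this sketch).
    """
    if interval_s <= 0 or total_s < interval_s:
        raise ValueError("interval_s must be positive and no larger than total_s")
    count = total_s // interval_s  # 120 s / 20 s gives 6 intervals
    commands = []
    for i in range(count):
        commands.append([
            amplxe,
            "-collect", "general-exploration",
            "-duration", str(interval_s),
            "-result-dir", f"interval_{i:02d}",  # hypothetical directory name
            "--", *app,
        ])
    return commands


if __name__ == "__main__":
    # Print the command lines instead of running them.
    for cmd in build_interval_commands(["my_app.exe"]):
        print(" ".join(cmd))
```

Each command could then be passed to `subprocess.run`, and each result directory turned into a text report separately.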

2015-02-24T10:10:04.980

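0 votes

1 answer

215 views

windows - Intel VTune command-line error

I am trying to use the VTune command line to set the maximum number of samples to collect before collection stops. To do this I used the -msc option, but I get an error message saying the command is unknown.

The command I used is: "C:\Program Files\Intel\VTune Amplifier XE 2015\bin32\amplxe-cl" -collect general-exploration --duration 30 -msc 300

The command above gives me an "unknown command -msc" error.

How can I fix this?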

windows cmd intel intel-vtune

2015-02-25T08:56:06.713

0 votes

0 answers

577 views

cpu - How to monitor the utilization of cores on Xeon Phi at 10 Hz?

I've been trying to monitor the utilization of all 60 cores on a Xeon Phi (Knights Corner, in-order processors) at a relatively high frequency: at least once every 0.1 s, i.e. 10 Hz.

I tried the latest PAPI library, but it only supports PAPI_TOT_INS, the counter of completed instructions. This won't work because I need something related to the instructions issued in each 0.1 s interval, not the instructions finished. Several instructions issued in different cycles may finish in the same cycle, and instruction issue depends on whether the core is halted.

Other available tools such as 'top' and 'perf' operate at 1 Hz, which is too slow for my measurement. I also need to synchronize the measurement with key phases of my code, so Intel VTune does not work for me either.

Is there a way to monitor instruction issue on the Xeon Phi, or any other activity linked to core utilization? I understand that the hardware counters are there, but reading them seems very challenging. Maybe I can deduce the utilization by measuring the CPU time of each thread?

Thanks.
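The last idea in the question, deducing utilization from per-thread CPU time, comes down to dividing the CPU time consumed in an interval by the wall-clock length of that interval. A minimal sketch of that arithmetic follows, assuming cumulative CPU-time readings are already available from some source; the function name and the sample numbers are hypothetical, and nothing here is specific to Xeon Phi.

```python
from typing import List, Sequence


def utilization(cpu_seconds: Sequence[float],
                wall_seconds: Sequence[float]) -> List[float]:
    """Per-interval utilization from cumulative CPU-time readings.

    cpu_seconds[i] is the cumulative CPU time of one thread at the
    wall-clock instant wall_seconds[i]. The result has one entry per
    interval between consecutive readings.
    """
    if len(cpu_seconds) != len(wall_seconds):
        raise ValueError("both sequences must have the same length")
    result = []
    for i in range(1, len(cpu_seconds)):
        elapsed = wall_seconds[i] - wall_seconds[i - 1]
        if elapsed <= 0:
            raise ValueError("wall-clock readings must be strictly increasing")
        busy = cpu_seconds[i] - cpu_seconds[i - 1]
        result.append(busy / elapsed)
    return result


if __name__ == "__main__":
    # Hypothetical readings taken every 0.1 s (10 Hz).
    print(utilization([0.0, 0.05, 0.15], [0.0, 0.1, 0.2]))
```

This measures scheduled CPU time, not instructions issued, so it only approximates the quantity the question asks about.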

cpu intel intel-vtune xeon-phi papi

2015-03-18T02:52:59.593

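0 votes

1 answer

81 views

windows - What causes a branch prediction spike in VTune

I am using VTune to launch an application and profile it. After running the test, I see a spike in the branch prediction unit.

To optimize my application, I need to figure out which part of the code causes this spike. Is there a way to find this out with VTune?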

windows profiling branch-prediction intel-vtune

2015-04-07T11:12:23.707

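0 votes

1 answer

1078 views

intel - FLOPS measurement

I am trying to estimate FLOPS for my application using Intel VTune Amplifier, following this article as a guide: https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs/

The problem is that I can't find the FP_COMP_OPS_EXE events in the VTune GUI. When I run amplxe-cl with this event configuration, I get the following error:

amplxe: Error: Invalid event FP_COMP_OPS_EXE.X87 is discarded.

I am working on CentOS and my processor is an Intel Xeon.

Any help would be appreciated.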

intel intel-vtune flops

2015-05-19T23:10:56.160

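0 votes

1 answer

786 views

performancecounter - Performance measurement: getting the average call time per function in Intel VTune Amplifier

I just want to get the average time each function takes to run. That is, I want: "total time inside the function" / "number of calls to the function"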
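The ratio described above can be sketched as a small helper, assuming the total time and the call count of each function have already been obtained from some profiler export. The function names and numbers below are hypothetical and are not VTune output.

```python
from typing import Dict, Tuple


def average_call_times(profile: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """Map each function name to total_time / call_count.

    profile maps a function name to (total seconds inside the function,
    number of calls). Functions with no recorded calls are skipped
    because the average is undefined for them.
    """
    return {
        name: total / calls
        for name, (total, calls) in profile.items()
        if calls > 0
    }


if __name__ == "__main__":
    # Hypothetical measurements: (total seconds, call count).
    sample = {"parse": (3.0, 4), "render": (1.5, 30), "never_called": (0.0, 0)}
    for name, avg in average_call_times(sample).items():
        print(f"{name}: {avg:.4f} s per call")
```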

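When I run an analysis in VTune I get all kinds of information. These are the settings I am using:

Basic Hotspots settings

and:

Advanced Hotspots settings

But I can't find where the average time is. I can see the total time for each function, but I can't find the call count.

I am using Visual Studio 2012 and VTune Amplifier XE 2013 Update 9.

Please help.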

performancecounter intel-vtune

2015-05-31T13:38:48.513

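0 votes

1 answer

73 views

intel - Using VTune with a Windows Embedded OS

I would like to know whether it is possible to use VTune 2013 or VTune 2015 with a Windows Embedded operating system. I read in the release notes that "Embedded editions are not supported", but I wonder whether there is a way to collect data on a Windows Embedded system and view the results on a standard Windows system, and/or to use remote mode with a Windows Embedded target.

Thanks,

Giorgio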

intel intel-vtune

2015-06-25T08:58:15.540

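0 votes

0 answers

127 views

intel-vtune - Modules of interest in Intel VTune Amplifier XE 2013

I am a beginner with VTune, but I have some experience with AQTime 8. I am now using Intel VTune Amplifier XE 2013, and it seems to me that it has many advantages over AQTime. I do have one question. In AQTime I can select the modules I am interested in and profile only those. This is very useful because I only need to profile one DLL from a large project. Does Intel VTune Amplifier XE 2013 offer this possibility?

I tried to find an answer, but only found this (Is it possible to use VTune on certain pieces of code in a binary rather than the whole binary?).

Please advise.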

intel-vtune aqtime

2015-07-03T07:35:57.050

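0 votes

4 answers

7761 views

c++ - Finding the nearest index in a sorted vector

I wrote a C++ routine to find the nearest double element in a sorted array. Is there a way to speed it up?

There are two branches based on the value of the boolean reversed, which is set when the array is sorted in descending order.

Given that the array is sorted, I can't see a better way to do this. Through profiling, I found that the comparison if (value <= x[i] && value > x[i + 1]) is expensive.

EDIT

Tried using lower_bound()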
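The binary-search idea behind `lower_bound()` can be sketched as follows. This is not the asker's routine: it is a Python illustration with hypothetical names, and it assumes the input is sorted (ascending, or descending when `descending` is true). It finds the first element that is not ordered before the value, then picks the closer of the two neighbours, so it needs O(log n) comparisons instead of a linear scan.

```python
from typing import Sequence


def nearest_index(x: Sequence[float], value: float,
                  descending: bool = False) -> int:
    """Return the index of the element of sorted x closest to value."""
    if not x:
        raise ValueError("x must not be empty")
    lo, hi = 0, len(x)
    # Binary search for the first index whose element is not ordered
    # before value (the same position std::lower_bound would return).
    while lo < hi:
        mid = (lo + hi) // 2
        before = x[mid] > value if descending else x[mid] < value
        if before:
            lo = mid + 1
        else:
            hi = mid
    if lo == 0:
        return 0
    if lo == len(x):
        return len(x) - 1
    # Compare the two neighbours; ties go to the lower index.
    if abs(value - x[lo - 1]) <= abs(x[lo] - value):
        return lo - 1
    return lo


if __name__ == "__main__":
    print(nearest_index([1.0, 2.0, 4.0, 8.0], 3.1))
    print(nearest_index([8.0, 4.0, 2.0, 1.0], 3.1, descending=True))
```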

c++ algorithm performance optimization intel-vtune

2015-07-15T13:06:57.587

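0 votes

1 answer

135 views

intel-mkl - VTune total time in MKL functions

I am working on a university project that requires me to profile some tridiagonal eigensolvers implemented in MKL (11.1). I implemented a test bench for this, and now I am trying to profile it in VTune (Intel VTune Amplifier XE 2013 Update 16). I need to find the bottlenecks, i.e. which parts of the code (MKL's, not mine) and which functions called by the eigensolvers take the most time.

To do this, I would like to get the total time spent in each function and its callees. However, all I get is the self time of each function.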
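The relationship between the two metrics can be sketched as follows: the total (inclusive) time of a function is its self time plus the total time of everything it calls. The sketch assumes a call tree with self times is available in some form; the `CallNode` type, the function names, and the timings are all hypothetical, and this is not how VTune itself reports the data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallNode:
    """One node of a call tree: a function, its self time, its callees."""
    name: str
    self_time: float
    children: List["CallNode"] = field(default_factory=list)


def total_time(node: CallNode) -> float:
    """Inclusive time: the node's self time plus that of all callees."""
    return node.self_time + sum(total_time(child) for child in node.children)


if __name__ == "__main__":
    # Hypothetical call tree with self times in seconds.
    kernel = CallNode("kernel", 0.5)
    reduce_step = CallNode("reduce_step", 2.0, [kernel])
    finalize = CallNode("finalize", 0.25)
    solver = CallNode("solver", 1.0, [reduce_step, finalize])
    for node in (solver, reduce_step, kernel, finalize):
        print(f"{node.name}: self={node.self_time} total={total_time(node)}")
```

Computing inclusive time this way requires call-stack information in the collected data; self times alone, without the caller/callee relationships, are not enough.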

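My code is compiled with icc 14.0/3.174, and I tried both static and dynamic linking of MKL.

I hope I am not overlooking something silly here. I am also very open to other suggestions on how to find the values I need.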

intel-mkl intel-vtune

2015-07-24T14:45:23.573
