6

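We have a 12-core MacPro for running Monte Carlo calculations. Its Intel Xeon processors have Hyper-Threading (HT) enabled, so in principle 24 processes should run in parallel to utilize them fully. However, our calculations run more efficiently at 12x100% than at 24x50%, so we tried to turn Hyper-Threading off via the Processor pane in System Preferences to get higher performance. HT can also be turned off with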

hwprefs -v cpu_ht=false

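We then ran some tests, with the following results:

  1. 12 parallel tasks take the same time with or without HT, to our disappointment.
  2. 24 parallel tasks lose 20% if HT is off (not the -50% we expected).
  3. With HT on, switching from 24 tasks to 12 tasks decreases efficiency by 20% (also surprising).
  4. With HT off, switching from 24 to 12 tasks changes nothing.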

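It seems that Hyper-Threading only decreases the performance of our calculations and that there is no way to avoid it. The program we use for the calculations is written in Fortran and compiled with gfortran. Is there a way to make this piece of hardware more efficient?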


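Update: Our Monte Carlo calculations (MCC) are typically done in steps, to avoid data loss and for other reasons (such steps cannot always be avoided). In our case each step consists of many simulations of variable duration. Since each step is split among a number of parallel tasks, the tasks also have variable duration. Essentially, all the faster tasks have to wait until the slowest one finishes. This forces us to make the steps bigger: thanks to averaging, bigger steps have a smaller relative deviation in time, so the processors waste less time waiting. This is our motivation for using 12*2.66 GHz instead of 24*1.33 GHz. If it were possible to turn HT off, we would gain about +10% performance by switching from 24 tasks w/ HT to 12 tasks w/o HT. However, the tests show that we lose 20%.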

For the tests I used quite large steps, but usually the steps are shorter, so the effect on efficiency is even more pronounced.

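There is one more reason: some of our calculations need 3-5 GB of memory, so you can see how economical it is for us to have 12 fast tasks. We are working on implementing shared memory, but that is going to be a long-term project. Therefore we need to find out how to make the existing hardware/software as fast as possible.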
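The waiting effect described in the update can be illustrated with a toy simulation. This is a minimal sketch, not the actual Fortran program: the function name `step_efficiency`, the exponentially distributed simulation durations, and all parameter values are assumptions chosen only for illustration. It estimates what fraction of core time is spent computing when every task in a step has to wait for the slowest one.

```python
import random
import statistics


def step_efficiency(n_tasks, sims_per_task, n_steps=200, seed=0):
    """Average fraction of core time spent computing rather than waiting.

    Assumption: each simulation's duration is drawn from an exponential
    distribution with mean 1.0 (hypothetical, for illustration only).
    """
    rng = random.Random(seed)
    efficiencies = []
    for _ in range(n_steps):
        # A task's duration is the sum of the durations of its simulations.
        task_times = [
            sum(rng.expovariate(1.0) for _ in range(sims_per_task))
            for _ in range(n_tasks)
        ]
        # Every task waits for the slowest one, so the step lasts max(task_times).
        efficiencies.append(statistics.mean(task_times) / max(task_times))
    return statistics.mean(efficiencies)


# Same total work per step (2400 simulations), split 12 ways or 24 ways.
print("12 tasks x 200 sims:", round(step_efficiency(12, 200), 3))
print("24 tasks x 100 sims:", round(step_efficiency(24, 100), 3))
```

Under these assumptions, splitting the same step over fewer, longer tasks leaves less idle time, and making the step itself larger helps further. The model ignores clock speed and HT entirely; it only captures the synchronisation cost.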

4 Answers

8

This is more of an extended comment than an answer:

I don't find your observations terrifically surprising. Hyper-threading is a poor man's approach to parallelisation: it allows you to have 2 pipelines of pending instructions on one CPU. But it doesn't provide extra floating-point or integer arithmetic units or more registers; when one pipeline is unable to feed the ALU (or whatever it's called these days) the other pipeline is activated within a clock cycle or two. This contrasts with the situation on a CPU without hyperthreading where, when the instruction pipeline stalls, it has to be flushed and refilled with instructions from another process before the CPU gets back up to speed.

The Wikipedia article on hyperthreading explains all this rather well.

If you are running loads in which pipeline stalls are perfectly synchronised and represent a major part of the total execution time of your program mix, then you might double the speed of a program by going from an unhyperthreaded processor to a hyperthreaded processor.

IF (that's a big if) you could write a program which never stalled in the instruction pipeline then hyperthreading would provide no benefit (in terms of execution acceleration) whatsoever. What you have measured is not a speedup due to HT (well, it is a speedup due to HT but you don't actually want that) but the failure of your threads to keep the pipeline moving.
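To put rough numbers on the two extremes above, here is a deliberately idealised toy model. It is a sketch under stated assumptions, not a description of how the hardware actually schedules instructions: the function `ht_speedup` and its single parameter `stall_fraction` are hypothetical, and the model assumes the second hardware thread fills the first thread's stalls perfectly until the execution units are saturated.

```python
def ht_speedup(stall_fraction):
    """Idealised throughput gain from running two threads on one core.

    Assumes a lone thread keeps the execution units busy for a fraction
    (1 - stall_fraction) of the time, and that a second hardware thread
    fills the stalls perfectly until the units are saturated.
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    busy_one_thread = 1.0 - stall_fraction
    # Two threads cannot keep the units busy more than 100% of the time.
    busy_two_threads = min(1.0, 2.0 * busy_one_thread)
    return busy_two_threads / busy_one_thread


for s in (0.0, 0.2, 0.5):
    print(f"stall fraction {s:.1f}: speedup x{ht_speedup(s):.2f}")
```

In this model a thread that never stalls gains nothing from HT, a thread that stalls half the time can double its core's throughput, and a stall fraction of 20% gives x1.25, which is the size of the effect implied by the question's report that going from 24 to 12 tasks with HT on costs 20%.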

What you have to do is actually decrease the speedup due to HT! Or, rather, you have to increase the execution rate of the 12 processes (one per core) by keeping the pipeline filled. Personally, I'd switch off hyperthreading while I optimised the program's execution on 12 cores.

Have fun.

answered 2010-10-04T13:31:16.743
2

I'm having a bit of difficulty understanding your description of the benchmarks.

Let's define 100% to be the amount of work you manage to get done with 12 tasks and HT off. And if you were able to get twice as much done in the same period of time, we would call it 200%. So, what are the numbers that you would put in the other three boxes?

Edit: Updated with your numbers.

             without HT     with HT
12 tasks     100%           100%
24 tasks     100%           125%

So, my understanding is that with HT disabled, there are gaps of time while your threads are basically paused (such as when they are waiting for data from memory or from disk), so they don't actually run at 2.66 GHz, but a bit less. With hyperthreading enabled, the CPU switches tasks instead of pausing for these momentary gaps, so the total amount of processing power being used goes up.

answered 2010-10-04T13:19:07.983
1

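Well, this means that with HT on, switching from 12 tasks to 24 tasks increases efficiency by 25% (the flip side of the 20% loss you measured going from 24 to 12)! Good benchmarking!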

On the other hand, if your program is written so that each thread can only work on a separate task (as opposed to being able to split a single task into smaller chunks and proceed concurrently), then for the purpose of reducing the latency for each task (from start to finish) you simply need to limit the number of threads to 12 in software. The hardware HT switch can remain in either position.
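Limiting the number of concurrent tasks in software could look roughly like the sketch below. It assumes each task is a separate external process (as with a compiled Fortran executable); the helper `run_capped`, the constant `PHYSICAL_CORES`, and the stand-in command are hypothetical names for illustration, not part of the original setup.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

PHYSICAL_CORES = 12  # assumption: the 12-core machine from the question


def run_capped(commands, max_parallel=PHYSICAL_CORES):
    """Run each command as its own process, at most max_parallel at a time.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(cmd):
        # Each worker thread just blocks until its child process exits.
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


# Stand-in workload: four trivial Python processes, two at a time.
# A real run would pass the simulation's own command line instead.
codes = run_capped([[sys.executable, "-c", "pass"]] * 4, max_parallel=2)
print(codes)
```

The threads here only wait on child processes, so the cap applies to the number of simulation processes alive at once, regardless of whether the HT switch is on or off.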

answered 2010-10-04T11:58:07.430
0

See this posting for an app in Xcode tools to enable / disable hyperthreading (and number of CPUs active). The setting does NOT persist across sleep or reboot: http://www.logicprohelp.com/forum/viewtopic.php?f=5&t=88835

(You run the Instruments app, cancel the initial screen, and then change the CPU Preferences).

answered 2012-12-28T19:05:26.663