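I am using an Intel Westmere machine. It has 12 CPU cores arranged on 2 chips, which means each chip contains 6 cores.

I don't know how the CPU cores are ordered or numbered. My guess is that it could be either of the following:

  1. Cores 0, 1, 2, 3, 4 and 5 are on one chip, and cores 6, 7, 8, 9, 10 and 11 are on the second chip
  2. Cores 0, 2, 4, 6, 8 and 10 are on one chip, and cores 1, 3, 5, 7, 9 and 11 are on the second chip

Does anyone know how the CPU cores are ordered/numbered?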

2 Answers

For more information, you can try this tool: http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration

It is Intel's official tool for determining the processor topology.

Here is an example run from a machine with two physical Intel X5560 processors (6core+6HT) running CentOS 5.3, so the output may be somewhat dated.

Package 0 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
       CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
L1D is Level 1 Data cache, size(KBytes)= 32,  Cores/cache= 2, Caches/package= 4
L1I is Level 1 Instruction cache, size(KBytes)= 32,  Cores/cache= 2, Caches/package= 4
L2 is Level 2 Unified cache, size(KBytes)= 256,  Cores/cache= 2, Caches/package= 4
L3 is Level 3 Unified cache, size(KBytes)= 8192,  Cores/cache= 8, Caches/package= 1
      +-----------+-----------+-----------+-----------+
Cache |  L1D      |  L1D      |  L1D      |  L1D      |
Size  |  32K      |  32K      |  32K      |  32K      |
OScpu#|    0     8|    1     9|    2    10|    3    11|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|    1   100|    2   200|    4   400|    8   800|
CmbMsk|  101      |  202      |  404      |  808      |
      +-----------+-----------+-----------+-----------+

Cache |  L1I      |  L1I      |  L1I      |  L1I      |
Size  |  32K      |  32K      |  32K      |  32K      |
      +-----------+-----------+-----------+-----------+

Cache |   L2      |   L2      |   L2      |   L2      |
Size  | 256K      | 256K      | 256K      | 256K      |
      +-----------+-----------+-----------+-----------+

Cache |   L3                                          |
Size  |   8M                                          |
CmbMsk|  f0f                                          |
      +-----------------------------------------------+

Combined socket AffinityMask= 0xf0f

Package 1 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
       CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------+-----------+-----------+-----------+
Cache |  L1D      |  L1D      |  L1D      |  L1D      |
Size  |  32K      |  32K      |  32K      |  32K      |
OScpu#|    4    12|    5    13|    6    14|    7    15|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|   10   1z3|   20   2z3|   40   4z3|   80   8z3|
CmbMsk| 1010      | 2020      | 4040      | 8080      |
      +-----------+-----------+-----------+-----------+

Cache |  L1I      |  L1I      |  L1I      |  L1I      |
Size  |  32K      |  32K      |  32K      |  32K      |
      +-----------+-----------+-----------+-----------+

Cache |   L2      |   L2      |   L2      |   L2      |
Size  | 256K      | 256K      | 256K      | 256K      |
      +-----------+-----------+-----------+-----------+

Cache |   L3                                          |
Size  |   8M                                          |
CmbMsk| f0f0                                          |
      +-----------------------------------------------+
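
The masks in the output above use the "extended hex" notation described in the box description, where a trailing 'z#' stands for # zero hex digits. As a minimal sketch (the function names `decode_extended_hex` and `cpus_in_mask` are hypothetical helpers, not part of the Intel tool), the masks can be turned back into OS CPU numbers like this:

```python
import re
from typing import List


def decode_extended_hex(mask: str) -> int:
    """Convert an 'extended hex' mask such as '8z5' into an integer.

    A trailing 'z#' means '# zero hex digits', so '8z5' is 0x800000.
    Masks without a 'z' suffix are treated as ordinary hex.
    """
    match = re.fullmatch(r"([0-9a-fA-F]+)(?:z(\d+))?", mask.strip())
    if match is None:
        raise ValueError(f"not an extended-hex mask: {mask!r}")
    digits, zero_count = match.groups()
    return int(digits + "0" * int(zero_count or 0), 16)


def cpus_in_mask(mask: str) -> List[int]:
    """Return the OS CPU numbers whose bits are set in the mask."""
    value = decode_extended_hex(mask)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    # Combined socket masks taken from the output above.
    print("Package 0:", cpus_in_mask("f0f"))
    print("Package 1:", cpus_in_mask("f0f0"))
```

Decoding the two L3 masks this way shows which OS CPU numbers belong to each package in this particular run.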
answered 2013-09-06T00:16:09.500

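They should be interleaved, so that taking consecutive cores spreads the load as widely as possible. If 0 and 1 were on the same chip, naive code that only uses two cores would waste half the cache.

So the core numbering should first alternate between physical CPUs. If possible, it should next alternate between dies. Then it should go through the cores on a single die. Finally, if possible, it should include the virtual cores.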

So if you have two physical CPUs (P1, P2), each dual-core (C1, C2), and each hyper-threaded (V1, V2), the cores should be: P1C1V1, P2C1V1, P1C2V1, P2C2V1, P1C1V2, P2C1V2, P1C2V2, P2C2V2
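
The ordering described above can be sketched in a few lines. This is only an illustration of the proposed scheme, not how any particular BIOS or OS actually numbers cores; `interleaved_order` is a hypothetical helper name.

```python
from itertools import product
from typing import List


def interleaved_order(packages: int, cores: int, threads: int) -> List[str]:
    """List logical CPUs so that the physical package changes fastest,
    then the core within a package, then the hardware thread."""
    # itertools.product varies its LAST argument fastest, so the package
    # range goes last and the thread range goes first.
    return [
        f"P{p}C{c}V{v}"
        for v, c, p in product(
            range(1, threads + 1),
            range(1, cores + 1),
            range(1, packages + 1),
        )
    ]


if __name__ == "__main__":
    for os_cpu, name in enumerate(interleaved_order(2, 2, 2)):
        print(os_cpu, name)
```

For two packages, two cores each and two threads per core, this reproduces the sequence listed above.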

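The rationale is to let code that doesn't understand the CPU topology grab as many cores as it knows how to use and still get the best performance. If you can only support two cores, you want P1C1V1 and P2C1V1, not P1C1V1 and P1C1V2, or you'll be massively wasting cache and execution units.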
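
To make that rationale concrete, here is a small sketch comparing the two numbering schemes guessed at in the question (12 cores on 2 chips). The scheme names "block" and "interleaved" and both function names are assumptions made for this illustration only.

```python
def package_of(cpu: int, scheme: str,
               cores_per_package: int = 6, packages: int = 2) -> int:
    """Return the chip a given OS CPU number would sit on under a scheme."""
    if scheme == "block":        # 0-5 on chip 0, 6-11 on chip 1
        return cpu // cores_per_package
    if scheme == "interleaved":  # even numbers on chip 0, odd on chip 1
        return cpu % packages
    raise ValueError(f"unknown scheme: {scheme!r}")


def packages_used(n_cpus: int, scheme: str) -> int:
    """Count how many chips naive code touches if it takes CPUs 0..n_cpus-1."""
    return len({package_of(cpu, scheme) for cpu in range(n_cpus)})


if __name__ == "__main__":
    for scheme in ("block", "interleaved"):
        print(scheme, packages_used(2, scheme))
```

Naive code that takes CPUs 0 and 1 lands on a single chip under the block scheme but on both chips under the interleaved one, which is the cache argument made above.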

answered 2011-09-17T12:45:06.437