c - What is the correct value for the maximum number of CPUs to use with sched_setaffinity?

Question

I'm a bit confused about the correct value for the number of CPUs I can pass to CPU_SET when making a sched_setaffinity call on my system.

My /proc/cpuinfo file:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 37
model name  : Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping    : 5
microcode   : 0x2
cpu MHz     : 1199.000
cache size  : 3072 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5056.34
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 37
model name  : Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping    : 5
microcode   : 0x2
cpu MHz     : 1199.000
cache size  : 3072 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5056.34
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 37
model name  : Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping    : 5
microcode   : 0x2
cpu MHz     : 1199.000
cache size  : 3072 KB
physical id : 0
siblings    : 4
core id     : 2
cpu cores   : 2
apicid      : 4
initial apicid  : 4
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5056.34
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 37
model name  : Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping    : 5
microcode   : 0x2
cpu MHz     : 1199.000
cache size  : 3072 KB
physical id : 0
siblings    : 4
core id     : 2
cpu cores   : 2
apicid      : 5
initial apicid  : 5
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5056.34
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

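In this file, I see lines with processor numbered 0-3 for the "physical" processors (4 processors in total). I can get this value from sysconf(_SC_NPROCESSORS_ONLN), but there is also a cpu cores line, which is 2 for each processor. I believe this represents the "logical" processors, or the hyper-threads being counted. Should I use only the "physical" value, or can I use the "logical" count?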
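The counts mentioned above can be compared programmatically. A minimal sketch, assuming Linux with glibc (for CPU_COUNT and _GNU_SOURCE); the helper names allowed_cpu_count and print_cpu_counts are illustrative, not from the thread. It prints the configured and online counts from sysconf next to the number of CPUs in the calling thread's affinity mask, which is the set taskset -p reports.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Number of logical CPUs in the calling thread's affinity mask
   (the same set `taskset -p` shows), or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) == -1) {
        perror("sched_getaffinity");
        return -1;
    }
    return CPU_COUNT(&set);
}

/* Print the three counts side by side. */
void print_cpu_counts(void)
{
    printf("configured CPUs (_SC_NPROCESSORS_CONF): %ld\n",
           sysconf(_SC_NPROCESSORS_CONF));
    printf("online CPUs     (_SC_NPROCESSORS_ONLN): %ld\n",
           sysconf(_SC_NPROCESSORS_ONLN));
    printf("CPUs in affinity mask (CPU_COUNT):      %d\n",
           allowed_cpu_count());
}
```

Note that these are counts, while CPU_SET takes CPU numbers (the processor field in /proc/cpuinfo); if a CPU is taken offline, the online numbers are no longer contiguous, so a count is not always one more than the highest usable index.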

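I'm unclear on this because if I look at /proc/PID/status, Cpus_allowed_list there can range from 0-7 (8 processors in total). However, I also wrote a script that runs taskset -c -p PID for every running PID, and it shows an affinity list of at most 0-3 for each process.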
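Cpus_allowed_list and taskset -c both print CPUs in the kernel's list format (comma-separated numbers and ranges such as 0-3 or 0,2-3). A minimal sketch of a parser that turns such a string into a cpu_set_t, so the two lists can be compared CPU by CPU; parse_cpu_list is a hypothetical helper, not a libc function, and it assumes every CPU number is below CPU_SETSIZE.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <sched.h>
#include <stdlib.h>

/* Parse a kernel-style CPU list such as "0-3" or "0,2-3" (the format of
   Cpus_allowed_list in /proc/PID/status) into *set.
   Returns the number of CPUs in the list, or -1 on malformed input. */
int parse_cpu_list(const char *s, cpu_set_t *set)
{
    CPU_ZERO(set);
    while (*s != '\0' && *s != '\n') {
        char *end;
        if (!isdigit((unsigned char)*s))
            return -1;
        long lo = strtol(s, &end, 10);
        long hi = lo;                    /* single CPU unless a range follows */
        if (*end == '-') {
            s = end + 1;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtol(s, &end, 10);
        }
        if (hi < lo || hi >= CPU_SETSIZE)
            return -1;
        for (long cpu = lo; cpu <= hi; cpu++)
            CPU_SET(cpu, set);
        s = end;
        if (*s == ',')
            s++;                         /* move on to the next element */
    }
    return CPU_COUNT(set);
}
```

Running it on the Cpus_allowed_list value and on the list printed by taskset makes it easy to see exactly which CPU numbers differ between the two.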

score 2 · Accepted Answer

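With hyper-threading, each core has 2 logical CPUs. This means that if one logical CPU stalls for any reason (cache miss, branch misprediction, instruction dependency, etc.), the core can execute instructions from the other logical CPU instead of sitting idle and wasting cycles. In addition, a core can usually perform more operations in parallel than a single logical CPU can keep busy, so even without any (common) stalls you still benefit (through better utilization of the core's resources). In this case, you want to use all logical CPUs.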
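A minimal sketch of giving a process every logical CPU (each hyper-thread counted separately, i.e. every processor entry in /proc/cpuinfo). It assumes glibc's dynamically sized CPU-set macros (CPU_ALLOC, CPU_SET_S), which avoid the fixed CPU_SETSIZE (1024) limit of a plain cpu_set_t, and it assumes CPU numbers run from 0 to sysconf(_SC_NPROCESSORS_CONF) - 1; allow_all_logical_cpus is a hypothetical name. The kernel keeps only the CPUs the process's cpuset actually permits.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/* Allow `pid` (0 = calling thread) to run on every logical CPU numbered
   0 .. _SC_NPROCESSORS_CONF-1 (assumed to cover all CPU numbers).
   Returns 0 on success, -1 with errno set on failure. */
int allow_all_logical_cpus(pid_t pid)
{
    long ncpus = sysconf(_SC_NPROCESSORS_CONF);
    if (ncpus < 1) {
        errno = EINVAL;
        return -1;
    }

    cpu_set_t *set = CPU_ALLOC(ncpus);   /* mask sized for ncpus bits */
    if (set == NULL)
        return -1;                       /* allocation failed, errno set */
    size_t size = CPU_ALLOC_SIZE(ncpus);

    CPU_ZERO_S(size, set);
    for (long cpu = 0; cpu < ncpus; cpu++)
        CPU_SET_S(cpu, size, set);       /* one bit per logical CPU */

    int rc = sched_setaffinity(pid, size, set);
    int saved = errno;                   /* keep errno across CPU_FREE */
    CPU_FREE(set);
    errno = saved;
    return rc;
}
```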

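For poorly written multi-threaded software (software with severe scalability problems), the gains from hyper-threading can be lost to poor scalability. For example, the process may cause "cache line bouncing" (cache lines frequently bouncing between cores), and using affinity to reduce the number of cores can help. As another example, memory (RAM) bandwidth may be the bottleneck (so the process can't benefit from hyper-threading), and using affinity to prevent the process from using both logical CPUs in each core can improve performance. In these cases, you want to use only some of the logical CPUs (but which ones isn't obvious in advance).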
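One way to try the second case (at most one logical CPU per physical core) is to keep a single hyper-thread from each core. A minimal sketch, assuming Linux's sysfs topology files /sys/devices/system/cpu/cpuN/topology/core_id and physical_package_id, and assuming they are absent for offline CPUs; the helper names are hypothetical, and keeping the lowest-numbered sibling is an arbitrary choice. Whether this helps still has to be measured.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Read an integer such as /sys/devices/system/cpu/cpu3/topology/core_id.
   Returns -1 if the file is missing or unreadable. */
int read_topology_id(int cpu, const char *name)
{
    char path[128];
    int value = -1;
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, name);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Keep the lowest-numbered logical CPU of every distinct
   (package, core) pair; entries with a negative id are skipped.
   Returns the number of CPUs placed in *out. */
int pick_one_per_core(const int *package, const int *core, int n,
                      cpu_set_t *out)
{
    int picked = 0;
    CPU_ZERO(out);
    for (int cpu = 0; cpu < n && cpu < CPU_SETSIZE; cpu++) {
        if (package[cpu] < 0 || core[cpu] < 0)
            continue;
        int seen = 0;                    /* sibling already picked? */
        for (int prev = 0; prev < cpu && !seen; prev++)
            seen = package[prev] == package[cpu] && core[prev] == core[cpu];
        if (!seen) {
            CPU_SET(cpu, out);
            picked++;
        }
    }
    return picked;
}

/* Restrict the calling thread to one logical CPU per physical core.
   Returns 0 on success, -1 on failure. */
int pin_one_cpu_per_core(void)
{
    static int package[CPU_SETSIZE], core[CPU_SETSIZE];
    long n = sysconf(_SC_NPROCESSORS_CONF);
    if (n < 1)
        return -1;
    if (n > CPU_SETSIZE)
        n = CPU_SETSIZE;
    for (int cpu = 0; cpu < n; cpu++) {
        package[cpu] = read_topology_id(cpu, "physical_package_id");
        core[cpu] = read_topology_id(cpu, "core_id");
    }
    cpu_set_t set;
    if (pick_one_per_core(package, core, (int)n, &set) == 0)
        return -1;
    return sched_setaffinity(0, sizeof set, &set);
}
```

With the /proc/cpuinfo shown in the question (processors 0 and 1 on core 0, processors 2 and 3 on core 2, all in package 0), pick_one_per_core selects CPUs 0 and 2.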

For a single-threaded process, it doesn't matter what you do.

Basically (assuming multi-threading), the best setting depends on the process, so you should run some tests to see how affinity affects your process.
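A minimal sketch of such a test, assuming the workload can be wrapped in a function: it applies a candidate mask to the calling thread (threads created afterwards inherit it), runs the work, and returns the wall-clock time from CLOCK_MONOTONIC. time_with_affinity and busy_work are hypothetical names, and busy_work is only a placeholder for the real job.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <time.h>

/* Placeholder workload; substitute the real multithreaded job. */
void busy_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 10000000UL; i++)
        sink += i;
}

/* Apply `mask` to the calling thread (threads it creates later inherit
   it), run `work`, and return the elapsed wall-clock seconds,
   or -1.0 if the mask could not be applied. */
double time_with_affinity(const cpu_set_t *mask, void (*work)(void))
{
    struct timespec start, end;
    if (sched_setaffinity(0, sizeof *mask, mask) == -1)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    work();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Each candidate mask (all logical CPUs, one per core, and so on) should be timed several times, since a single run is noisy.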

Misc. notes

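When hyper-threading was first introduced (NetBurst/Pentium 4), it was "less than ideal", and the schedulers in most operating systems were not optimized to schedule load efficiently across hyper-threads (which made things worse). This led many people to believe that hyper-threading was bad in many situations. Modern Intel CPUs don't have the problems NetBurst/Pentium 4 had, and modern OS schedulers are optimized for hyper-threading. This means the old assumption that was correct back then ("hyper-threading can be bad") is now mostly outdated and wrong.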

score 1 · Accepted Answer

From the cpuset man page, regarding Cpus_allowed_list:

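A cpuset defines a list of CPUs and memory nodes. The CPUs of a system include all the logical processing units on which a process can execute, including, if present, multiple processor cores within a package and Hyper-Threads within a processor core. Memory nodes include all distinct banks of main memory; small and SMP systems typically have just one memory node that contains all the system's main memory, while NUMA (non-uniform memory access) systems have multiple memory nodes.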

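Cpusets are integrated with the sched_setaffinity(2) scheduling affinity mechanism and the mbind(2) and set_mempolicy(2) memory-placement mechanisms in the kernel. Neither of these mechanisms lets a process make use of a CPU or memory node that is not allowed by that process's cpuset. If changes to a process's cpuset placement conflict with these other mechanisms, then cpuset placement is enforced even if it means overriding these other mechanisms. The kernel accomplishes this overriding by silently restricting the CPUs and memory nodes requested by these other mechanisms to those allowed by the invoking process's cpuset. This can result in these other calls returning an error if, for example, such a call ends up requesting an empty set of CPUs or memory nodes.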
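Because the kernel silently trims a request to the cpuset, reading the mask back after sched_setaffinity shows what was actually applied. A minimal sketch; set_and_read_back and cpus_dropped are hypothetical helpers, and the fixed-size cpu_set_t assumes at most CPU_SETSIZE (1024) CPUs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Ask for `requested`, then fetch the mask the kernel actually applied
   to the calling thread.
   Returns 0 on success, otherwise the errno value of the failing call. */
int set_and_read_back(const cpu_set_t *requested, cpu_set_t *effective)
{
    if (sched_setaffinity(0, sizeof *requested, requested) == -1)
        return errno;
    if (sched_getaffinity(0, sizeof *effective, effective) == -1)
        return errno;
    return 0;
}

/* CPUs that were requested but are absent from the effective mask
   (computed as requested AND NOT effective). */
void cpus_dropped(const cpu_set_t *requested, const cpu_set_t *effective,
                  cpu_set_t *dropped)
{
    CPU_XOR(dropped, requested, effective);
    CPU_AND(dropped, dropped, requested);
}
```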

Additional information about cpusets and how the kernel handles requests to change or move from one cpuset to another:

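Every process in the system belongs to exactly one cpuset. A process is confined to run only on the CPUs in the cpuset it belongs to, and to allocate memory only on the memory nodes in that cpuset. When a process fork(2)s, the child process is placed in the same cpuset as its parent. With sufficient privilege, a process may be moved from one cpuset to another, and the allowed CPUs and memory nodes of an existing cpuset may be changed.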

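So I think that if Cpus_allowed_list shows 8 CPUs (0-7), it might mean your machine has 4 cores with hyper-threading enabled on each, so logically it becomes 4 * 2. Therefore, when calling sched_setaffinity() we should use logical CPUs rather than physical CPUs, and we should check the return value so that, if the call fails, we can find out more about why.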
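A minimal sketch of that advice: call sched_setaffinity with a mask built from logical CPU numbers and, on failure, report errno. The errno meanings in the comments are paraphrased from the sched_setaffinity(2) man page; set_affinity_checked is a hypothetical wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Apply `mask` (built from logical CPU numbers) to `pid`, 0 meaning the
   calling thread. Returns 0 on success or the errno value on failure,
   after printing a short diagnosis to stderr. */
int set_affinity_checked(pid_t pid, const cpu_set_t *mask)
{
    if (sched_setaffinity(pid, sizeof *mask, mask) == 0)
        return 0;

    int err = errno;
    const char *hint;
    switch (err) {
    case EINVAL: hint = "no CPU in the mask is present and allowed by the cpuset"; break;
    case EPERM:  hint = "insufficient privilege (matching UID or CAP_SYS_NICE)"; break;
    case ESRCH:  hint = "no thread with that ID"; break;
    case EFAULT: hint = "invalid mask address"; break;
    default:     hint = "unexpected error"; break;
    }
    fprintf(stderr, "sched_setaffinity(%d): %s (%s)\n",
            (int)pid, strerror(err), hint);
    return err;
}
```

For example, passing an empty mask fails with EINVAL, which is also what happens when every requested CPU lies outside the process's cpuset.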
