python - Why is `total_num_virtual_procs` not equal to the number of MPI processes?

Question

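The NEST simulator has the concept of virtual processes. From what I have read about them, I would expect every MPI process to contain at least one virtual process, since otherwise the MPI process would not be doing anything?

However, when I launch 4 MPI processes, the kernel status attribute `total_num_virtual_procs` is 1: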

mpiexec -n 4 python -c "import nest; import mpi4py.MPI; print(nest.GetKernelStatus()['total_num_virtual_procs'], mpi4py.MPI.COMM_WORLD.Get_size());"

This prints the NEST import banner and `1 4` four times. Does this mean that 3 of the processes will not be used for the simulation until I call `nest.SetKernelStatus({'total_num_virtual_procs': 4})`?

score 2 · Accepted Answer

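Edit: TL;DR: In previous NEST versions, the value returned by `nest.GetKernelStatus('total_num_virtual_procs')` was wrong. Recent versions show the correct number, which by default is one thread per process and therefore equals the number of MPI processes.

The number of virtual processes (VPs) is a free parameter in NEST, because it uses a hybrid MPI + OpenMP parallelization scheme. Each process can have multiple threads, and each thread has its own virtual process. For example, two processes and four VPs result in two threads per process: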

Process  Thread  VP
-------  ------  --
0        0       0
1        0       1
0        1       2
1        1       3
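The table can be reproduced with a few lines of plain Python. This is only a sketch of the numbering pattern shown above (VPs dealt out round-robin across processes); the helper name `vp_to_process_thread` is made up for illustration and is not part of the NEST API.

```python
def vp_to_process_thread(vp, n_procs):
    """Map a virtual process id to (MPI rank, thread id).

    Assumption: VPs are assigned round-robin over the MPI processes,
    which matches the table above.
    """
    return vp % n_procs, vp // n_procs


n_procs = 2      # number of MPI processes
total_vps = 4    # value of total_num_virtual_procs

print("Process  Thread  VP")
for vp in range(total_vps):
    rank, thread = vp_to_process_thread(vp, n_procs)
    print(f"{rank:<8} {thread:<7} {vp}")
```
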

Setting `total_num_virtual_procs` to 8 would result in four threads per process, and so on. Your example above also works without `mpi4py`:

mpiexec -n 2 python -c "\
   import nest; \
   nest.SetKernelStatus({'total_num_virtual_procs': 4}); \
   print('>>> this is process %d of %d with %d threads <<<' \
         % ( nest.Rank(), \
             nest.NumProcesses(), \
             nest.GetKernelStatus()['total_num_virtual_procs'] // nest.NumProcesses()) \
   ); \
   nest.Simulate(10);"

Its output contains the following lines:

…

>>> this is process 1 of 2 with 2 threads <<<
>>> this is process 0 of 2 with 2 threads <<<
…

Sep 09 15:49:39 SimulationManager::start_updating_ [Info]: 
    Number of local nodes: 0
    Simulation time (ms): 10
    Number of OpenMP threads: 2
    Number of MPI processes: 2

Sep 09 15:49:39 SimulationManager::start_updating_ [Info]: 
    Number of local nodes: 0
    Simulation time (ms): 10
    Number of OpenMP threads: 2
    Number of MPI processes: 2

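You can see that `total_num_virtual_procs` is split across all processes, i.e. `Number of OpenMP threads` times `Number of MPI processes` equals `total_num_virtual_procs`. Also note that you do not see any thread parallelism at the Python level, because the processes only enter parallel contexts inside the C++ code that runs beneath `Create()`, `Connect()` and `Simulate()`.

If you do not set `total_num_virtual_procs`, the default is one thread per process. You can see this by creating some neurons, e.g. `nest.Create('iaf_psc_exp', 10)`, on two processes: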
```
Sep 09 16:28:28 SimulationManager::start_updating_ [Info]: 
    Number of local nodes: 5
    Number of local nodes: 5
    Simulation time (ms): 10
    Simulation time (ms): 10
    Number of OpenMP threads: 1
    Number of OpenMP threads: 1
    Number of MPI processes: 2
    Number of MPI processes: 2
```
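Each process handles five of the ten created neurons. (`nest.GetKernelStatus('total_num_virtual_procs')` should then return the number of processes. Which NEST version are you using? This has been fixed…)

If you try to set a number of VPs that is not a multiple of the number of MPI processes, NEST throws an exception: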
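The five-and-five split can be illustrated without NEST. The sketch below assumes that neurons are dealt out round-robin by node id over the virtual processes, and that VPs are in turn dealt out round-robin over the MPI processes. The function `local_node_counts` is hypothetical and only mimics that bookkeeping; devices such as recorders may be handled differently by NEST and are not covered here.

```python
def local_node_counts(n_neurons, n_procs, threads_per_proc=1):
    """Count how many neurons end up on each MPI rank.

    Assumptions (illustrative only): node ids start at 1,
    a neuron lives on VP = node_id % total_vps,
    and a VP lives on rank = vp % n_procs.
    """
    total_vps = n_procs * threads_per_proc
    counts = {rank: 0 for rank in range(n_procs)}
    for node_id in range(1, n_neurons + 1):
        vp = node_id % total_vps
        counts[vp % n_procs] += 1
    return counts


# Ten neurons on two processes with the default of one thread each.
print(local_node_counts(10, 2))  # {0: 5, 1: 5}
```
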

nest.lib.hl_api_exceptions.BadProperty: ('BadProperty in SetKernelStatus: Number of virtual processes (threads*processes) must be an integer multiple of the number of processes. Value unchanged.', 'BadProperty', 'SetKernelStatus', ': Number of virtual processes (threads*processes) must be an integer multiple of the number of processes. Value unchanged.')
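One way to avoid running into this exception is to check the divisibility yourself before calling `nest.SetKernelStatus`. The helper below is a minimal sketch in plain Python; `threads_per_process` is a made-up name, not a NEST function.

```python
def threads_per_process(total_vps, n_procs):
    """Return the thread count per MPI process for a given VP total.

    Raises ValueError when total_vps is not a positive integer
    multiple of n_procs, mirroring the constraint in the error above.
    """
    if n_procs < 1 or total_vps < 1:
        raise ValueError("total_vps and n_procs must both be >= 1")
    if total_vps % n_procs != 0:
        raise ValueError(
            f"{total_vps} virtual processes cannot be split evenly "
            f"across {n_procs} MPI processes"
        )
    return total_vps // n_procs


print(threads_per_process(4, 2))  # 2
print(threads_per_process(8, 2))  # 4
```

Inside an MPI job you would pass `nest.NumProcesses()` as `n_procs` and only then set `total_num_virtual_procs`.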

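When experimenting with different job geometries, a generally good starting point is one MPI process per NUMA domain (e.g. one process per physical CPU socket) and one thread per physical core (hyperthreading may cause contention for cache lines and can even reduce performance).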
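As a rough worked example of that rule of thumb, the sketch below derives the process count, thread count and resulting `total_num_virtual_procs` from a hardware description. The machine sizes are invented for illustration, and `job_geometry` is a hypothetical helper, not something NEST provides.

```python
def job_geometry(n_machines, sockets_per_machine, cores_per_socket):
    """Suggest a starting geometry: one MPI process per socket
    (taken here as one NUMA domain) and one thread per physical core."""
    n_procs = n_machines * sockets_per_machine
    threads = cores_per_socket
    return {
        "mpi_processes": n_procs,
        "threads_per_process": threads,
        "total_num_virtual_procs": n_procs * threads,
    }


# Hypothetical cluster: 2 machines, 2 sockets each, 16 physical cores per socket.
geometry = job_geometry(2, 2, 16)
print(geometry)
```

With these assumed numbers you would start 4 MPI processes and set `total_num_virtual_procs` to 64, then vary the geometry from there and measure.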
