
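It seems to me that, in Python code running in parallel, an assert that fails on at least one processor should abort all processors, so that:

1) the error message is clearly visible (with the stack trace)

2) the remaining processors do not wait forever.

However, this is not what the standard assert does.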

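This has already been asked in "python script running with mpirun not stopping if assert on processor 0 fails", but I am not satisfied with the answer. It suggests using the comm.Abort() function, but that only addresses point 2) above.

So I wonder: is there a standard "assert" function for parallel code (e.g. with mpi4py), or should I write my own assert for this purpose?

Thanks!

EDIT: here is my attempt (inside a class, but it could live outside one), which can surely be improved: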

import sys
import traceback

import mpi4py.MPI as mpi


class My_code():

    def __init__(self, some_parameter=None):

        self.current_com = mpi.COMM_WORLD
        self.rank = self.current_com.rank
        self.nb_procs = self.current_com.size

        self.my_assert(some_parameter is not None)
        self.parameter = some_parameter
        print("Ok, parameter set to " + repr(self.parameter))

    # some class functions here...

    def my_assert(self, assertion):
        """
        Attempt at an assert function that kills
        every process in a parallel run.
        """
        if not assertion:
            # mimic the report a failed assert would print
            print('Traceback (most recent call last):')
            for line in traceback.format_stack()[:-1]:
                print(line.strip())
            print('AssertionError')
            # flush so the report is not lost when the job is aborted
            sys.stdout.flush()
            if self.nb_procs == 1:
                sys.exit(1)
            else:
                self.current_com.Abort(1)
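
Since the attempt above could also live outside a class, here is a rough sketch of a free-function version. The name `parallel_assert` and its `message` and `comm` parameters are made up for illustration and are not part of mpi4py. The communicator is passed in explicitly, so the function only relies on `Get_size()` and `Abort()` and can be tried without an MPI launch.

```python
import sys
import traceback


def parallel_assert(condition, message="", comm=None):
    """Hypothetical helper: report a failed check, then stop every rank.

    `comm` is assumed to be an mpi4py communicator (e.g. MPI.COMM_WORLD);
    with comm=None the function behaves like a plain serial exit.
    """
    if condition:
        return
    report = ["Traceback (most recent call last):\n"]
    # drop the last frame, which is this helper itself
    report += traceback.format_stack()[:-1]
    if message:
        report.append("AssertionError: %s\n" % message)
    else:
        report.append("AssertionError\n")
    sys.stderr.write("".join(report))
    # flush before aborting so the report is not lost
    sys.stderr.flush()
    if comm is not None and comm.Get_size() > 1:
        comm.Abort(1)
    sys.exit(1)
```

In an MPI run it would presumably be called as `parallel_assert(x is not None, "x must be set", comm=MPI.COMM_WORLD)`. Writing to stderr and flushing first is a design choice meant to keep the report visible among ordinary output.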

1 Answer


I think the following code answers the question. It is derived from Dan D.'s input.

import sys

import mpi4py.MPI as mpi


# put this somewhere but before calling the asserts
sys_excepthook = sys.excepthook
def mpi_excepthook(exc_type, exc_value, exc_traceback):
    # print the usual traceback first, then bring down the other ranks
    sys_excepthook(exc_type, exc_value, exc_traceback)
    sys.stderr.flush()
    if mpi.COMM_WORLD.size > 1:
        mpi.COMM_WORLD.Abort(1)
sys.excepthook = mpi_excepthook

# example:
if mpi.COMM_WORLD.rank == 0:
    # with sys.excepthook redefined as above this will kill every processor
    # otherwise this would only kill processor 0
    assert 1 == 0

# assume here we have a lot of print messages
for i in range(50):
    print("rank = ", mpi.COMM_WORLD.rank)

# with std asserts the code would be stuck here
# and the error message from the failed assert above would hardly be visible
mpi.COMM_WORLD.Barrier()
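
The same hook idea can be packaged so it is easier to reuse and to try out without launching MPI. This is only a sketch: `make_abort_hook` and `install_mpi_excepthook` are hypothetical names, not mpi4py functions. The abort callable and the world size are passed in as arguments, and mpi4py is only imported when the hook is actually installed.

```python
import sys


def make_abort_hook(previous_hook, abort, world_size):
    """Hypothetical factory: wrap an excepthook so it also aborts the job.

    previous_hook: the hook to call first (normally sys.excepthook)
    abort:         callable taking an error code (normally comm.Abort)
    world_size:    number of ranks in the run
    """
    def hook(exc_type, exc_value, exc_traceback):
        # print the normal traceback before anything else
        previous_hook(exc_type, exc_value, exc_traceback)
        sys.stdout.flush()
        sys.stderr.flush()
        if world_size > 1:
            abort(1)
    return hook


def install_mpi_excepthook():
    """Install the hook on COMM_WORLD; assumes mpi4py is available."""
    from mpi4py import MPI  # imported lazily, only when installing
    comm = MPI.COMM_WORLD
    sys.excepthook = make_abort_hook(sys.excepthook, comm.Abort,
                                     comm.Get_size())
```

Calling `install_mpi_excepthook()` once near the top of the script should then give ordinary `assert` statements the behaviour shown above.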
answered 2015-12-16T13:29:30.603