python-3.x - dispy 示例程序挂起

Question

TL;DR：我无法让最基本的dispy示例代码正常运行。为什么不？

细节：

我正在尝试在 python 中进行分布式处理，并且认为dispy库听起来很有趣，因为它具有全面的功能集。

但是，我一直在尝试遵循他们的基本规范程序示例，但我一无所获。

我已经安装了 dispy ( python -m pip install dispy)
我继续使用相同子网地址的另一台机器并运行python dispynode.py. 它似乎有效，因为我得到以下输出：

2016-06-14 10:33:38 dispynode - dispynode 版本 4.6.14
2016-06-14 10:33:38 asyncoro - 带有 epoll I/O 通知器的 4.1 版
2016-06-14 10:33:38 dispynode - 服务8 cpu 在 10.0.48.54:51348

输入“quit”或“exit”终止dispynode，“stop”停止
服务，“start”重新启动服务，“cpus”改变使用的CPU，
其他任何获取状态：
回到我的客户端机器上，我运行从http://dispy.sourceforge.net/_downloads/sample.py下载的示例代码，复制到这里：

# function 'compute' is distributed and executed with arguments
# supplied with 'cluster.submit' below
def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)

if __name__ == '__main__':
    # executed on client only; variables created below, including modules imported,
    # are not available in job computations
    import dispy, random
    # distribute 'compute' to nodes; 'compute' does not have any dependencies (needed from client)
    cluster = dispy.JobCluster(compute)
    # run 'compute' with 20 random numbers on available CPUs
    jobs = []
    for i in range(20):
        job = cluster.submit(random.randint(5,20))
        job.id = i # associate an ID to identify jobs (if needed later)
        jobs.append(job)
    # cluster.wait() # waits until all jobs finish
    for job in jobs:
        host, n = job() # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # job.stdout, job.stderr, job.exception, job.ip_addr, job.end_time
    cluster.print_status()  # shows which nodes executed how many jobs etc.

当我运行这个（python sample.py）时，它只是挂起。通过 pdb 调试，我看到它最终挂在dispy/__init__.py(117)__call__(). 该行显示self.finish.wait()。finish 只是一个 python 线程，wait()然后进入lib/python3.5/threading.py(531)wait(). 一旦到达等待，它就会挂起。

我试过在客户端机器上运行 dispynode 并得到相同的结果。我已经尝试了很多将节点传递到集群创建中的变体，例如：

cluster = dispy.JobCluster(compute, nodes=['localhost'])
cluster = dispy.JobCluster(compute, nodes=['*'])
cluster = dispy.JobCluster(compute, nodes=[<hostname of the remote node running the client>])

我尝试在未cluster.wait()注释的情况下运行，并得到相同的结果。

当我添加日志记录 ( cluster = dispy.JobCluster(compute, loglevel = 10)) 时，我在客户端得到以下输出：

2016-06-14 10:27:01 asyncoro - 带有 epoll I/O 通知程序的 4.1 版
2016-06-14 10:27:01 dispy - dispy client at :51347 2016-06-14 10:27:01 dispy - 存储“_dispy_20160614102701”中的故障恢复信息
2016-06-14 10:27:01 dispy - 待处理的作业：0
2016-06-14 10:27:01 dispy - 待处理的作业：1
2016-06-14 10:27:01 dispy - 待定工作：2
2016-06-14 10:27:01 dispy - 待定工作：3
2016-06-14 10:27:01 dispy - 待定工作：4
2016-06-14 10:27:01 dispy - 待定工作：5
2016-06-14 10:27:01 dispy - 待定工作：6
2016-06-14 10:27:01 dispy - 待定工作：7
2016-06-14 10:27:01 dispy - 待定工作： 8
2016-06-14 10:27:01 显示 - 待定工作：9
2016-06-14 10:27:01 dispy - 待定工作：10

这似乎并不意外，但并不能帮助我弄清楚为什么作业没有运行。

对于它的价值，这里是_dispy_20160614102701.bak：

'_cluster', (0, 207)
'compute_1465918021755', (512, 85)

同样，_dispy_20160614102701.dir：

'_cluster', (0, 207)
'compute_1465918021755', (512, 85)

我没有猜测，除非我使用的是不稳定的版本。

score 1 · Accepted Answer

首次在网络上设置和使用 dispy 时，我发现在创建作业集群时必须指定客户端节点 IP，如下所示：

cluster = dispy.JobCluster(compute, ip_addr=your_ip_address_here)

看看是否有帮助。

score 0 · Accepted Answer

在执行之前python sample.py，dispynode.py应该仍然在 localhost 或另一台机器上运行（注意，如果你不想指定复杂的选项，另一台机器应该在同一个网络中）。

我遇到了同样的问题并以这种方式解决了它：

打开终端并执行：（$ dispynode.py不要终止它）
打开第二个终端并执行：$ python sample.py

不要忘记函数计算包括等待一段时间，输出应该在执行 sample.py 后至少 20 秒出现。

score 0 · Accepted Answer

试试这个

python /home/$user_name/.local/lib/python3.9/site-packages/dispy/dispynode.py

python sample.py

它对我有用

score 0 · Accepted Answer

如果您只是在客户端上运行 sample.py，请在主语句中更改以下内容：

集群 = dispy.JobCluster（计算，节点=['nodeip_1'，'nodeip_2'，.....，'nodeip_n]）

然后在您的 IDE 中或通过 shell 运行它。

我希望这会有所帮助。

python-3.x - dispy 示例程序挂起

4 回答 4

Related

Reference