multithreading - What code will recover from "Removing (timedout) connection"?

Question

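I have a gen_server on one system and 4 clients on 4 other systems. The code runs as expected for 3 or 4 days, until the gen_server reports "** Removing (timedout) connection **". Because a client can become active before or after the gen_server starts, each client executes this code before every call to the gen_server: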

connect_IPDB() ->
    % Try every 5 seconds to connect to the server
    case net_kernel:connect_node(?SERVER) of
        % When connected, wait an additional 5 seconds for stability
        true -> timer:sleep(5000);
        false ->
            timer:sleep(5000),
            connect_IPDB()
    end.

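This works as expected when the server and clients are started in any order. All of the clients connect, and they all show up when nodes() is run on the server.

Here is the problem. Some time after the "** Removing (timedout) connection **" error, nodes() shows all of the nodes, which means the clients have not hung and have executed the code above. However, communication with the timed-out node has not been restored. How can the connection be re-established without restarting the client? Incidentally, restarting the client does fix the problem.

Any help is appreciated.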

score 0 · Accepted Answer

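I finally figured out both the problem and the solution. My timeouts were caused by the affected clients (they are virtual machines) being paused so that they could be backed up. Because the whole VM was paused, the supervisor in the client saw nothing wrong when the VM was unpaused, so it did not restart the program.

The fix was to change connect_IPDB to: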

connect_IPDB() ->
    % See if we are connected to the server: is it in the list of connected nodes?
    case lists:filter(fun(X) -> string:str(atom_to_list(X), atom_to_list(?SERVER)) == 1 end, nodes(connected)) of
        % If empty, the server is not in the list, so enter the reconnect loop
        [] ->
            connect_IPDB("Reconnect");
        % Anything else means we are connected, so proceed
        _ -> ok
    end.
connect_IPDB(_Reconnect) ->
    case net_kernel:connect_node(?SERVER) of
        % When connected, wait an additional 5 seconds for stability
        true ->
            timer:sleep(5000),
            Ips = gen_server:call({global, ?SERVER}, getall_ips),
            % Re-initialize the iptables
            removechain(),
            createchain(),
            % Load the Ips into the local iptables f2bchain
            load_ips(Ips),
            % Restart ntpd
            os:cmd("service ntpd restart");
        false ->
            timer:sleep(5000),
            connect_IPDB("Reconnect")
    end.

This has the added advantage of resetting the client's clock (by restarting NTPD) when the client comes out of the pause.
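The connectivity check at the top of connect_IPDB/0 could also be written as a small standalone helper. This is only a sketch: the module name ipdb_conn_check and the function is_connected are made up for illustration, and it assumes ?SERVER is the server's full node name (for example 'ipdb@host1'), so it tests exact membership rather than the prefix match used above.

```erlang
-module(ipdb_conn_check).
-export([is_connected/1, is_connected/2]).

%% Hypothetical helper, not part of the original code.
%% True when Server (assumed to be a full node name) is among the
%% nodes this node currently holds a connection to.
is_connected(Server) ->
    is_connected(Server, nodes(connected)).

%% Same check against an explicit node list, which keeps the
%% function free of side effects and easy to test.
is_connected(Server, Nodes) when is_atom(Server), is_list(Nodes) ->
    lists:member(Server, Nodes).
```

With a helper like this, connect_IPDB/0 would enter the reconnect loop only when is_connected(?SERVER) returns false.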

I will leave the supervisor in place to handle "real" failures, as opposed to this self-induced one.
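As an alternative to checking before every call, a client process can subscribe to node status messages with net_kernel:monitor_nodes/1 and react when the connection to the server drops or returns. The following is a minimal sketch and not what I actually deployed: the module name ipdb_node_watch and the OnUp callback are hypothetical, and I have not verified how quickly a nodedown message arrives after a VM pause.

```erlang
-module(ipdb_node_watch).
-export([start/2]).

%% Hypothetical watcher. Server is the node name to watch; OnUp is a
%% zero-arity fun run whenever that node comes (back) up, e.g. to
%% reload the iptables chain.
start(Server, OnUp) ->
    spawn(fun() ->
        %% Ask for {nodeup, Node} / {nodedown, Node} messages.
        ok = net_kernel:monitor_nodes(true),
        loop(Server, OnUp)
    end).

loop(Server, OnUp) ->
    receive
        {nodedown, Server} ->
            %% Connection to the server was dropped: retry until it is back.
            reconnect(Server),
            loop(Server, OnUp);
        {nodeup, Server} ->
            OnUp(),
            loop(Server, OnUp);
        _Other ->
            %% Status changes for other nodes are ignored.
            loop(Server, OnUp)
    end.

reconnect(Server) ->
    case net_kernel:connect_node(Server) of
        true -> ok;
        false -> timer:sleep(5000), reconnect(Server);
        ignored -> ok   % local node is not distributed; nothing to do
    end.
```

A client could start it with something like ipdb_node_watch:start(?SERVER, fun() -> connect_IPDB("Reconnect") end), although that particular callback would call connect_node a second time, so a dedicated reload function might be cleaner.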
