
I created a simple application using poolboy with a nearly empty worker, but when I stop the application I see the following error printed by lager:

10:50:26.363 [error] Supervisor {<0.236.0>,poolboy_sup} had child test_worker started with test_worker:start_link([]) at undefined exit with reason shutdown in context shutdown_error

What causes this error, and how can I fix it?

Supervisor:

-module(test_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).


start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    ChildSpecs = [pool_spec()],
    {ok, {{one_for_one, 1000, 3600}, ChildSpecs}}.

pool_spec() ->
    Name = test_pool,
    PoolArgs = [{name, {local, Name}},
                {worker_module, test_worker},
                {size, 10},
                {max_overflow, 20}],
    poolboy:child_spec(Name, PoolArgs, []).

Worker:

-module(test_worker).
-behaviour(gen_server).
-behaviour(poolboy_worker).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2,
     handle_info/2, terminate/2, code_change/3]).

-record(state, {}).

start_link([]) ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, #state{}}.

handle_call(_Request, _From, State) ->
    {reply, _Reply = ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

The rest of the application is pretty standard.

Erlang: R16B02

Poolboy: 1.0.1

Lager: latest version of master at the time of writing the question (822062478a223313dce30e5a45e30a50a4b7dc4e)


2 Answers


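What you are seeing is not really an error but an error report generated by lager. The report appears to be caused by a bug in poolboy.

You can:

  • Fix the bug and submit a patch to the poolboy developers.
  • Safely ignore the report.
  • Terminate your workers manually on exit.

When you stop an OTP application, the supervision tree is used to terminate all processes, preferably gracefully. The default approach is to send a shutdown signal to each supervised process and, if that has not worked after some time, to kill it brutally. When everything goes well, you never get any report.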

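There are two Erlang subtleties to understand in order to follow this bug:

  1. Processes can be linked, which means that when one process terminates abnormally (i.e. with a reason other than normal), all linked processes are terminated with the same reason. This primitive is the foundation of OTP supervision.
  2. Processes can trap exit signals (or "trap exits"), which means they receive exit signals as regular messages instead of being terminated. This includes normal, which would not terminate them anyway, but not kill, which terminates them unconditionally.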
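The two points above can be seen in a few lines. This is only a minimal sketch: the module name link_demo and the function run/0 are made up for illustration and have nothing to do with poolboy. The calling process traps exits, links to a child that exits with reason shutdown, and receives the exit signal as an ordinary message instead of dying.

```erlang
-module(link_demo).
-export([run/0]).

%% Hypothetical demo: trap exits, spawn a linked child that exits with
%% reason shutdown, and read the exit signal from the mailbox.
run() ->
    OldFlag = process_flag(trap_exit, true),
    Child = spawn_link(fun() -> exit(shutdown) end),
    Result = receive
                 %% Because we trap exits, the signal arrives as a message.
                 {'EXIT', Child, Reason} -> {trapped, Reason}
             after 1000 ->
                 timeout
             end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, OldFlag),
    Result.
```

Without the call to process_flag(trap_exit, true), the caller would itself be terminated with reason shutdown, since shutdown is not normal.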

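Linking combined with trapping exits is commonly used to monitor the termination of a process, with the added benefit that the monitored process is terminated when the monitoring process terminates. For example, if a supervisor terminates, its children should terminate too. An asymmetric monitor mechanism also exists.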
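For comparison, here is a minimal sketch of the asymmetric mechanism. The module name monitor_demo and the function run/0 are hypothetical. The watcher is told about the termination through a 'DOWN' message, and no exit signal travels in either direction, so it does not need to trap exits.

```erlang
-module(monitor_demo).
-export([run/0]).

%% Hypothetical demo: monitor a process instead of linking to it.
run() ->
    %% spawn_monitor/1 atomically spawns the process and monitors it.
    {Pid, Ref} = spawn_monitor(fun() -> exit(shutdown) end),
    receive
        %% The watcher only receives a notification; it is not killed
        %% and its own death would not affect the monitored process.
        {'DOWN', Ref, process, Pid, Reason} -> {down, Reason}
    after 1000 ->
        timeout
    end.
```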

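Here, your supervisor (the test_sup module, which implements the supervisor behaviour) is terminated with reason shutdown, as it should be. The supervisor behaviour actually traps exits, and when it receives the shutdown signal it tries to terminate its children according to their shutdown strategy. Here, you use the default strategy, which is to send the child a shutdown signal as a first attempt. So your supervisor sends a shutdown signal to its only child.

This is where poolboy's magic comes in: your supervisor's child is actually a gen_server with poolboy as its callback module. It should shut down the pool and terminate gracefully.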

This process is linked to the pool supervisor, but also to the workers. This surprising implementation choice is probably motivated by the idea that a crash of the pool (the poolboy gen_server) should terminate the workers. However, this is the source of the bug, and an asymmetric monitor would probably make more sense. Since the supervisor is already linked to the poolboy gen_server, a termination of the poolboy process will eventually lead to a termination of the workers anyway.

The consequence of linking to the workers is that they also get the shutdown exit signal which was initially directed to the poolboy process. And they are terminated. This termination is considered abnormal by the workers' supervisor (implementing poolboy_sup callback) since it did not send the signal itself. As a result, the supervisor reports the shutdown, which is logged by lager here.

The fact that poolboy traps exits does not prevent the propagation of the shutdown signal. The process is not terminated immediately when it receives the signal but it receives it as a message. gen_server intercepts this message, calls terminate/2 callback function and then terminates with shutdown, eventually propagating the signal to all linked processes.

If avoiding links to the workers is not an option, a way to fix this bug would be to unlink all workers in the terminate handler.
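As a rough illustration of that idea, and explicitly not poolboy's actual code, here is a sketch of a gen_server that links to a list of worker pids and unlinks them in terminate/2. The module name pool_owner_sketch, the state record, and the way workers are passed in are all assumptions made for the example.

```erlang
-module(pool_owner_sketch).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical state: just the pids this process is linked to.
-record(state, {workers = [] :: [pid()]}).

start_link(Workers) ->
    gen_server:start_link(?MODULE, Workers, []).

init(Workers) ->
    %% Trap exits so a worker crash arrives as a message.
    process_flag(trap_exit, true),
    [link(W) || W <- Workers],
    {ok, #state{workers = Workers}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Forget a worker once it has exited.
handle_info({'EXIT', Pid, _Reason}, #state{workers = Ws} = State) ->
    {noreply, State#state{workers = lists:delete(Pid, Ws)}};
handle_info(_Info, State) ->
    {noreply, State}.

%% Remove the links before exiting, so that this process's exit
%% signal is not propagated to the workers.
terminate(_Reason, #state{workers = Workers}) ->
    [unlink(W) || W <- Workers],
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

With the links removed, the workers would be stopped only by their own supervisor, which is the party expected to send them the shutdown signal.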

answered 2013-10-15T08:25:24.320

How do you stop the application? Perhaps the supervisor should have a stop/1 function? For example, see

http://www.erlang.org/doc/apps/kernel/application.html#stop-1

answered 2013-10-15T07:49:09.770