concurrency - Why do we have to export the function used by spawn?

Question

In Erlang, when working with processes, you have to export the function used in the spawn function.

-module(echo).
-export([start/0, loop/0]).

start() ->
  spawn(echo, loop, []).

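The reason given in the book "Programming Erlang, 2nd Edition", page 188, is:

"Note that we also have to export the argument of spawn from the module. This is a good practice because we will be able to change the internal details of the server without changing the client code."

And in the book "Erlang Programming", page 121: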

-module(frequency).
-export([start/0, stop/0, allocate/0, deallocate/1]). 
-export([init/0]).  

%% These are the start functions used to create and 
%% initialize the server.

start() ->
   register(frequency, spawn(frequency, init, [])).

init() ->
   Frequencies = {get_frequencies(), []}, 
   loop(Frequencies).

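"Remember that when spawning a process, you have to export the init/0 function as it is used by the spawn/3 BIF. We have put this function in a separate export clause to distinguish it from the client functions, which are supposed to be called from other modules."

Could you please explain the logic behind this reasoning?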

score 2 · Accepted Answer

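The short answer is: spawn is not a "language construct", it is a library function.

This means that 'spawn' lives in another module, which has no access to any function in your module except the exported ones.

You have to pass the 'spawn' function something that lets it start your code. This can be either a function value (i.e. spawn(fun() -> (any code you want, including any local function invocations) end)) or a module / exported function name / argument list, which is visible from other modules.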
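As a rough sketch of these two options, the module below starts the same local logic once through a fun and once through an exported MFA. The module name spawn_demo and all of its functions are invented for illustration.

```erlang
-module(spawn_demo).
-export([start_fun/0, start_mfa/0]).
%% Exported only because spawn/3 refers to it by name.
-export([worker/1]).

%% spawn/1 takes a fun, so the fun body may call the
%% non-exported helper greet/1 directly.
start_fun() ->
    Parent = self(),
    spawn(fun() -> Parent ! {self(), greet(from_fun)} end).

%% spawn/3 takes a module, a function name and an argument list,
%% so worker/1 has to be exported.
start_mfa() ->
    spawn(spawn_demo, worker, [self()]).

worker(Parent) ->
    Parent ! {self(), greet(from_mfa)}.

%% Local helper, not exported (hypothetical).
greet(Tag) ->
    {hello, Tag}.
```

In both cases the caller receives a {Pid, {hello, Tag}} message from the new process; only the second form needs worker/1 in an export attribute.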

score 1 · Accepted Answer

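The logic is quite straightforward. Yet confusion can easily arise because:

export does not exactly match object-oriented encapsulation, and especially public methods;
several common patterns require exporting functions that are not meant to be called by regular clients.

What export really does

Export has a very strict meaning: exported functions are the only functions that can be referred to by their fully qualified name, i.e. module, function name and arity.

For example: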

-module(m).
-export([f/0]).
f() -> foo.
f(_Arg) -> bar.
g() -> foobar.

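You can call the first function with an expression such as m:f(), but this does not work for the other two functions: m:f(ok) and m:g() will fail with an undef error.

For this reason, the compiler will warn, in the example above, that f/1 and g/0 are unused: they are not called and cannot be called.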
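A small self-contained check of this rule might look as follows. The module export_demo is a made-up variant of the example that adds a local caller for g/0, so the compiler keeps it, and a check/0 function that exercises both kinds of call.

```erlang
-module(export_demo).
-export([f/0, local_call/0, check/0]).

f() -> foo.

%% Not exported: reachable only through a local call or a fun.
g() -> foobar.

local_call() -> g().

check() ->
    %% Fully qualified call to an exported function: works.
    foo = export_demo:f(),
    %% Local call to a non-exported function: works.
    foobar = local_call(),
    %% Fully qualified call to a non-exported function fails with undef,
    %% even though the call is made from inside the same module.
    {'EXIT', {undef, _}} = (catch export_demo:g()),
    ok.
```

The last match illustrates that the restriction is about how the function is referenced (by fully qualified name), not about where the caller lives.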

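Functions can always be called from outside a module: functions are values, and you can refer to a local function (within the module) and pass this value outside. For example, you can spawn a new process running a non-exported function by using spawn/1. You could rewrite your example as follows: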

start() ->
    spawn(fun loop/0).

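This does not require loop to be exported. Joe Armstrong, in other editions of Programming Erlang, explicitly suggests transforming the code as above to avoid exporting loop/0.

Common patterns that require exports

Since export is the only way to refer to a function by name from outside a module, there are two common patterns that require exporting functions even though these functions are not part of the public API.

The example you mention is whenever you want to call a library function that takes an MFA, i.e. a module, a function name and a list of arguments. These library functions will refer to the function by its fully qualified name. Besides spawn/3, you may encounter timer:apply_after/4.

Likewise, you can write functions that take MFA arguments and call them with apply/3.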
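For instance, a function of your own that accepts an MFA could be sketched like this. The module mfa_demo and the functions run/1 and double/1 are hypothetical names chosen for the example.

```erlang
-module(mfa_demo).
-export([run/1]).
%% Exported so that it can be named in an MFA.
-export([double/1]).

%% Takes a {Module, Function, Args} tuple and calls the function
%% by its fully qualified name through apply/3.
run({M, F, A}) when is_atom(M), is_atom(F), is_list(A) ->
    apply(M, F, A).

double(X) ->
    2 * X.
```

Calling mfa_demo:run({mfa_demo, double, [21]}) only works because double/1 is exported; without the second export attribute the apply/3 call would fail with undef.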

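Sometimes, these library functions have variants that directly take a 0-arity function value. This is the case for spawn, as mentioned above. An apply/1 would make no sense, since you would simply write F().

Another common case is behaviour callbacks, and especially OTP behaviours. In this case, you will need to export the callback functions, which are of course referred to by name.

Good practice is to use a separate export attribute for these functions, to make it clear that they are not part of the regular interface of the module.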
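As an illustration of this convention, here is a minimal gen_server sketch with the client API and the callbacks in separate export attributes. The module name counter and its API functions are assumptions made for this example.

```erlang
-module(counter).
-behaviour(gen_server).

%% Public API, meant to be called by clients.
-export([start_link/0, increment/0, value/0]).
%% gen_server callbacks: exported because gen_server calls them
%% by name, not meant to be called by clients.
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

increment() ->
    gen_server:cast(?MODULE, increment).

value() ->
    gen_server:call(?MODULE, value).

%% The state is simply the current count.
init([]) ->
    {ok, 0}.

handle_call(value, _From, N) ->
    {reply, N, N}.

handle_cast(increment, N) ->
    {noreply, N + 1}.
```

Nothing in the language enforces the split; it only documents which exports are part of the interface and which exist for the behaviour.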

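Exports and code change

There is a third common case for using exports outside of the public API: code change.

Imagine you are writing a loop (e.g. a server loop). You would typically implement it as follows: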

-module(m).
-export([start/0]).
start() -> spawn(fun() -> loop(state) end).
loop(State) ->
    NewState = receive ...
    ...
    end,
    loop(NewState). % not updatable !

This code cannot be updated because the loop never leaves the module. The proper way is to export loop/1 and perform a fully qualified call:

-module(m).
-export([start/0]).
-export([loop/1]).
start() -> spawn(fun() -> loop(state) end).
loop(State) ->
    NewState = receive ...
    ...
    end,
    ?MODULE:loop(NewState).

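Indeed, when you refer to an exported function by its fully qualified name, the lookup is always performed against the latest version of the module. So this trick allows the process to jump to the newer version of the code at every iteration of the loop. Code updates are actually fairly complex, and OTP, with its behaviours, does it right for you. It typically uses the same construct.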
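To make the pattern concrete, here is a minimal runnable sketch of such a loop. The module name upgradable and the bump, get and stop messages are invented for illustration; they are not part of the original answer.

```erlang
-module(upgradable).
-export([start/0]).
%% Exported so that the loop can re-enter the latest loaded code.
-export([loop/1]).

start() ->
    spawn(fun() -> loop(0) end).

loop(Count) ->
    receive
        bump ->
            %% Fully qualified call: resolved against the newest
            %% loaded version of this module.
            ?MODULE:loop(Count + 1);
        {get, From} ->
            From ! {count, Count},
            ?MODULE:loop(Count);
        stop ->
            ok
    end.
```

If a new version of the module is loaded while a process is waiting in receive, that process handles its next message with the old clause bodies and then continues in the new loop/1.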

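Conversely, when you call a function passed as a value, the code always comes from the version of the module that created the value. Joe Armstrong argues in a dedicated section of his book (8.10, Spawning with MFAs) that this is an advantage of spawn/3 over spawn/1. He writes:

"Most programs we write use spawn(Fun) to create a new process. This is fine provided we don't want to dynamically upgrade our code. Sometimes we want to write code that can be upgraded as we run it. If we want to make sure that our code can be dynamically upgraded, then we need to use a different form of spawn."

This is far-fetched because when you spawn a new process, it starts immediately, and an update is unlikely to occur between the creation of the function value and the start of the new process. Besides, Armstrong's statement is partly untrue: to make sure the code can be dynamically upgraded, spawn/1 works just as well (see the example above). The trick is not to use spawn/3, but to perform a fully qualified call (Joe Armstrong describes this in another section). spawn/3 has other advantages over spawn/1.

Still, the difference between passing a function by value and by name explains why there is no version of timer:apply_after/4 that takes a function by value: there is a delay, and the function value could be old by the time the timer fires. Such a variant would actually be dangerous because there are at most two versions of a module: the current one and the old one. If you reload a module more than once, processes trying to call even older versions of the code are killed. For this reason, you would often prefer MFAs and their exports to function values.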
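A delayed call by MFA could be sketched as follows. The module name later and the functions schedule/1 and notify/2 are hypothetical; only timer:apply_after/4 is the real library call.

```erlang
-module(later).
-export([schedule/1]).
%% Exported because the timer refers to it by name when it fires.
-export([notify/2]).

%% Asks the timer to call later:notify(Caller, done) after Millis
%% milliseconds. Returns {ok, TRef} on success.
schedule(Millis) ->
    timer:apply_after(Millis, ?MODULE, notify, [self(), done]).

notify(Pid, Msg) ->
    Pid ! Msg.
```

Because the call is stored as module, function name and arguments, it is resolved against whatever version of later is current when the timer fires.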

score 1 · Accepted Answer

When you do a spawn you create a completely new process with its own environment and thread of execution. This means that you are no longer executing "inside" the module where the spawn is called, so you must make an "outside" call into the module. The only functions in a module which can be called from the "outside" are exported functions, hence the spawned function must be exported.

It might seem a little strange seeing you are spawning a function in the same module but this is why.

I think it is important to remember that a module is just code and does not contain any deeper meaning than that, for example like a class in an OO language. So even if you have functions from the same module being executed in different processes, a very common occurrence, then there is no implicit connection between them. You still have to send messages between processes even if it is from/to functions in the same module.
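A minimal sketch of that last point, using a made-up module pingpong: both processes run code from the same module, yet they can only interact by sending messages.

```erlang
-module(pingpong).
-export([run/0]).
%% Exported because spawn/3 refers to it by name.
-export([ponger/0]).

%% Runs in the calling process.
run() ->
    Ponger = spawn(?MODULE, ponger, []),
    Ponger ! {ping, self()},
    receive
        {pong, Ponger} -> ok
    after 1000 ->
        timeout
    end.

%% Runs in the spawned process; shares nothing with run/0
%% apart from the messages they exchange.
ponger() ->
    receive
        {ping, From} -> From ! {pong, self()}
    end.
```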

EDIT:

About the last part of your question, with the quote about putting the export of init/0 in a separate export declaration: there is no need to do this and it has no semantic significance. You can use as many or as few export declarations as you wish. So you could put all the functions in one export declaration or have a separate one for each function; it makes no difference.

The reason to split them is purely visual and for documentation purposes. You typically group functions which go together into separate export declarations to make it easier to see that they are a group. You also typically put "internal" exported functions, functions which aren't meant for the user to call directly, in a separate export declaration. In this case init/0 has to be exported for the spawn but is not meant to be called directly outside the spawn.

Having the user call the start/0 function to start the server, rather than explicitly spawning the init/0 function themselves, allows you to change the internals as you wish later on. The user only sees the start/0 function, which is what the first quote is trying to say.

score -1 · Accepted Answer

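If you are wondering why you have to export anything at all, rather than having everything visible by default, it is because hiding every function that should not be called makes it clearer to users which functions they are supposed to call. That way, if you change your mind about the implementation, people using your code won't notice. Otherwise, someone may be using a function that you want to change or eliminate.

For example, say you have a module:

-module(somemod).
-export([useful/0]).  % only useful/0 is meant for callers

useful() ->
    helper().
helper() ->
    i_am_helping.

And you want to change it to:

-module(somemod).
-export([useful/0]).  % only useful/0 is meant for callers

useful() ->
    betterhelper().
betterhelper() ->
    i_am_helping_more.

If people are only supposed to call useful, you should be able to make this change. But if everything were exported, people might depend on helper when they shouldn't, and this change would break their code when it shouldn't.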
