
I have been trying to design a system that allows a large number of concurrent users to be represented in memory at the same time. When setting out to design this system I immediately thought of some sort of actor-based solution akin to Erlang.

The system has to be done in .NET, so I started working on a prototype in F# using MailboxProcessor, but I have run into serious performance problems with it. My initial idea was to use one actor (MailboxProcessor) per user to serialize the communication for that user.

I have isolated a small piece of code that reproduces the problem I am seeing:

open System.Threading;
open System.Diagnostics;

type Inc() =

    let mutable n = 0;
    let sw = new Stopwatch()

    member x.Start() =
        sw.Start()

    member x.Increment() =
        if Interlocked.Increment(&n) >= 100000 then
            printf "UpdateName Time %A" sw.ElapsedMilliseconds

type Message
    = UpdateName of int * string

type User = {
    Id : int
    Name : string
}

[<EntryPoint>]
let main argv = 

    let sw = Stopwatch.StartNew()
    let incr = new Inc()
    let mb = 

        Seq.initInfinite(fun id -> 
            MailboxProcessor<Message>.Start(fun inbox -> 

                let rec loop user =
                    async {
                        let! m = inbox.Receive()

                        match m with
                        | UpdateName(id, newName) ->
                            let user = {user with Name = newName};
                            incr.Increment()
                            do! loop user
                    }

                loop {Id = id; Name = sprintf "User%i" id}
            )
        ) 
        |> Seq.take 100000
        |> Array.ofSeq

    printf "Create Time %i\n" sw.ElapsedMilliseconds
    incr.Start()

    for i in 0 .. 99999 do
        mb.[i % mb.Length].Post(UpdateName(i, sprintf "User%i-UpdateName" i));

    System.Console.ReadLine() |> ignore

    0

Just creating the 100k actors takes around 800 ms on my quad-core i7. Submitting the UpdateName message to each of the actors and waiting for them to complete then takes about 1.8 seconds.

Now, I realize there is overhead from all the queueing on the ThreadPool, setting/resetting AutoResetEvents, etc. internally in the MailboxProcessor. But is this really the expected performance? From reading both MSDN and various blogs on the MailboxProcessor I have gotten the idea that it is meant to be akin to Erlang actors, but the abysmal performance I am seeing suggests this doesn't hold true in reality?

I also tried a modified version of the code, which uses 8 MailboxProcessors, each holding a Map<int, User> used to look up a user by id. This yielded some improvement, bringing the total time for the UpdateName operation down to 1.2 seconds, but it still feels very slow. The modified code is here:

open System.Threading;
open System.Diagnostics;

type Inc() =

    let mutable n = 0;
    let sw = new Stopwatch()

    member x.Start() =
        sw.Start()

    member x.Increment() =
        if Interlocked.Increment(&n) >= 100000 then
            printf "UpdateName Time %A" sw.ElapsedMilliseconds

type Message
    = CreateUser of int * string
    | UpdateName of int * string

type User = {
    Id : int
    Name : string
}

[<EntryPoint>]
let main argv = 

    let sw = Stopwatch.StartNew()
    let incr = new Inc()
    let mb = 

        Seq.initInfinite(fun id -> 
            MailboxProcessor<Message>.Start(fun inbox -> 

                let rec loop users =
                    async {
                        let! m = inbox.Receive()

                        match m with
                        | CreateUser(id, name) ->
                            do! loop (Map.add id {Id=id; Name=name} users)

                        | UpdateName(id, newName) ->
                            match Map.tryFind id users with
                            | None -> 
                                do! loop users

                            | Some(user) ->
                                incr.Increment()
                                do! loop (Map.add id {user with Name = newName} users)
                    }

                loop Map.empty
            )
        ) 
        |> Seq.take 8
        |> Array.ofSeq

    printf "Create Time %i\n" sw.ElapsedMilliseconds

    for i in 0 .. 99999 do
        mb.[i % mb.Length].Post(CreateUser(i, sprintf "User%i-UpdateName" i));

    incr.Start()

    for i in 0 .. 99999 do
        mb.[i % mb.Length].Post(UpdateName(i, sprintf "User%i-UpdateName" i));

    System.Console.ReadLine() |> ignore

    0

So my question is: am I doing something wrong? Have I misunderstood how the MailboxProcessor is supposed to be used? Or is this the performance that is expected?

Update:

So I got hold of some people on ##fsharp @ irc.freenode.net, who informed me that sprintf is very slow, and as it turns out that is where a large part of my performance problems were coming from. But even after removing the sprintf operations above and just using the same name for every User, I still end up with about 400 ms for the operations, which feels really slow.


2 Answers


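"Now, I realize there is overhead from all the queueing on the ThreadPool, setting/resetting AutoResetEvents, etc. internally in the MailboxProcessor."

And from printf, Map, Seq and contention for your global mutable Inc. You are also leaking heap-allocated stack frames. In fact, only a small proportion of the time taken to run your benchmark has anything to do with MailboxProcessor.

"But is this really the expected performance?"

I am not surprised by the performance of your program, but it does not say much about the performance of MailboxProcessor.

"From reading both MSDN and various blogs on the MailboxProcessor I have gotten the idea that it is meant to be akin to Erlang actors, but the abysmal performance I am seeing suggests this doesn't hold true in reality?"

The MailboxProcessor is conceptually somewhat similar to part of Erlang. The abysmal performance you are seeing is due to a variety of things, some of which are quite subtle and will affect any such program.

"So my question is: am I doing something wrong?"

I think you are doing a few things wrong. Firstly, the problem you are trying to solve is not clear, so this sounds like an XY problem. Secondly, you are trying to benchmark the wrong things (e.g. you are complaining about the microsecond times required to create a MailboxProcessor, but you may intend to create one only when a TCP connection is established, which takes several orders of magnitude longer). Thirdly, you have written a benchmark program that measures the performance of some things but have attributed your observations to completely different things.

Let's look at your benchmark program in more detail. Before we do anything else, let's fix some bugs. You should always use sw.Elapsed.TotalSeconds to measure time because it is more precise. You should always recur in an async workflow using return! and not do!, or you will leak stack frames.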
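
A minimal sketch of both fixes, assuming a toy agent that only counts messages. The names Msg, Tick, Get and countingAgent are hypothetical and do not come from the question:

```fsharp
open System.Diagnostics

// Hypothetical message type, for illustration only.
type Msg =
    | Tick
    | Get of AsyncReplyChannel<int>

// The loop recurs with return!, so each iteration is a tail call in the
// async workflow and no chain of pending continuations builds up.
let countingAgent () =
    MailboxProcessor<Msg>.Start(fun inbox ->
        let rec loop count =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Tick -> return! loop (count + 1)
                | Get reply ->
                    reply.Reply count
                    return! loop count
            }
        loop 0)

let sw = Stopwatch.StartNew()
let agent = countingAgent ()
for _ in 1 .. 100000 do
    agent.Post Tick
// The mailbox is processed in order, so this reply arrives after every Tick.
let total = agent.PostAndReply(Get)
// Elapsed.TotalSeconds is a float, so sub-millisecond differences are kept.
printfn "%d messages in %fs" total sw.Elapsed.TotalSeconds
```

Replacing each return! with do! would still compile here, but every iteration would then wait on the next one instead of handing over to it.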

My initial timings are:

Creation stage: 0.858s
Post stage: 1.18s

Next, let's run a profiler to make sure our program really is spending most of its time exercising the F# MailboxProcessor:

77%    Microsoft.FSharp.Core.PrintfImpl.gprintf(...)
 4.4%  Microsoft.FSharp.Control.MailboxProcessor`1.Post(!0)

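Clearly not what we had hoped. Thinking more abstractly, we are generating lots of data using things like sprintf and then applying it, but we are doing the generation and the application together. Let's separate out our initialization code: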

let ids = Array.init 100000 (fun id -> {Id = id; Name = sprintf "User%i" id})
...
    ids
    |> Array.map (fun id ->
        MailboxProcessor<Message>.Start(fun inbox -> 
...
            loop id
...
    printf "Create Time %fs\n" sw.Elapsed.TotalSeconds
    let fxs =
      [|for i in 0 .. 99999 ->
          mb.[i % mb.Length].Post, UpdateName(i, sprintf "User%i-UpdateName" i)|]
    incr.Start()
    for f, x in fxs do
      f x
...

Now we get:

Creation stage: 0.538s
Post stage: 0.265s

So creation is 60% faster and posting is 4.5x faster.
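
As a quick check of those figures, the speedups can be recomputed from the two sets of timings quoted above. This is only arithmetic on the reported numbers; the variable names are illustrative:

```fsharp
// Timings in seconds, copied from the two measurements above.
let creationBefore, creationAfter = 0.858, 0.538
let postBefore, postAfter = 1.18, 0.265

// Speedup = old time / new time.
let creationSpeedup = creationBefore / creationAfter
let postSpeedup = postBefore / postAfter

printfn "creation: %.2fx, post: %.2fx" creationSpeedup postSpeedup
// prints "creation: 1.59x, post: 4.45x"
```

A 1.59x speedup is the "60% faster" figure, and 4.45x rounds to the quoted 4.5x.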

Let's try completely rewriting your benchmark:

do
  for nAgents in [1; 10; 100; 1000; 10000; 100000] do
    let timer = System.Diagnostics.Stopwatch.StartNew()
    use barrier = new System.Threading.Barrier(2)
    let nMsgs = 1000000 / nAgents
    let nAgentsFinished = ref 0
    let makeAgent _ =
      new MailboxProcessor<_>(fun inbox ->
        let rec loop n =
          async { let! () = inbox.Receive()
                  let n = n+1
                  if n=nMsgs then
                    let n = System.Threading.Interlocked.Increment nAgentsFinished
                    if n = nAgents then
                      barrier.SignalAndWait()
                  else
                    return! loop n }
        loop 0)
    let agents = Array.init nAgents makeAgent
    for agent in agents do
      agent.Start()
    printfn "%fs to create %d agents" timer.Elapsed.TotalSeconds nAgents
    timer.Restart()
    for _ in 1..nMsgs do
      for agent in agents do
        agent.Post()
    barrier.SignalAndWait()
    printfn "%fs to post %d msgs" timer.Elapsed.TotalSeconds (nMsgs * nAgents)
    timer.Restart()
    for agent in agents do
      use agent = agent
      ()
    printfn "%fs to dispose of %d agents\n" timer.Elapsed.TotalSeconds nAgents

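This version expects nMsgs messages to be sent to each agent before that agent increments the shared counter, greatly reducing the performance impact of that shared counter. The program also examines performance with different numbers of agents. On this machine I get: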

Agents  M msgs/s
     1    2.24
    10    6.67
   100    7.58
  1000    5.15
 10000    1.15
100000    0.36

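So it seems that part of the reason for the low msgs/s rate you are seeing is the unusually large number of agents (100,000). With 10-1,000 agents the F# implementation is over 10x faster than it is with 100,000 agents.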
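
The "over 10x" comparison can be reproduced from the table above with a few lines of F#. The figures are copied from the table, and the names results, baseline and ratios are illustrative:

```fsharp
// (agents, million messages per second), copied from the table above.
let results =
    [ 1, 2.24; 10, 6.67; 100, 7.58; 1000, 5.15; 10000, 1.15; 100000, 0.36 ]

// The 100,000-agent throughput is the baseline for the comparison.
let baseline =
    results |> List.find (fun (agents, _) -> agents = 100000) |> snd

// Throughput of each configuration relative to the baseline.
let ratios = [ for agents, rate in results -> (agents, rate / baseline) ]

for agents, ratio in ratios do
    printfn "%6d agents: %.1fx the 100,000-agent throughput" agents ratio
```

On these numbers, 10 to 1,000 agents come out at roughly 14x to 21x the 100,000-agent rate.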

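So if you can make do with this kind of performance then you should be able to write your entire application in F#, but if you need to eke out much more performance I would recommend a different approach. You may not even have to give up F# (and you can certainly use it for prototyping) if you adopt a design like the Disruptor. In practice, I have found that the time spent doing serialization on .NET tends to be much larger than the time spent in F# async and MailboxProcessor.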

answered 2013-07-01T09:25:13.307

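After eliminating sprintf, I got about 12 seconds (Mono on the Mac is not that fast). Taking Phil Trelford's suggestion of using a Dictionary instead of a Map, it went down to 600 ms. I haven't tried it on Win/.NET.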

The code change is simple enough, and local mutability is perfectly acceptable to me:

let mb = 
    Seq.initInfinite(fun id -> 
        MailboxProcessor<Message>.Start(fun inbox -> 
            let di = System.Collections.Generic.Dictionary<int,User>()
            let rec loop () =
                async {
                    let! m = inbox.Receive()

                    match m with
                    | CreateUser(id, name) ->
                        di.Add(id, {Id=id; Name=name})
                        return! loop ()

                    | UpdateName(id, newName) ->
                        match di.TryGetValue id with
                        | false, _ -> 
                            return! loop ()

                        | true, user ->
                            incr.Increment()
                            di.[id] <- {user with Name = newName}
                            return! loop ()
                }

            loop ()
        )
    ) 
    |> Seq.take 8
    |> Array.ofSeq
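
To see the Map versus Dictionary difference in isolation, outside any MailboxProcessor, a rough sketch along these lines can be timed. The functions timeMap and timeDictionary are hypothetical helpers, and the numbers will vary by runtime and machine:

```fsharp
open System.Collections.Generic
open System.Diagnostics

type User = { Id : int; Name : string }

let n = 100000

// Immutable Map: every add or update builds a new version of the tree.
let timeMap () =
    let sw = Stopwatch.StartNew()
    let mutable users = Map.empty<int, User>
    for i in 0 .. n - 1 do
        users <- Map.add i { Id = i; Name = "User" } users
    for i in 0 .. n - 1 do
        match Map.tryFind i users with
        | Some user -> users <- Map.add i { user with Name = "Updated" } users
        | None -> ()
    users.Count, sw.Elapsed.TotalSeconds

// Mutable Dictionary: adds and updates modify the hash table in place.
let timeDictionary () =
    let sw = Stopwatch.StartNew()
    let users = Dictionary<int, User>()
    for i in 0 .. n - 1 do
        users.Add(i, { Id = i; Name = "User" })
    for i in 0 .. n - 1 do
        match users.TryGetValue i with
        | true, user -> users.[i] <- { user with Name = "Updated" }
        | false, _ -> ()
    users.Count, sw.Elapsed.TotalSeconds

let mapCount, mapSeconds = timeMap ()
let dictCount, dictSeconds = timeDictionary ()
printfn "Map: %d users in %fs" mapCount mapSeconds
printfn "Dictionary: %d users in %fs" dictCount dictSeconds
```

Keeping the Dictionary local to one agent, as in the code above, means only that agent's loop ever touches it, so no locking is needed.
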
answered 2013-06-28T20:38:58.237