I've found myself recently using the SemaphoreSlim class to limit the work in progress of a parallelisable operation on a (large) streamed resource:

// The code below illustrates the overall structure. It omits the handling of
// tasks that do not run to completion, which production code should include.

SemaphoreSlim semaphore = new SemaphoreSlim(Environment.ProcessorCount * someMagicNumber);
foreach (var result in StreamResults()) 
{
  semaphore.Wait();
  var task = DoWorkAsync(result).ContinueWith(t => semaphore.Release());
  ...
}

This is to avoid bringing too many results into memory at once and the program being unable to cope (generally evidenced by an OutOfMemoryException). Though the code works and is reasonably performant, it still feels ungainly. In particular, the someMagicNumber multiplier, although tuned via profiling, may not be as optimal as it could be and isn't resilient to changes in the implementation of DoWorkAsync.
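For readers who want the omitted handling spelled out, here is a minimal sketch of the same throttling pattern in which the semaphore is released in a finally block, so a faulted or cancelled task still frees its slot. The names Throttle, RunAsync, RunOneAsync and the limit parameter are hypothetical and not part of the original code; the caller's delegate stands in for DoWorkAsync, and limit stands in for Environment.ProcessorCount * someMagicNumber.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs `work` over every item with at most `limit` operations in flight.
    // Hypothetical helper: `work` plays the role of DoWorkAsync.
    public static async Task RunAsync<T>(IEnumerable<T> source, Func<T, Task> work, int limit)
    {
        using (var semaphore = new SemaphoreSlim(limit))
        {
            var inFlight = new List<Task>();
            foreach (var item in source)
            {
                // Waits (without blocking a thread) until a slot is free.
                await semaphore.WaitAsync();
                inFlight.Add(RunOneAsync(item, work, semaphore));
            }

            // Surfaces exceptions from failed items once everything has finished.
            await Task.WhenAll(inFlight);
        }
    }

    private static async Task RunOneAsync<T>(T item, Func<T, Task> work, SemaphoreSlim semaphore)
    {
        try
        {
            await work(item);
        }
        finally
        {
            // Released on success, fault, or cancellation alike.
            semaphore.Release();
        }
    }
}
```

Note that this sketch keeps one Task reference per item until the end, which is cheap compared with the results themselves but still grows with the length of the stream; it also does nothing to remove the need to pick a limit.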

In the same way that thread pooling can overcome the obstacle of scheduling many things for execution, I would like something that can overcome the obstacle of scheduling many things to be loaded into memory based on the resources that are available.

Since it is impossible to decide deterministically whether an OutOfMemoryException will occur, I appreciate that what I'm looking for may be achievable only by statistical means, or not at all, but I hope that I'm missing something.

1 Answer

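I'd say you're probably overthinking this problem. The consequences of overshooting are rather severe (the program crashes). The consequence of setting the limit too low is that the program might run slower. As long as you still have some buffer beyond the minimum, increasing it further will generally have little to no effect, unless the processing time of that task in the pipeline is extremely volatile.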

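If your buffer is constantly filling up, it generally means that the task before it in the pipeline runs considerably faster than the task after it, so even a fairly small buffer is likely to ensure that the following task always has some work. The buffer size needed to get 90% of the benefit of a buffer is usually quite small (perhaps a few dozen items), whereas the size needed to hit an OOM error is more than 6 orders of magnitude higher. As long as you are somewhere between those two numbers (and that is a pretty big range), you'll be fine.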
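To make the idea of a small fixed buffer between two pipeline stages concrete, here is a minimal sketch using BlockingCollection with a bounded capacity. The class name BoundedPipeline, the Run method, and the integer workload are assumptions chosen purely for illustration; the producer stands in for the faster upstream stage and the summing loop for the slower downstream one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BoundedPipeline
{
    // Feeds `count` integers through a buffer of at most `bufferSize` items
    // and returns their sum. Hypothetical example, not the asker's workload.
    public static long Run(int count, int bufferSize)
    {
        using (var buffer = new BlockingCollection<int>(boundedCapacity: bufferSize))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    for (int i = 0; i < count; i++)
                    {
                        buffer.Add(i); // blocks while the buffer is full
                    }
                }
                finally
                {
                    // Lets the consumer loop end once the buffer drains.
                    buffer.CompleteAdding();
                }
            });

            long sum = 0;
            foreach (var item in buffer.GetConsumingEnumerable())
            {
                sum += item;
            }

            producer.Wait();
            return sum;
        }
    }
}
```

Calling BoundedPipeline.Run(100, 32) never holds more than 32 items in the buffer, however fast the producer is, and the consumer only stalls if the buffer runs empty.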

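Just run your static tests, pick a static number, maybe add a few extra percent "just in case", and you should be good. At most, I'd move some of the magic numbers into a config file so that they can be changed without recompiling if the input data or the machine specs change radically.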
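As one possible shape for the config-file suggestion, the sketch below reads the multiplier from a plain key=value text file and falls back to a compiled-in default. The class name ThrottleSettings, the key name multiplier, and the file format are all assumptions made for illustration; a real project might well use its framework's own configuration mechanism instead.

```csharp
using System;
using System.IO;

public static class ThrottleSettings
{
    // Returns the positive integer stored under "multiplier=" in the file at
    // `path`, or `fallback` if the file or a valid entry is missing.
    // Hypothetical file format: one key=value pair per line.
    public static int ReadMultiplier(string path, int fallback)
    {
        if (!File.Exists(path))
        {
            return fallback;
        }

        foreach (var line in File.ReadLines(path))
        {
            var parts = line.Split(new[] { '=' }, 2);
            if (parts.Length == 2
                && parts[0].Trim() == "multiplier"
                && int.TryParse(parts[1].Trim(), out var value)
                && value > 0)
            {
                return value;
            }
        }

        return fallback;
    }
}
```

The semaphore from the question could then be sized with something like new SemaphoreSlim(Environment.ProcessorCount * ThrottleSettings.ReadMultiplier("throttle.cfg", 4)), where both the file name and the default of 4 are placeholders.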

Answered 2012-06-22T15:45:31.393