138
// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };

// now let's send HTTP requests to each of these URLs in parallel
urls.AsParallel().ForAll(async (url) => {
    var client = new HttpClient();
    var html = await client.GetStringAsync(url);
});

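Here is the problem: it starts 1000+ simultaneous web requests. Is there an easy way to limit the number of concurrent async HTTP requests, so that no more than 20 web pages are downloaded at any given time? How can this be done in the most efficient manner?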


11 Answers

205

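You can definitely do this in the latest version of async for .NET, using .NET 4.5 Beta. The previous post from 'usr' points to a good article written by Stephen Toub, but the less-announced news is that the async semaphore actually made it into the Beta release of .NET 4.5.

If you look at our beloved SemaphoreSlim class (which you should be using, since it performs better than the original Semaphore), it now boasts the WaitAsync(...) series of overloads, with all of the expected arguments: timeout intervals, cancellation tokens, all of your usual scheduling friends :)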
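
To make the overloads mentioned above concrete, here is a minimal sketch of WaitAsync with a timeout and a cancellation token. The class and method names (WaitAsyncOverloadsDemo, TryEnterTwiceAsync) are made up for illustration; only SemaphoreSlim and its WaitAsync(TimeSpan, CancellationToken) overload come from the framework.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; not part of the original answer.
public static class WaitAsyncOverloadsDemo
{
    // Calls WaitAsync twice, with a timeout, on a semaphore that has one free slot.
    public static async Task<(bool First, bool Second)> TryEnterTwiceAsync()
    {
        using var gate = new SemaphoreSlim(initialCount: 1, maxCount: 1);
        using var cts = new CancellationTokenSource();

        // The slot is free, so this completes with true.
        bool first = await gate.WaitAsync(TimeSpan.FromMilliseconds(100), cts.Token);

        // The slot is taken and nobody releases it, so this times out with false.
        bool second = await gate.WaitAsync(TimeSpan.FromMilliseconds(100), cts.Token);

        if (first) gate.Release();
        return (first, second);
    }

    public static async Task Main()
    {
        var (first, second) = await TryEnterTwiceAsync();
        Console.WriteLine($"first={first}, second={second}");
    }
}
```

The timeout overloads return a Task<bool>, so the caller can tell "entered" apart from "gave up waiting"; cancelling the token instead makes the returned task end in a canceled state.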

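Stephen has also written a more recent blog post about the new .NET 4.5 goodies that came out with the Beta; see What's New for Parallelism in .NET 4.5 Beta.

Last, here is some sample code showing how to use SemaphoreSlim for async method throttling: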

public async Task MyOuterMethod()
{
    // let's say there is a list of 1000+ URLs
    string[] urls = { "http://google.com", "http://yahoo.com", ... };

    // now let's send HTTP requests to each of these URLs in parallel
    var allTasks = new List<Task>();
    var throttler = new SemaphoreSlim(initialCount: 20);
    foreach (var url in urls)
    {
        // do an async wait until we can schedule again
        await throttler.WaitAsync();

        // using Task.Run(...) to run the lambda in its own parallel
        // flow on the threadpool
        allTasks.Add(
            Task.Run(async () =>
            {
                try
                {
                    var client = new HttpClient();
                    var html = await client.GetStringAsync(url);
                }
                finally
                {
                    throttler.Release();
                }
            }));
    }

    // won't get here until all urls have been put into tasks
    await Task.WhenAll(allTasks);

    // won't get here until all tasks have completed in some way
    // (either success or exception)
}

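Last, but probably worth a mention, is a solution that uses TPL-based scheduling. You can create delegate-bound tasks on the TPL that have not yet been started, and let a custom task scheduler limit the concurrency. In fact, there is an MSDN sample for it here:

See also TaskScheduler.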
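
The MSDN sample is not reproduced here. As a rough sketch of the same idea with a built-in type instead of a custom scheduler, ConcurrentExclusiveSchedulerPair accepts a maximum concurrency level and exposes a scheduler that honors it. The names SchedulerThrottleDemo and RunAndMeasurePeak, and the Thread.Sleep stand-in for real work, are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; a built-in scheduler pair stands in for the custom scheduler.
public static class SchedulerThrottleDemo
{
    // Runs `count` blocking work items on a scheduler capped at `limit`
    // and returns the highest number seen running at the same time.
    public static int RunAndMeasurePeak(int count, int limit)
    {
        var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, limit);
        var factory = new TaskFactory(pair.ConcurrentScheduler);

        int running = 0, peak = 0;
        var tasks = new List<Task>();
        for (int i = 0; i < count; i++)
        {
            tasks.Add(factory.StartNew(() =>
            {
                RecordPeak(ref peak, Interlocked.Increment(ref running));
                Thread.Sleep(20); // placeholder for synchronous work
                Interlocked.Decrement(ref running);
            }));
        }

        Task.WaitAll(tasks.ToArray());
        return peak;
    }

    // Lock-free "store the maximum" helper.
    private static void RecordPeak(ref int peak, int value)
    {
        int seen;
        while (value > (seen = Volatile.Read(ref peak)) &&
               Interlocked.CompareExchange(ref peak, value, seen) != seen)
        {
            // Another thread updated the peak first; retry with the fresh value.
        }
    }

    public static void Main()
    {
        Console.WriteLine($"peak concurrency: {RunAndMeasurePeak(40, 4)}");
    }
}
```

A scheduler only limits delegates while they are executing. An async lambda gives up its slot at its first await, so this approach fits synchronous work items better than awaited HTTP calls, which is one reason the SemaphoreSlim version above is usually the simpler choice for I/O.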

answered 2012-05-30T06:01:57.147
23

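If you have an IEnumerable (i.e. strings of URLs) and you want to perform an I/O-bound operation on each of them (i.e. make an async HTTP request) concurrently, and optionally you also want to set the maximum number of concurrent I/O requests in real time, here is how you can do that. This way you do not use the thread pool et al.; the method uses SemaphoreSlim to control the maximum number of concurrent I/O requests, similar to a sliding-window pattern: one request completes, leaves the semaphore, and the next one gets in.

Usage: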

await ForEachAsync(urlStrings, YourAsyncFunc, optionalMaxDegreeOfConcurrency);

// DefaultMaxDegreeOfParallelism is assumed to be a constant defined elsewhere in the same class.
public static Task ForEachAsync<TIn>(
    IEnumerable<TIn> inputEnumerable,
    Func<TIn, Task> asyncProcessor,
    int? maxDegreeOfParallelism = null)
{
    int maxAsyncThreadCount = maxDegreeOfParallelism ?? DefaultMaxDegreeOfParallelism;
    SemaphoreSlim throttler = new SemaphoreSlim(maxAsyncThreadCount, maxAsyncThreadCount);

    // Each item waits for a free slot, runs, and frees the slot even if it fails.
    IEnumerable<Task> tasks = inputEnumerable.Select(async input =>
    {
        await throttler.WaitAsync().ConfigureAwait(false);
        try
        {
            await asyncProcessor(input).ConfigureAwait(false);
        }
        finally
        {
            throttler.Release();
        }
    });

    return Task.WhenAll(tasks);
}
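
To see the sliding-window behaviour without touching the network, here is a small self-contained sketch that applies the same semaphore pattern to fake work and records the highest number of items in flight. SlidingWindowDemo, MeasurePeakAsync and the Task.Delay placeholder are assumptions for illustration, not part of the answer's code.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class: same throttling pattern, with Task.Delay as fake I/O.
public static class SlidingWindowDemo
{
    // Processes `itemCount` fake requests with at most `limit` in flight
    // and returns the highest in-flight count that was observed.
    public static async Task<int> MeasurePeakAsync(int itemCount, int limit)
    {
        using var throttler = new SemaphoreSlim(limit, limit);
        int inFlight = 0, peak = 0;

        var tasks = Enumerable.Range(0, itemCount).Select(async _ =>
        {
            await throttler.WaitAsync().ConfigureAwait(false);
            try
            {
                RecordPeak(ref peak, Interlocked.Increment(ref inFlight));
                await Task.Delay(10).ConfigureAwait(false); // placeholder for an HTTP call
                Interlocked.Decrement(ref inFlight);
            }
            finally
            {
                throttler.Release();
            }
        }).ToList(); // ToList starts every item now; most of them wait at the semaphore

        await Task.WhenAll(tasks).ConfigureAwait(false);
        return peak;
    }

    // Lock-free "store the maximum" helper.
    private static void RecordPeak(ref int peak, int value)
    {
        int seen;
        while (value > (seen = Volatile.Read(ref peak)) &&
               Interlocked.CompareExchange(ref peak, value, seen) != seen)
        {
            // Another item updated the peak first; retry with the fresh value.
        }
    }

    public static async Task Main()
    {
        Console.WriteLine($"peak in flight: {await MeasurePeakAsync(50, 5)}");
    }
}
```

The observed peak should never exceed the limit, because an item only increments the counter after it has acquired a semaphore slot.
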
answered 2016-06-01T12:52:56.480
9

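There are a lot of pitfalls, and using a semaphore directly can be tricky in error cases, so I would suggest using the AsyncEnumerator NuGet package instead of reinventing the wheel: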

// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };

// now let's send HTTP requests to each of these URLs in parallel
await urls.ParallelForEachAsync(async (url) => {
    var client = new HttpClient();
    var html = await client.GetStringAsync(url);
}, maxDegreeOfParalellism: 20);
answered 2016-08-26T21:30:46.263
6

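Unfortunately, the .NET Framework is missing the most important combinators for orchestrating parallel async tasks. There is no such thing built in.

Look at the AsyncSemaphore class built by the most respectable Stephen Toub. What you want is called a semaphore, and you need an async version of it.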
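
For readers who have not seen that class, the general shape of an async semaphore can be sketched as a counter plus a queue of pending waiters. The code below is a minimal illustration written for this page under that assumption, not a copy of Stephen Toub's implementation; AsyncSemaphoreDemo is a made-up usage class.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Minimal illustrative async semaphore: a counter plus a FIFO queue of waiters.
public sealed class AsyncSemaphore
{
    private readonly Queue<TaskCompletionSource<bool>> _waiters =
        new Queue<TaskCompletionSource<bool>>();
    private int _currentCount;

    public AsyncSemaphore(int initialCount)
    {
        if (initialCount < 0) throw new ArgumentOutOfRangeException(nameof(initialCount));
        _currentCount = initialCount;
    }

    // Returns a completed task if a slot is free, otherwise a task that
    // completes when a later Release hands its slot to this waiter.
    public Task WaitAsync()
    {
        lock (_waiters)
        {
            if (_currentCount > 0)
            {
                _currentCount--;
                return Task.CompletedTask;
            }

            var waiter = new TaskCompletionSource<bool>(
                TaskCreationOptions.RunContinuationsAsynchronously);
            _waiters.Enqueue(waiter);
            return waiter.Task;
        }
    }

    // Wakes the oldest waiter if there is one, otherwise returns the slot to the counter.
    public void Release()
    {
        TaskCompletionSource<bool> toRelease = null;
        lock (_waiters)
        {
            if (_waiters.Count > 0) toRelease = _waiters.Dequeue();
            else _currentCount++;
        }
        toRelease?.SetResult(true); // completed outside the lock
    }
}

public static class AsyncSemaphoreDemo
{
    public static async Task Main()
    {
        var semaphore = new AsyncSemaphore(1);
        await semaphore.WaitAsync();           // takes the only slot
        Task pending = semaphore.WaitAsync();  // queued until Release is called
        semaphore.Release();                   // hands the slot to the queued waiter
        await pending;
        semaphore.Release();
        Console.WriteLine("done");
    }
}
```

On .NET 4.5 and later, SemaphoreSlim.WaitAsync covers the same need, so a hand-written class like this is mostly of interest for understanding how the waiting works.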

answered 2012-05-29T21:49:04.917
4

SemaphoreSlim can be very helpful here. Here is the extension method I created.

    /// <summary>
    /// Concurrently executes async actions for each item of <see cref="IEnumerable{T}"/>.
    /// </summary>
    /// <typeparam name="T">Type of the items in the IEnumerable</typeparam>
    /// <param name="enumerable">Instance of <see cref="IEnumerable{T}"/></param>
    /// <param name="action">An async <see cref="Func{T, TResult}"/> to execute</param>
    /// <param name="maxActionsToRunInParallel">Optional, max number of actions to run in parallel.
    /// Must be greater than 0</param>
    /// <returns>A Task representing an async operation</returns>
    /// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
    public static async Task ForEachAsyncConcurrent<T>(
        this IEnumerable<T> enumerable,
        Func<T, Task> action,
        int? maxActionsToRunInParallel = null)
    {
        if (maxActionsToRunInParallel.HasValue)
        {
            using (var semaphoreSlim = new SemaphoreSlim(
                maxActionsToRunInParallel.Value, maxActionsToRunInParallel.Value))
            {
                var tasksWithThrottler = new List<Task>();

                foreach (var item in enumerable)
                {
                    // Increment the number of currently running tasks and wait if they are more than limit.
                    await semaphoreSlim.WaitAsync();

                    tasksWithThrottler.Add(Task.Run(async () =>
                    {
                        try
                        {
                            await action(item);
                        }
                        finally
                        {
                            // action has completed or failed, so decrement the number of currently running tasks
                            semaphoreSlim.Release();
                        }
                    }));
                }

                // Wait for all of the provided tasks to complete.
                await Task.WhenAll(tasksWithThrottler.ToArray());
            }
        }
        else
        {
            await Task.WhenAll(enumerable.Select(item => action(item)));
        }
    }

Sample usage:

await enumerable.ForEachAsyncConcurrent(
    async item =>
    {
        await SomeAsyncMethod(item);
    },
    5);
answered 2018-05-09T13:04:10.800
1

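After the release of .NET 6 (November 2021), the recommended way of limiting the number of concurrent asynchronous I/O operations is the Parallel.ForEachAsync API, with the MaxDegreeOfParallelism configuration. Here is how it can be used in practice: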

// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", /*...*/ };
var client = new HttpClient();
var options = new ParallelOptions() { MaxDegreeOfParallelism = 20 };

// now let's send HTTP requests to each of these URLs in parallel
await Parallel.ForEachAsync(urls, options, async (url, cancellationToken) =>
{
    var html = await client.GetStringAsync(url, cancellationToken);
});

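In the above example, the Parallel.ForEachAsync task is awaited asynchronously. You can also Wait it synchronously if you need to, which blocks the current thread until all asynchronous operations have completed. The synchronous Wait has the advantage that in case of errors, all exceptions are propagated. By contrast, the await operator propagates, by design, only the first exception. In case this is a problem, you can find solutions here.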
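
The difference can be observed with any task that holds more than one exception. The sketch below uses Task.WhenAll over two already-failed tasks as a deterministic stand-in for Parallel.ForEachAsync; ExceptionPropagationDemo and ObserveAsync are made-up names for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo class; Task.WhenAll stands in for Parallel.ForEachAsync.
public static class ExceptionPropagationDemo
{
    // Awaits a task that holds two exceptions and reports which type `await`
    // rethrew and how many exceptions the task itself stored.
    public static async Task<(string AwaitedType, int StoredCount)> ObserveAsync()
    {
        Task combined = Task.WhenAll(
            Task.FromException(new InvalidOperationException("first failure")),
            Task.FromException(new TimeoutException("second failure")));

        try
        {
            await combined;
            return ("none", 0);
        }
        catch (Exception ex)
        {
            // `await` rethrows only one exception; the task's AggregateException keeps all of them.
            return (ex.GetType().Name, combined.Exception.InnerExceptions.Count);
        }
    }

    public static async Task Main()
    {
        var (awaitedType, storedCount) = await ObserveAsync();
        Console.WriteLine($"await threw {awaitedType}; the task stored {storedCount} exceptions");
    }
}
```

Keeping a reference to the task before awaiting it, as done here, is one way to inspect every failure afterwards through its Exception property.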

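(Note: an idiomatic implementation of a ForEachAsync extension method that also propagates the results can be found in the 4th revision of this answer.)

answered 2020-10-21T01:49:55.487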
0

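Although 1000 tasks might be queued very quickly, the Parallel Tasks library can only handle a number of concurrent tasks equal to the number of CPU cores in the machine. That means that if you have a four-core machine, only 4 tasks will be executing at any given time (unless you lower the MaxDegreeOfParallelism).

answered 2012-05-29T21:32:14.937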
-1

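This is not good practice, as it changes a global variable. It is also not a general solution for async. But it is easy and applies to all instances of HttpClient, if that is all you are after. You can simply try: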

System.Net.ServicePointManager.DefaultConnectionLimit = 20;
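
If the process-wide side effect is the main concern, a per-client setting may be an alternative. The sketch below assumes a target runtime where HttpClientHandler.MaxConnectionsPerServer is available; PerClientLimitDemo and CreateHandler are made-up names. Like DefaultConnectionLimit, the limit applies per server, so it would not cap the total number of requests spread over many different hosts.

```csharp
using System;
using System.Net.Http;

// Hypothetical demo class: limits connections for one HttpClient instead of globally.
public static class PerClientLimitDemo
{
    // Builds a handler that opens at most `maxConnectionsPerServer`
    // simultaneous connections to any single server.
    public static HttpClientHandler CreateHandler(int maxConnectionsPerServer)
    {
        return new HttpClientHandler { MaxConnectionsPerServer = maxConnectionsPerServer };
    }

    public static void Main()
    {
        using var handler = CreateHandler(20);
        using var client = new HttpClient(handler);
        Console.WriteLine($"per-server connection limit: {handler.MaxConnectionsPerServer}");
    }
}
```
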
answered 2019-03-29T08:36:08.230
-2

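Parallel computation should be used to speed up CPU-bound operations. Here we are talking about I/O-bound operations. Your implementation should be purely async, unless you are overwhelming a busy single core on your multi-core CPU.

EDIT: I like the suggestion made by usr to use an "async semaphore" here.

answered 2012-05-29T21:34:29.780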
-2

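Essentially, you are going to want to create an Action or Task for each URL that you want to hit, put them in a List, and then process that list, limiting the number that can be processed in parallel.

My blog post shows how to do this both with Tasks and with Actions, and provides a sample project you can download and run to see both in action.

With Actions

If using Actions, you can use the built-in .NET Parallel.Invoke function. Here we limit it to running at most 20 threads in parallel.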

var listOfActions = new List<Action>();
foreach (var url in urls)
{
    var localUrl = url;
    // Note that we create the Action here, but do not run it.
    listOfActions.Add(() => CallUrl(localUrl));
}

var options = new ParallelOptions {MaxDegreeOfParallelism = 20};
Parallel.Invoke(options, listOfActions.ToArray());

With Tasks

With Tasks there is no built-in function. However, you can use the one that I provide on my blog.

    /// <summary>
    /// Starts the given tasks and waits for them to complete. This will run, at most, the specified number of tasks in parallel.
    /// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
    /// </summary>
    /// <param name="tasksToRun">The tasks to run.</param>
    /// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
    /// <param name="cancellationToken">The cancellation token.</param>
    public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, CancellationToken cancellationToken = new CancellationToken())
    {
        await StartAndWaitAllThrottledAsync(tasksToRun, maxTasksToRunInParallel, -1, cancellationToken);
    }

    /// <summary>
    /// Starts the given tasks and waits for them to complete. This will run the specified number of tasks in parallel.
    /// <para>NOTE: If a timeout is reached before the Task completes, another Task may be started, potentially running more than the specified maximum allowed.</para>
    /// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
    /// </summary>
    /// <param name="tasksToRun">The tasks to run.</param>
    /// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
    /// <param name="timeoutInMilliseconds">The maximum milliseconds we should allow the max tasks to run in parallel before allowing another task to start. Specify -1 to wait indefinitely.</param>
    /// <param name="cancellationToken">The cancellation token.</param>
    public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, int timeoutInMilliseconds, CancellationToken cancellationToken = new CancellationToken())
    {
        // Convert to a list of tasks so that we don't enumerate over it multiple times needlessly.
        var tasks = tasksToRun.ToList();

        using (var throttler = new SemaphoreSlim(maxTasksToRunInParallel))
        {
            var postTaskTasks = new List<Task>();

            // Have each task notify the throttler when it completes so that it decrements the number of tasks currently running.
            tasks.ForEach(t => postTaskTasks.Add(t.ContinueWith(tsk => throttler.Release())));

            // Start running each task.
            foreach (var task in tasks)
            {
                // Increment the number of tasks currently running and wait if too many are running.
                await throttler.WaitAsync(timeoutInMilliseconds, cancellationToken);

                cancellationToken.ThrowIfCancellationRequested();
                task.Start();
            }

            // Wait for all of the provided tasks to complete.
            // We wait on the list of "post" tasks instead of the original tasks, otherwise there is a potential race condition where the throttler's using block is exited before some Tasks have had their "post" action completed, which references the throttler, resulting in an exception due to accessing a disposed object.
            await Task.WhenAll(postTaskTasks.ToArray());
        }
    }

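Then, to create your list of Tasks and call the function to have them run, with say a maximum of 20 running simultaneously, you could do this: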

var listOfTasks = new List<Task>();
foreach (var url in urls)
{
    var localUrl = url;
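    // Note that we create the Task here, but do not start it.
    // The delegate blocks on CallUrl (assumed here to return a Task). Writing new Task(async () => ...)
    // would bind to an async void delegate, so the Task would complete at the first await
    // and release the throttler too early.
    listOfTasks.Add(new Task(() => CallUrl(localUrl).GetAwaiter().GetResult()));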
}
await Tasks.StartAndWaitAllThrottledAsync(listOfTasks, 20);
answered 2016-04-29T08:34:32.407
-3

Use MaxDegreeOfParallelism, which is an option you can specify in Parallel.ForEach():

var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };

Parallel.ForEach(urls, options,
    url =>
        {
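            var client = new HttpClient();
            // Block until the download finishes, so that MaxDegreeOfParallelism
            // limits whole requests rather than just how many are started at once.
            var html = client.GetStringAsync(url).GetAwaiter().GetResult();
            // do stuff with html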
        });
answered 2012-05-29T21:43:57.120