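I'm working on a project where I need to load multiple (100+) remote RSS feeds, parse them, and search them for certain keywords. This is obviously time-consuming, and I'm looking for the best way to do it.
My current implementation loads the feeds synchronously, because my asynchronous implementation using the TPL failed: it created a large number of tasks along the way and eventually threw an exception.
The asynchronous part that loads a remote feed looks like this: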
/// <summary>
/// Loads the specified URL.
/// </summary>
/// <param name="url">The URL.</param>
/// <returns>The parsed feed, or null if the response status code does not indicate success.</returns>
/// <exception cref="ScanException">Unable to download rss feed from the specified url. Check the inner exception for more details.</exception>
protected async Task<XDocument> Load(string url)
{
    XDocument document = null;
    try
    {
        // A new HttpClient is created (and disposed) for every feed.
        using (var client = new HttpClient())
        {
            HttpResponseMessage response = await client.GetAsync(url);
            if (response.IsSuccessStatusCode)
            {
                string content = await response.Content.ReadAsStringAsync();
                document = XDocument.Parse(content);
            }
        }
    }
    catch (Exception ex)
    {
        // Wrap download and parse failures so the caller knows which URL failed.
        throw new ScanException(url, "Unable to download rss feed from the specified url. Check the inner exception for more details.", ex);
    }
    return document;
}
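One way to keep an asynchronous version from starting every download at once is to cap the number of requests in flight. The following is only a sketch under assumptions, not a definitive implementation: `ThrottledFeedLoader`, `LoadAllAsync`, `maxConcurrency`, and the `fetch` hook are hypothetical names that do not appear in the code above, and mapping a failed feed to `null` is an illustrative choice rather than the `ScanException` behaviour of `Load`. It uses a single shared `HttpClient` and a `SemaphoreSlim` as the gate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class ThrottledFeedLoader
{
    // One HttpClient shared by all requests (assumption: default settings are acceptable).
    private static readonly HttpClient Client = new HttpClient();

    // Downloads and parses every distinct URL, with at most maxConcurrency downloads running at once.
    // 'fetch' is a hypothetical hook that lets the download step be replaced, e.g. in tests.
    public static async Task<IDictionary<string, XDocument>> LoadAllAsync(
        IEnumerable<string> urls,
        int maxConcurrency = 8,
        Func<string, Task<string>> fetch = null)
    {
        if (fetch == null)
        {
            fetch = url => Client.GetStringAsync(url);
        }

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList() forces the query to run, so every task is created here;
            // each one waits at the gate before it starts downloading.
            var tasks = urls.Distinct().Select(async url =>
            {
                await gate.WaitAsync();
                try
                {
                    string content = await fetch(url);
                    return new KeyValuePair<string, XDocument>(url, XDocument.Parse(content));
                }
                catch (Exception)
                {
                    // Assumption for this sketch: a feed that fails to download or parse maps to null.
                    return new KeyValuePair<string, XDocument>(url, null);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            KeyValuePair<string, XDocument>[] results = await Task.WhenAll(tasks);
            return results.ToDictionary(r => r.Key, r => r.Value);
        }
    }
}
```

A caller would use it as `var documents = await ThrottledFeedLoader.LoadAllAsync(feedUrls, 8);` and then skip the entries whose value is `null`. Unlike fixed batches, the gate lets a new download start as soon as any earlier one finishes.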
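I hope you can point me in the right direction so that I can get this working properly (performance-wise).
So the final question is: what is the best way to load multiple remote RSS feeds?
Test code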
/// <summary>
/// Reads the feeds in batches, downloading the feeds of each batch concurrently.
/// </summary>
/// <param name="feeds">The feeds.</param>
/// <param name="storage">The storage that receives the parsed torrents.</param>
/// <param name="batchSize">The maximum number of feeds downloaded at the same time.</param>
public void ReadFeedsByBatchAsync(string[] feeds, TorrentStorage storage, int batchSize = 8)
{
    var tasks = new List<Task<string>>(batchSize);
    var feedsLeft = feeds.Length;
    foreach (string feed in feeds)
    {
        // Start the download. The task is normally still running when GetStringAsync returns,
        // so its result is only read after the batch has completed.
        tasks.Add(this.client.GetStringAsync(feed));
        --feedsLeft;
        if (tasks.Count == batchSize || feedsLeft == 0)
        {
            var batchTasks = tasks.ToArray();
            tasks.Clear();
            // Blocks until the whole batch has finished; throws if any download failed.
            Task.WaitAll(batchTasks);
            foreach (var readFeedTask in batchTasks)
            {
                XDocument document = XDocument.Parse(readFeedTask.Result);
                var torrents = ProcessXmlDocument(document);
                storage.Store(torrents);
            }
        }
    }
}
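The batch method above blocks the calling thread in `Task.WaitAll`. A non-blocking variant could await each batch with `Task.WhenAll` and read the results only once the batch has finished. This is a rough sketch under assumptions: `BatchReader`, `ReadInBatchesAsync`, `download`, and `handle` are hypothetical names standing in for `this.client.GetStringAsync`, `ProcessXmlDocument`, and `storage.Store`, and silently skipping failed or malformed feeds is an illustrative choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

public static class BatchReader
{
    // Downloads the feeds batchSize at a time and passes every successfully parsed
    // document to 'handle'. Returns the number of documents that were handled.
    public static async Task<int> ReadInBatchesAsync(
        IReadOnlyList<string> feeds,
        Func<string, Task<string>> download,
        Action<XDocument> handle,
        int batchSize = 8)
    {
        int processed = 0;
        for (int offset = 0; offset < feeds.Count; offset += batchSize)
        {
            // Start all downloads of the current batch.
            Task<string>[] batch = feeds.Skip(offset).Take(batchSize).Select(download).ToArray();
            try
            {
                // Wait for the whole batch without blocking a thread.
                await Task.WhenAll(batch);
            }
            catch (Exception)
            {
                // At least one download failed; each task is inspected individually below.
            }

            foreach (Task<string> task in batch)
            {
                if (task.Status != TaskStatus.RanToCompletion)
                {
                    continue; // Assumption: failed or cancelled downloads are skipped.
                }
                try
                {
                    handle(XDocument.Parse(task.Result));
                    processed++;
                }
                catch (XmlException)
                {
                    // Assumption: malformed feeds are skipped as well.
                }
            }
        }
        return processed;
    }
}
```

With the types from the test code, the call might look like `await BatchReader.ReadInBatchesAsync(feeds, url => this.client.GetStringAsync(url), doc => storage.Store(ProcessXmlDocument(doc)));`. Each batch still waits for its slowest feed before the next batch starts.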