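I have a WinForms application written in VB.Net that needs to download XML files containing article data from PubMed (medical journals). I request data for 500 articles at a time because I need to stream it, and I want to avoid loading files that exceed available memory. In the returned file, each article's data is contained in a <PubmedArticle> element: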
<PubmedArticleSet>
    <PubmedArticle>
        ... (Article Data) ...
    </PubmedArticle>
    <PubmedArticle>
        ... (Article Data) ...
    </PubmedArticle>
</PubmedArticleSet>
My code looks like this (the actual code runs the code below inside a loop that handles 500 PubMed IDs per iteration):
'Imports needed at the top of the file
Imports System.IO
Imports System.Net
Imports System.Xml.Linq

Dim pubmedIDs As String() = {"20816578", "20815951"}
Dim xmlUrl As String = String.Format("{0}{1}{2}", "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=", String.Join(",", pubmedIDs), "&retmode=xml&rettype=abstract")
Dim request As HttpWebRequest = DirectCast(WebRequest.Create(xmlUrl), HttpWebRequest)
Try
    Using response As WebResponse = request.GetResponse()
        Using responseStream As Stream = response.GetResponseStream()
            'Parse the whole response for this batch into memory
            Dim xDoc As XDocument = XDocument.Load(responseStream)
            'Break up the requested file into one file per article and save them to a cache directory
            'Update a progress bar as files are cached
        End Using
    End Using
Catch ex As WebException
    'Handle HTTP errors by capturing Pubmed IDs of failed request to allow user to retry later
    'Update progress bar despite failed request to let user know when the process is finished
End Try
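For context, the outer loop that produces each batch of 500 IDs can be sketched roughly as follows. This is a minimal sketch rather than my exact code: the module name BatchSketch, the helper SplitIntoBatches, and the placeholder IDs are all hypothetical and only illustrate how the batches are formed.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module BatchSketch
    ' Hypothetical helper (not my real code): splits the full ID list
    ' into consecutive batches of at most batchSize IDs each.
    Function SplitIntoBatches(ids As IList(Of String), batchSize As Integer) As List(Of String())
        Dim batches As New List(Of String())()
        For offset As Integer = 0 To ids.Count - 1 Step batchSize
            batches.Add(ids.Skip(offset).Take(batchSize).ToArray())
        Next
        Return batches
    End Function

    Sub Main()
        ' Placeholder IDs for illustration only; the real list holds 20K+ PubMed IDs.
        Dim allIds As New List(Of String)
        For i As Integer = 1 To 1200
            allIds.Add(i.ToString())
        Next

        For Each pubmedIDs As String() In SplitIntoBatches(allIds, 500)
            ' The request/response code shown above would run here, once per batch.
            Console.WriteLine("Batch of {0} IDs, first ID = {1}", pubmedIDs.Length, pubmedIDs(0))
        Next
    End Sub
End Module
```

With the 1200 placeholder IDs, this yields three batches of 500, 500, and 200 IDs. Each batch is independent of the others, which is why I think the requests should be a good fit for multithreading.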
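This all works fine, but in a typical run I need to collect article data for 20K+ files, which takes about 10 minutes. Can anyone advise me on how to multithread the requests?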