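I'm trying to fetch the HTML from a URL so that I can strip it down to plain text with Boilerpipe. However, I keep getting an exception. I'm using NewsAPI to get my URLs. Here is the relevant code snippet: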
    foreach (var article in articlesResponse.Articles)
    {
        string html;
        string url = article.Url;
        using (WebClient client = new WebClient())
        {
            html = client.DownloadString(url);
        }
        string text = CommonExtractors.DefaultExtractor.GetText(html);
        System.IO.File.AppendAllText(fileName, "Title: " + article.Title + "\n");
        System.IO.File.AppendAllText(fileName, "Author: " + article.Author + "\n");
        System.IO.File.AppendAllText(fileName, "Description: " + article.Description + "\n");
        System.IO.File.AppendAllText(fileName, "URL: " + article.Url + "\n");
        System.IO.File.AppendAllText(fileName, "Published at: " + article.PublishedAt + "\n");
        System.IO.File.AppendAllText(fileName, "Text: " + text + "\n\n");
    }
Here are the exception details:
    System.Net.WebException
      HResult=0x80131509
      Message=The remote server returned an error: (404) Not Found.
      Source=System
      StackTrace:
       at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
       at System.Net.WebClient.DownloadString(Uri address)
       at System.Net.WebClient.DownloadString(String address)
       at newsapi_take_two.Program.Main(String[] args) in ...\source\repos\newsapi console\newsapi take two\Program.cs:line 53