c# - How do I correctly extract content from a blog post?

Question

I am trying to extract the content of a blog post like this:

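using System;
using System.IO;
using System.Net;

// Term is the asker's own console helper; it is not defined in this snippet.
static void GetBlogData (string blogPostUrl)
{
    string blogPostContent = null;

    try
    {
        // Dispose the client once the download finishes.
        using (WebClient client = new WebClient ())
        {
            //client.Headers.Add (HttpRequestHeader.Referer, "http://www.stackoverflow.com");
            blogPostContent = client.DownloadString (blogPostUrl);
        }
    }
    catch (Exception ex)
    {
        Term.PrintLn ("Unable to download\n{0}", ex.Message);
    }

    if (blogPostContent != null)
    {
        // Open the file only when there is content; using flushes and closes the writer.
        using (TextWriter writer = new StreamWriter ("/home/nanda/projects/mono/common/article"))
        {
            writer.WriteLine (blogPostContent);
        }
    }
    else
    {
        Term.PrintLn ("No content found");
    }
}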

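I know this is an overly simple approach, but I would like to understand why I cannot extract content from some URLs, as if they have a block or something. How can I detect whether a website or blog is blocking me from downloading its content?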

score 2 · Accepted Answer

A website cannot block you from downloading its content without also blocking browsers from viewing the site.

If your download fails, it means either:

a) your URL is wrong

b) the website requires some form of identification and your request is missing it (probably a cookie)
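
To tell these two cases apart, one option is to catch WebException instead of the general Exception and look at the HTTP status code the server sent back. The following is only a minimal sketch: the BlogFetcher class, the DescribeFailure and TryDownload helpers, and the User-Agent string are hypothetical names made up for illustration, and the mapping from status codes to causes is a rough guess, not a rule every site follows.

```csharp
using System;
using System.Net;

static class BlogFetcher
{
    // Hypothetical helper: turns a failure into a human-readable hint.
    public static string DescribeFailure (WebExceptionStatus status, HttpStatusCode? statusCode)
    {
        if (status == WebExceptionStatus.NameResolutionFailure)
            return "Host not found: check the URL";

        if (status == WebExceptionStatus.ProtocolError && statusCode.HasValue)
        {
            int code = (int) statusCode.Value;
            switch (code)
            {
                case 404: return "404: the URL is probably wrong";
                case 401:
                case 403: return code + ": the site wants identification (cookie, login, or headers)";
                default:  return code + ": the server rejected the request";
            }
        }

        return "Request failed: " + status;
    }

    // Returns the page body, or null after printing why the download failed.
    public static string TryDownload (string url)
    {
        using (WebClient client = new WebClient ())
        {
            // Assumption: some sites reject requests that lack a browser-like User-Agent.
            client.Headers.Add (HttpRequestHeader.UserAgent, "Mozilla/5.0 (compatible; BlogFetcher/1.0)");

            try
            {
                return client.DownloadString (url);
            }
            catch (WebException ex)
            {
                // Response is null when no HTTP response was received at all.
                HttpWebResponse response = ex.Response as HttpWebResponse;
                HttpStatusCode? code = response != null ? response.StatusCode : (HttpStatusCode?) null;
                Console.WriteLine (DescribeFailure (ex.Status, code));
                return null;
            }
        }
    }
}
```

Calling BlogFetcher.TryDownload (blogPostUrl) in place of client.DownloadString would then print a hint such as a 404 for case a) or a 401/403 for case b), rather than only the exception message.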
