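I have the following code that uses HtmlAgilityPack to pull back the HTML for a number of websites. Everything seems to work fine except for asos.com. When I run the URL below, it returns random characters (‹\b\0\0\0\0\0\0UÍ „ï&¾CãÁ¢ø›\bãhìÁ3-« Ziý}z'š/»ómf³Ü`]In@iÉÑbr [œ¡Ä¬v7Ðœ¶7N[GáôSv;Ü°?[†.ã*3Ž¢G×ù6OƒäwPŒõH\rÙ¸\vzìmèÎ;M›4q_K¨Ð)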

    HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
    doc.OptionReadEncoding = false;
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");
    request.Timeout = 10000;
    request.ReadWriteTimeout = 32000;
    request.UserAgent = "TEST";
    request.Method = "GET";
    request.Accept = "text/html";
    request.AllowAutoRedirect = false;
    request.CookieContainer = new CookieContainer();
    StreamReader reader = new StreamReader(request.GetResponse().GetResponseStream(), Encoding.Default); //put your encoding            
    doc.Load(reader);

    string html = doc.DocumentNode.OuterHtml;

I have run the URL through Fiddler but can't see anything that suggests there should be a problem. Any ideas where I'm going wrong?

See an image of the headers from Fiddler here: http://i.stack.imgur.com/2LRFY.png

1 Answer

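This has nothing to do with Html Agility Pack; it happens because you have set AllowAutoRedirect to false. Remove that line and it will work. The site apparently performs a redirect, and you need to follow it if you want the final HTML text.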
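As a rough illustration of that change, here is a minimal sketch of the question's request with redirects left enabled. It assumes the HtmlAgilityPack package is referenced and uses the URL from the question. Two details are assumptions that are not part of the answer: the `AutomaticDecompression` line (the leading bytes `‹\b` in the question's output resemble a gzip header, so the body may also be compressed) and the choice of `Encoding.UTF8` for the reader.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // URL taken from the question.
        string url = "http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red";

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 10000;
        request.ReadWriteTimeout = 32000;
        request.UserAgent = "TEST";
        request.Method = "GET";
        request.Accept = "text/html";
        // AllowAutoRedirect defaults to true; it is set explicitly here only
        // to highlight the difference from the question's code.
        request.AllowAutoRedirect = true;
        request.CookieContainer = new CookieContainer();

        // Assumption (not from the answer): ask the framework to decompress
        // gzip/deflate bodies transparently, in case the server compresses them.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        HtmlDocument doc = new HtmlDocument();
        doc.OptionReadEncoding = false;

        // Dispose the response, stream, and reader once the document is loaded.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8)) // assumed encoding
        {
            doc.Load(reader);
        }

        string html = doc.DocumentNode.OuterHtml;
        Console.WriteLine(html.Length);
    }
}
```

If the output is still unreadable with redirects followed, the response headers captured in Fiddler (in particular `Content-Encoding` and `Location`) would be the place to look next.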

Note that Html Agility Pack has a utility HtmlWeb class that can download a page directly as an HtmlDocument:

    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load(@"http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");

answered 2013-04-10T06:08:49.607