
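I'm trying to build a Metro app for my university that shows the class timetable. I'm using HAP (HtmlAgilityPack) + Fizzler to parse the page and extract the data.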

The timetable link gives me a "Too many automatic redirections" error. I found that a CookieContainer might help, but I don't know how to use it:

        CookieContainer cc = new CookieContainer();
        request.CookieContainer = cc;
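A plausible reason a cookie jar matters here (an assumption, since the server's behaviour isn't shown): the site may set a session cookie and then redirect. Without a CookieContainer, HttpWebRequest does not send that cookie on the redirected request, so the server keeps redirecting until the limit (MaximumAutomaticRedirections, 50 by default) is hit. The sketch simulates the cookie part with CookieContainer alone, with no network access; the cookie name SESSION and the helper CookieHeaderAfterRedirect are hypothetical.

```csharp
using System;
using System.Net;

public static class CookieDemo
{
    // Mimics what HttpWebRequest does with its CookieContainer: a Set-Cookie
    // value from one response is stored, and the matching Cookie header is
    // produced for the next request to the same host.
    public static string CookieHeaderAfterRedirect(string setCookieHeader)
    {
        var jar = new CookieContainer();
        var first = new Uri("http://cist.kture.kharkov.ua/ias/app/tt/f");
        var next = new Uri("http://cist.kture.kharkov.ua/ias/app/tt/f?p=778:201");

        jar.SetCookies(first, setCookieHeader); // store what the server sent
        return jar.GetCookieHeader(next);       // what the next hop would send
    }
}
```

For example, `CookieDemo.CookieHeaderAfterRedirect("SESSION=abc123; Path=/")` yields the header value `SESSION=abc123`, which is what lets the server recognise the follow-up request.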

My code:

    public static HttpWebRequest request;
    public string Url = "http://cist.kture.kharkov.ua/ias/app/tt/f?p=778:201:9421608126858:::201:P201_FIRST_DATE,P201_LAST_DATE,P201_GROUP,P201_POTOK:01.09.2012,31.01.2013,2423447,0:";
    public SampleDataSource()
    {
        HtmlDocument html = new HtmlDocument();
        request = (HttpWebRequest)WebRequest.Create(Url);
        request.Proxy = null;
        request.UseDefaultCredentials = true;
        CookieContainer cc = new CookieContainer();
        request.CookieContainer = cc;
        html.LoadHtml(request.RequestUri.ToString());
        var page = html.DocumentNode;

        String ITEM_CONTENT = null;
        foreach (var item in page.QuerySelectorAll(".MainTT"))
        {
            ITEM_CONTENT = item.InnerHtml;
        }
    }

With the CookieContainer I no longer get the error, but for some reason DocumentNode.InnerHtml contains my URI instead of the page's HTML.


3 Answers


You only need to change one line.

Replace

    html.LoadHtml(request.RequestUri.ToString());

with

    html.LoadHtml(new StreamReader(request.GetResponse().GetResponseStream()).ReadToEnd());

EDIT

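First mark your method as `async` (a constructor such as SampleDataSource() cannot be `async`, so move this code into a separate method), then: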

    request.CookieContainer = cc;
    var resp = await request.GetResponseAsync();
    html.LoadHtml(new StreamReader(resp.GetResponseStream()).ReadToEnd());
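Note that the question's code never called GetResponse at all: WebRequest.Create only builds the request, and nothing is sent until the response is requested, which may also be why the redirect error stopped appearing. A minimal sketch of the same fix that also disposes the response and reader; the class PageFetcher and its method names are hypothetical, and it assumes GetResponseAsync is available as in the edit above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

public static class PageFetcher
{
    // Reads the whole stream as text; disposing the StreamReader also
    // closes the underlying response stream.
    public static string ReadAllText(Stream stream)
    {
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    // Sends the request (nothing goes over the wire before GetResponseAsync)
    // with a CookieContainer attached, then returns the page HTML.
    public static async Task<string> DownloadHtmlAsync(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UseDefaultCredentials = true;
        request.CookieContainer = new CookieContainer();

        using (WebResponse response = await request.GetResponseAsync())
        {
            return ReadAllText(response.GetResponseStream());
        }
    }
}
```

Inside an `async` method, usage would be `html.LoadHtml(await PageFetcher.DownloadHtmlAsync(Url));`, so LoadHtml receives the page markup rather than the URL string.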
answered 2012-11-28T18:12:21.373

If you want to download a page's HTML, try this method (it uses HttpClient):

    public async Task<string> DownloadHtmlCode(string url)
    {
        HttpClientHandler handler = new HttpClientHandler { UseDefaultCredentials = true, AllowAutoRedirect = true };
        HttpClient client = new HttpClient(handler);
        HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        string responseBody = await response.Content.ReadAsStringAsync();
        return responseBody;
    }
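HttpClientHandler keeps cookies by default (UseCookies is true and the handler has its own CookieContainer), but creating a new handler per call throws those cookies away. A sketch of a variant that shares one cookie jar across requests; the HtmlDownloader class and its method names are hypothetical.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class HtmlDownloader
{
    // Builds a handler that follows redirects and stores cookies in a jar
    // owned by the caller, so a session cookie survives redirect hops and calls.
    public static HttpClientHandler CreateHandler(CookieContainer jar)
    {
        return new HttpClientHandler
        {
            CookieContainer = jar,
            UseCookies = true,
            UseDefaultCredentials = true,
            AllowAutoRedirect = true
        };
    }

    // Same flow as DownloadHtmlCode, but the HttpClient is passed in so one
    // instance (and its cookie jar) can be reused for many pages.
    public static async Task<string> DownloadHtmlCodeAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Usage would be along the lines of `var client = new HttpClient(HtmlDownloader.CreateHandler(new CookieContainer()));` created once, then `await HtmlDownloader.DownloadHtmlCodeAsync(client, Url)` for each page.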
answered 2013-06-24T16:24:17.560

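If you want to parse the downloaded HTML, you can use Regex or LINQ. I have some examples of parsing HTML with LINQ, but first you need to load the code into an HtmlDocument with the HtmlAgilityPack library:

    html.LoadHtml(temphtml);

Once that's done, you can query your HtmlDocument: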

    // Example: collect the src of every <img> element
    IEnumerable<HtmlNode> imghrefNodes = html.DocumentNode.Descendants().Where(n => n.Name == "img");
    foreach (HtmlNode img in imghrefNodes)
    {
        HtmlAttribute att = img.Attributes["src"];
        if (att == null) continue; // skip <img> tags without a src attribute
        // att.Value holds the image URL; read or modify it here
    }
answered 2013-06-24T17:07:19.557