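I am trying to scrape a website written in PHP to extract some information from a particular table. Here is the scenario.
On the landing page there is a form that takes a query from the user and returns search results based on it. If I leave the fields empty and click "Submit", it returns the entire result set (which is what I am interested in). Before I knew about the HttpWebRequest class, I simply passed the URL to the HtmlWeb.Load(URL) method from the HtmlAgilityPack library, which was obviously not the way to go.
Then I searched for HttpWebRequest and found an example like this: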
Dim cookies As New CookieContainer
Dim postData As String = "postData obtained using Live HTTP Headers plugin in Firefox"
Dim encoding As New UTF8Encoding
Dim byteData As Byte() = encoding.GetBytes(postData)
Dim postRequest As HttpWebRequest = DirectCast(WebRequest.Create("URL"), HttpWebRequest)
postRequest.Method = "POST"
postRequest.KeepAlive = True
postRequest.CookieContainer = cookies
postRequest.ContentType = "application/x-www-form-urlencoded"
postRequest.ContentLength = byteData.Length
postRequest.Referer = "Referer Page"
postRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 Firefox/4.0 (.NET CLR 3.5.30729)"
Dim postreqstream As Stream = postRequest.GetRequestStream()
postreqstream.Write(byteData, 0, byteData.Length)
postreqstream.Close()
Dim postresponse As HttpWebResponse
postresponse = DirectCast(postRequest.GetResponse(), HttpWebResponse)
cookies.Add(postresponse.Cookies)
Dim postreqreader As New StreamReader(postresponse.GetResponseStream())
Dim thepage As String = postreqreader.ReadToEnd
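Now, when I output the thepage variable to a browser control on a VB form, I can see the page I want (the one containing the table). At this point I just pass that page's URL to HtmlAgilityPack like this: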
Dim web As New HtmlAgilityPack.HtmlWeb()
Dim htmlDoc As HtmlAgilityPack.HtmlDocument = web.Load("URL")
Dim tabletag As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//table")
Dim tablenode As HtmlNode = htmlDoc.DocumentNode.SelectSingleNode("//table[@summary='List of services']")
If Not tabletag Is Nothing Then
Console.WriteLine("YES")
End If
But the tabletag variable is Nothing. Where am I going wrong? Also, is it possible to get the URL directly from the HttpWebResponse so that I can pass it to the web.Load method?
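For what it's worth, one direction I am considering is to parse the HTML string I already have instead of requesting the URL a second time. As far as I understand, HttpWebResponse.ResponseUri does expose the final URL, but passing it to web.Load would issue a fresh GET without the POST data or cookies. Below is a minimal sketch of the alternative, assuming HtmlDocument.LoadHtml is the right call for raw markup. The sample HTML string is made up for illustration and stands in for thepage:

```vbnet
Imports System
Imports HtmlAgilityPack

Module ParseResponseSketch
    Sub Main()
        ' Stand-in for the string read from the POST response (thepage above).
        ' This sample markup is hypothetical, not the real site's output.
        Dim thepage As String =
            "<html><body><table summary='List of services'>" &
            "<tr><td>Example</td></tr></table></body></html>"

        ' Parse the markup directly; no second HTTP request is made.
        Dim htmlDoc As New HtmlAgilityPack.HtmlDocument()
        htmlDoc.LoadHtml(thepage)

        ' Same XPath as in my original attempt.
        Dim tablenode As HtmlNode =
            htmlDoc.DocumentNode.SelectSingleNode("//table[@summary='List of services']")

        If tablenode IsNot Nothing Then
            Console.WriteLine("YES")
        End If
    End Sub
End Module
```

In my real code the hard-coded string would be replaced by the thepage value read from postresponse.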
Thank you.