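I want to scrape a website (extract product prices) from individual product pages, using XML HTTP requests. Before running the script I need to select the right store (saved in a browser cookie, or included in the request in any other way that works), because prices differ between stores.
I have written working code, but it takes a long time to run, so I think there must be a faster and cleaner way :). I also had to add Application.Wait calls so that the website can keep up with the steps.
My current VBA code:
- Runs an IE request to open the website, selects the desired store in several clicks and saves it in a cookie (just as a user of the website would do).
- Next, requests the product page with another IE request and extracts the data. I found that I cannot use an XML HTTP request here, because it does not use the stored cookie value and therefore does not show the correct price.
- The price I am after (in the example below) is € 1,39 and not € 1,48 (which is shown when no cookie value is used and no store is selected).
- The cookie value is saved in the cookie "www.jumbo.com/cookie/HomeStore". Its content contains the store tag, which is known in advance and could be hard-coded in the request if that is possible.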
Selecting the right store (and saving it in the browser cookie)
Sub SetStore()
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLSearchbox As MSHTML.IHTMLElement
    Dim HTMLSearchboxes As MSHTML.IHTMLElementCollection
    Dim HTMLButton As MSHTML.IHTMLElement
    Dim HTMLButtons As MSHTML.IHTMLElementCollection
    Dim HTMLSearchButton As MSHTML.IHTMLElement
    Dim HTMLSearchButtons As MSHTML.IHTMLElementCollection
    Dim HTMLStoreID As MSHTML.IHTMLElement
    Dim HTMLStoreIDs As MSHTML.IHTMLElementCollection
    Dim HTMLSaveStore As MSHTML.IHTMLElement
    Dim HTMLSaveStores As MSHTML.IHTMLElementCollection

    'set to False to hide the IE window
    IE.Visible = True

    'navigate to a url with limited content
    IE.navigate "https://www.jumbo.com/content/algemene-voorwaarden/"
    'wait for the page to load; DoEvents keeps Excel responsive while polling
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set HTMLDoc = IE.document

    'open the store finder
    Set HTMLButtons = HTMLDoc.getElementsByTagName("button")
    For Each HTMLButton In HTMLButtons
        If HTMLButton.getAttribute("data-jum-action") = "openHomeStoreFinder" Then
            HTMLButton.Click
            Exit For
        End If
    Next HTMLButton
    'wait 2 seconds for the store finder to open
    Application.Wait Now + #12:00:02 AM#

    Set HTMLSearchboxes = HTMLDoc.getElementsByTagName("input")
    For Each HTMLSearchbox In HTMLSearchboxes
        If HTMLSearchbox.getAttribute("id") = "searchTerm__DkKYx4XylsAAAFJktpb2Guy" Then
            'input field: store name/location to show search results
            HTMLSearchbox.Value = "Oosterhout"
            Application.Wait Now + #12:00:03 AM#
            HTMLSearchbox.Click
            Exit For
        End If
    Next HTMLSearchbox

    'start the store search
    Set HTMLSearchButtons = HTMLDoc.getElementsByTagName("button")
    For Each HTMLSearchButton In HTMLSearchButtons
        If HTMLSearchButton.getAttribute("data-jum-filter") = "search" Then
            HTMLSearchButton.Click
            Exit For
        End If
    Next HTMLSearchButton
    'wait 5 seconds for the search results
    Application.Wait Now + #12:00:05 AM#

    Set HTMLStoreIDs = HTMLDoc.getElementsByTagName("li")
    For Each HTMLStoreID In HTMLStoreIDs
        'oosterhout = YC8KYx4XB88AAAFIDcIYwKxJ
        'nieuwegein = 84IKYx4XziUAAAFInSYYwKrH
        'vaassen = JYYKYx4XC1oAAAFItvcYwKxJ
        'brielle = OG8KYx4XP4wAAAFIlsEYwKxK
        If HTMLStoreID.getAttribute("data-jum-store-id") = "YC8KYx4XB88AAAFIDcIYwKxJ" Then
            HTMLStoreID.Click
            Application.Wait Now + #12:00:03 AM#
            Exit For
        End If
    Next HTMLStoreID

    'save the selected store (this writes the cookie)
    Set HTMLSaveStores = HTMLDoc.getElementsByTagName("button")
    For Each HTMLSaveStore In HTMLSaveStores
        If HTMLSaveStore.getAttribute("data-jum-action") = "saveHomeStore" Then
            HTMLSaveStore.Click
            Exit For
        End If
    Next HTMLSaveStore
    'IE.Quit
End Sub
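The fixed Application.Wait pauses above account for most of the run time. A possible alternative, shown here only as a sketch, is to poll the document until the element that the next step needs has appeared. `WaitForElement` is a hypothetical helper name, it is not part of the code above, and the 10 second default timeout is an arbitrary assumption.

```vba
Option Explicit

' Hypothetical helper (an assumption, not part of the original code):
' polls the document until an element with the given tag name and
' attribute value exists, or until the timeout expires.
' Returns Nothing when the element never appeared.
Function WaitForElement(ByVal doc As Object, ByVal tagName As String, _
                        ByVal attrName As String, ByVal attrValue As String, _
                        Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim deadline As Date
    Dim el As Object
    Dim attr As Variant

    ' A Date is stored in days, so convert seconds to a fraction of a day
    deadline = Now + timeoutSeconds / 86400#
    Do
        For Each el In doc.getElementsByTagName(tagName)
            attr = el.getAttribute(attrName)
            ' getAttribute returns Null when the attribute is missing
            If Not IsNull(attr) Then
                If attr = attrValue Then
                    Set WaitForElement = el
                    Exit Function
                End If
            End If
        Next el
        DoEvents    ' let IE and Excel process pending events
    Loop While Now < deadline
End Function
```

With this in place, a step such as the search box lookup could become `Set HTMLSearchbox = WaitForElement(HTMLDoc, "input", "id", "searchTerm__DkKYx4XylsAAAFJktpb2Guy")`, followed by an `Is Nothing` check before clicking.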
Extracting data from the product page (IE HTTP request, uses the store value from the cookie)
Sub GetJumboPriceIE()
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim JumPrice As MSHTML.IHTMLElement
    Dim JumboPrice As Double
    Dim Price_In_Cents_Tag As String
    Dim SKU_tag As String, SKU_url As String

    SKU_tag = "173140KST"
    SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"

    IE.Visible = False
    IE.navigate SKU_url
    'wait for the page to load; DoEvents keeps Excel responsive while polling
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set HTMLDoc = IE.document

    'the price is stored in cents in the input with id "PriceInCents_<SKU>"
    Price_In_Cents_Tag = "PriceInCents_" & SKU_tag
    Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)
    If Not JumPrice Is Nothing Then
        JumboPrice = CDbl(JumPrice.getAttribute("value")) / 100
        Debug.Print JumboPrice
    Else
        Debug.Print "Price element not found: " & Price_In_Cents_Tag
    End If

    'quit IE only after the document has been read
    IE.Quit
End Sub
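The code above works and prints a price of 1,39, but I would like to use XML HTTP request code like the one shown below instead (with the correct store).
Extracting data from the product page (XML HTTP request), but without the cookie value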
Sub GetJumboPriceXML()
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim JumPrice As MSHTML.IHTMLElement
    Dim JumboPrice As Double
    Dim Price_In_Cents_Tag As String
    Dim SKU_tag As String, SKU_url As String

    SKU_tag = "173140KST"
    SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"

    'synchronous GET request (third argument False)
    XMLReq.Open "GET", SKU_url, False
    XMLReq.send
    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
        Exit Sub
    End If

    'parse the response so the DOM methods can be used on it
    HTMLDoc.body.innerHTML = XMLReq.responseText

    'the price is stored in cents in the input with id "PriceInCents_<SKU>"
    Price_In_Cents_Tag = "PriceInCents_" & SKU_tag
    Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)
    If JumPrice Is Nothing Then
        Debug.Print "Price element not found: " & Price_In_Cents_Tag
        Exit Sub
    End If
    JumboPrice = CDbl(JumPrice.getAttribute("value")) / 100
    Debug.Print JumboPrice
End Sub
This code does not use the correct store and outputs a price I do not want (it prints 1,48).
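For illustration, the hard-coded cookie idea could look roughly like the sketch below. It is untested against the site and rests on several assumptions: that the cookie is named `HomeStore` (guessed from the cookie path), that its value is just the store id, and that the server accepts a cookie sent this way. It uses `MSXML2.ServerXMLHTTP60` from the same Microsoft XML, v6.0 reference, because that class allows the Cookie header to be set by hand. `BuildCookieHeader`, `GetJumboPriceForStore` and `PrintPricesForStores` are hypothetical names.

```vba
Option Explicit

' Builds a "name=value" pair for a Cookie request header.
Function BuildCookieHeader(ByVal cookieName As String, _
                           ByVal cookieValue As String) As String
    BuildCookieHeader = cookieName & "=" & cookieValue
End Function

' Sketch: request a product page with a hard-coded store cookie.
' ASSUMPTION: cookie name "HomeStore" and a value equal to the store id
' are guesses; the real cookie content may be formatted differently.
Function GetJumboPriceForStore(ByVal SKU_url As String, ByVal SKU_tag As String, _
                               ByVal StoreID As String) As Double
    Dim Req As New MSXML2.ServerXMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim JumPrice As MSHTML.IHTMLElement

    Req.Open "GET", SKU_url, False
    Req.setRequestHeader "Cookie", BuildCookieHeader("HomeStore", StoreID)
    Req.send
    If Req.Status <> 200 Then
        Err.Raise vbObjectError + 513, "GetJumboPriceForStore", _
                  Req.Status & " - " & Req.statusText
    End If

    HTMLDoc.body.innerHTML = Req.responseText
    Set JumPrice = HTMLDoc.getElementById("PriceInCents_" & SKU_tag)
    If JumPrice Is Nothing Then
        Err.Raise vbObjectError + 514, "GetJumboPriceForStore", _
                  "Price element not found for " & SKU_tag
    End If
    ' The page stores the price in cents
    GetJumboPriceForStore = CDbl(JumPrice.getAttribute("value")) / 100
End Function

' Prints the price of one product for two of the stores listed in SetStore.
Sub PrintPricesForStores()
    Dim StoreIDs As Variant, StoreID As Variant

    ' Oosterhout and Brielle store ids, taken from the comments in SetStore
    StoreIDs = Array("YC8KYx4XB88AAAFIDcIYwKxJ", "OG8KYx4XP4wAAAFIlsEYwKxK")
    For Each StoreID In StoreIDs
        Debug.Print StoreID, GetJumboPriceForStore( _
            "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/", _
            "173140KST", CStr(StoreID))
    Next StoreID
End Sub
```

If both stores still print the same price, the guessed cookie name or value format is probably wrong, and the actual cookie that IE stores after running SetStore would need to be inspected.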
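To summarize:
When no store is selected (no cookie set), the product URL above currently gives a price of €1.48.
I would like the VB script to set the store to "Jumbo Oosterhout Nieuwe Bouwlingstraat", then scrape a predefined list of product URLs and extract the prices (the URL above gives €1.39 for this store).
Then it should set the store to a different local store, "Jumbo Brielle Thoelaverweg", and scrape the same list of product URLs. The URL above gives €1.48 for this store.
You can select a different store by clicking the location pin icon in the top right corner of the page.
Thank you very much for your help.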