
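I want to scrape a website (extract product prices) from individual product pages using XML HTTP requests. Before running the script I first need to select the right store (stored in a browser cookie variable, or included in the request in any other way that works), because prices differ per store.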

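I have written working code, but it takes a long time to run, so I think there must be a faster and cleaner :) way. I also had to include Application.Wait calls so that the website can keep up with the steps.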

My current VBA code:

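  • Run an IE request to open the website, select the desired store in several clicks, and save it in a cookie (just as a user of the site would do).
  • Next, use another IE request to load the product page and extract the data. I found that I cannot use an XML HTTP request here, because it does not use the stored cookie value and therefore does not show the correct price.
  • The price I am after (in the example below) is € 1,39 and not € 1,48 (which is what you get when no cookie value is used and no store is selected).
  • The cookie value is stored in the cookie "www.jumbo.com/cookie/HomeStore". Its content contains the store tag, which is known in advance and could be hard-coded in the request, if that is possible.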

Selecting the right store (and saving it in a browser cookie)

Sub SetStore()

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument

    Dim HTMLSearchbox As MSHTML.IHTMLElement
    Dim HTMLSearchboxes As MSHTML.IHTMLElementCollection
    Dim HTMLButton As MSHTML.IHTMLElement
    Dim HTMLButtons As MSHTML.IHTMLElementCollection
    Dim HTMLSearchButton As MSHTML.IHTMLElement
    Dim HTMLSearchButtons As MSHTML.IHTMLElementCollection
    Dim HTMLStoreID As MSHTML.IHTMLElement
    Dim HTMLStoreIDs As MSHTML.IHTMLElementCollection
    Dim HTMLSaveStore As MSHTML.IHTMLElement
    Dim HTMLSaveStores As MSHTML.IHTMLElementCollection

    'set to False to hide the IE window
    IE.Visible = True

    'navigate to a url with limited content
    IE.navigate "https://www.jumbo.com/content/algemene-voorwaarden/"

    'wait for the page to load; DoEvents keeps Excel responsive meanwhile
    Do While IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set HTMLDoc = IE.document

    'open the store finder
    Set HTMLButtons = HTMLDoc.getElementsByTagName("button")

    For Each HTMLButton In HTMLButtons
        If HTMLButton.getAttribute("data-jum-action") = "openHomeStoreFinder" Then
            HTMLButton.Click
            Exit For
        End If
    Next HTMLButton

    Application.Wait Now + #12:00:02 AM#

    'fill in the search box
    Set HTMLSearchboxes = HTMLDoc.getElementsByTagName("input")

    For Each HTMLSearchbox In HTMLSearchboxes
        If HTMLSearchbox.getAttribute("id") = "searchTerm__DkKYx4XylsAAAFJktpb2Guy" Then

            'input field store name/location to show search results
            HTMLSearchbox.Value = "Oosterhout"

            Application.Wait Now + #12:00:03 AM#

            HTMLSearchbox.Click
            Exit For
        End If
    Next HTMLSearchbox

    'start the search
    Set HTMLSearchButtons = HTMLDoc.getElementsByTagName("button")

    For Each HTMLSearchButton In HTMLSearchButtons
        If HTMLSearchButton.getAttribute("data-jum-filter") = "search" Then
            HTMLSearchButton.Click
            Exit For
        End If
    Next HTMLSearchButton

    Application.Wait Now + #12:00:05 AM#

    'pick the store from the search results
    Set HTMLStoreIDs = HTMLDoc.getElementsByTagName("li")

    For Each HTMLStoreID In HTMLStoreIDs

        'oosterhout = YC8KYx4XB88AAAFIDcIYwKxJ
        'nieuwegein = 84IKYx4XziUAAAFInSYYwKrH
        'vaassen = JYYKYx4XC1oAAAFItvcYwKxJ
        'brielle = OG8KYx4XP4wAAAFIlsEYwKxK

        If HTMLStoreID.getAttribute("data-jum-store-id") = "YC8KYx4XB88AAAFIDcIYwKxJ" Then
            HTMLStoreID.Click

            Application.Wait Now + #12:00:03 AM#

            Exit For
        End If
    Next HTMLStoreID

    'save the selected store (this sets the cookie)
    Set HTMLSaveStores = HTMLDoc.getElementsByTagName("button")

    For Each HTMLSaveStore In HTMLSaveStores
        If HTMLSaveStore.getAttribute("data-jum-action") = "saveHomeStore" Then
            HTMLSaveStore.Click
            Exit For
        End If
    Next HTMLSaveStore

    'IE.Quit

End Sub
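
The fixed Application.Wait delays account for a large part of the run time. One alternative I have considered, sketched here under the assumption that each step is finished as soon as a particular element shows up in the DOM, is to poll for that element with a timeout. WaitForElement and its parameter names are hypothetical (they are not part of MSHTML), and I have not timed this against the site.

```vba
' Sketch only: poll the document until an element with the given attribute
' value appears, instead of sleeping for a fixed number of seconds.
' WaitForElement is a hypothetical helper, not a library function.
' Requires a reference to "Microsoft HTML Object Library".
Function WaitForElement(ByVal Doc As MSHTML.HTMLDocument, _
                        ByVal TagName As String, _
                        ByVal AttrName As String, _
                        ByVal AttrValue As String, _
                        Optional ByVal TimeoutSeconds As Double = 10) As MSHTML.IHTMLElement

    Dim Deadline As Date
    Dim Candidate As MSHTML.IHTMLElement
    Dim AttrFound As Variant

    ' A Date counts days, so seconds are divided by 86400 (seconds per day)
    Deadline = Now + TimeoutSeconds / 86400#

    Do
        For Each Candidate In Doc.getElementsByTagName(TagName)
            AttrFound = Candidate.getAttribute(AttrName)
            ' getAttribute returns Null when the attribute is missing
            If Not IsNull(AttrFound) Then
                If CStr(AttrFound) = AttrValue Then
                    Set WaitForElement = Candidate
                    Exit Function
                End If
            End If
        Next Candidate
        DoEvents    ' let IE and Excel process pending events
    Loop While Now < Deadline

    ' Falls through on timeout: the function then returns Nothing
End Function
```

Each For Each / Application.Wait pair in SetStore could then become something like `Set HTMLSaveStore = WaitForElement(HTMLDoc, "button", "data-jum-action", "saveHomeStore")`, followed by a check for `Nothing` before calling `.Click`.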

Extracting data from the product page (IE request, uses the stored cookie value)

Sub GetJumboPriceIE()

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim JumPrice As MSHTML.IHTMLElement
    Dim JumboPrice As Double
    Dim Price_In_Cents_Tag As String

    Dim SKU_tag As String, SKU_url As String

    SKU_tag = "173140KST"
    SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"

    IE.Visible = False
    IE.navigate SKU_url

    'wait for the page to load; DoEvents keeps Excel responsive meanwhile
    Do While IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set HTMLDoc = IE.document

    'the hidden input with id "PriceInCents_<SKU>" holds the price in cents
    Price_In_Cents_Tag = "PriceInCents_" & SKU_tag
    Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)

    If JumPrice Is Nothing Then
        Debug.Print "Price element not found: " & Price_In_Cents_Tag
    Else
        JumboPrice = JumPrice.getAttribute("value") / 100
        Debug.Print JumboPrice
    End If

    'quit only after reading from the document, which belongs to the IE instance
    IE.Quit

End Sub

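The code above works and prints a price of 1,39, but I would like to use XML HTTP request code like the one shown below instead (with the correct store, though).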

Extracting data from the product page (using an XML HTTP request), but without the cookie value

Sub GetJumboPriceXML()

    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument

    Dim JumPrice As MSHTML.IHTMLElement
    Dim JumboPrice As Double
    Dim Price_In_Cents_Tag As String

    Dim SKU_tag As String, SKU_url As String

    SKU_tag = "173140KST"
    SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"

    'synchronous GET; no store cookie is sent with this request
    XMLReq.Open "GET", SKU_url, False
    XMLReq.send

    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
        Exit Sub
    End If

    'parse the response so it can be queried like a normal document
    HTMLDoc.body.innerHTML = XMLReq.responseText

    'the hidden input with id "PriceInCents_<SKU>" holds the price in cents
    Price_In_Cents_Tag = "PriceInCents_" & SKU_tag
    Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)

    If JumPrice Is Nothing Then
        Debug.Print "Price element not found: " & Price_In_Cents_Tag
        Exit Sub
    End If

    JumboPrice = JumPrice.getAttribute("value") / 100
    Debug.Print JumboPrice

End Sub

This code does not use the correct store and outputs a price I do not want (it prints 1,48).
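
What I have in mind is roughly the sketch below: send the store as a hard-coded Cookie header on the request itself. It is untested against the site and rests on several assumptions. The cookie is assumed to be named HomeStore and to hold the bare store ID, which may not match what the site really stores. I have also swapped MSXML2.XMLHTTP60 for MSXML2.ServerXMLHTTP60 because, as far as I understand, XMLHTTP60 lets WinINet manage cookies and tends to ignore a manually set Cookie header. The Sub name is made up for illustration.

```vba
' Sketch only, untested against the site.
' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
Sub GetJumboPriceWithStoreCookie()

    ' Store ID for Oosterhout, taken from the comment list in SetStore
    Const STORE_ID As String = "YC8KYx4XB88AAAFIDcIYwKxJ"
    Const SKU_TAG As String = "173140KST"
    Const SKU_URL As String = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"

    Dim Req As New MSXML2.ServerXMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim JumPrice As MSHTML.IHTMLElement

    Req.Open "GET", SKU_URL, False

    ' ASSUMPTION: cookie name "HomeStore" and a value equal to the bare store ID.
    ' Inspect the real cookie in the browser and adjust both if they differ.
    Req.setRequestHeader "Cookie", "HomeStore=" & STORE_ID
    Req.send

    If Req.Status <> 200 Then
        Debug.Print "Problem: " & Req.Status & " - " & Req.statusText
        Exit Sub
    End If

    ' Parse the response so it can be queried like a normal document
    HTMLDoc.body.innerHTML = Req.responseText

    Set JumPrice = HTMLDoc.getElementById("PriceInCents_" & SKU_TAG)
    If JumPrice Is Nothing Then
        Debug.Print "Price element not found for " & SKU_TAG
        Exit Sub
    End If

    ' The input holds the price in cents
    Debug.Print CDbl(JumPrice.getAttribute("value")) / 100

End Sub
```

If the server honours the header, switching stores would only mean changing STORE_ID (for example to the Brielle ID from the same comment list), with no IE clicks or waits involved.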


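To summarize:

With no store selected (no cookie set), the following URL currently gives a price of €1.48.

I would like the VB script to set the store to "Jumbo Oosterhout Nieuwe Bouwlingstraat", then scrape a predefined list of product URLs and extract the prices (the URL above gives €1.39).

Then set the store to a different local store, "Jumbo Brielle Thoelaverweg", and scrape the same list of product URLs. The URL above gives €1.48.

You can select a different store by clicking the location pin icon in the top right corner of the page.

Many thanks for your help.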
