
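The web page I want to extract data from has a form with several search fields. I can enter data into any of these fields, click the search button at the bottom of the form, and view the results for whatever I searched for.

I have many numbers to search for (about 300). Rather than searching for each one by hand, is there a way to run the searches automatically and import the data for each number into an Excel worksheet?

Can this be done with an Excel macro?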


1 Answer


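You can use the MSXML and MSHTML libraries for this. This code should help you get started.
First, run this sub to add the two references (you only need to run it once):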

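Sub addReferences()
    'Requires "Trust access to the VBA project object model" to be enabled in the Trust Center.
    'Microsoft HTML Object Library (MSHTML), version 4.0
    ActiveWorkbook.VBProject.References.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 4, 0
    'Microsoft XML, v6.0 (MSXML2)
    ActiveWorkbook.VBProject.References.AddFromGuid "{F5078F18-C551-11D3-89B9-0000F81FE221}", 6, 0
End Sub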
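
To confirm that both references were added, a small sketch like the following lists the references on the active workbook in the Immediate window. The sub name listReferences is hypothetical and not part of the original answer. Like addReferences, it assumes that access to the VBA project object model is trusted.

```vba
'Hypothetical helper: prints each reference's name, GUID and version
'to the Immediate window (Ctrl+G in the VBA editor).
Sub listReferences()
    Dim ref As Object 'late-bound, so no extra library reference is needed
    For Each ref In ActiveWorkbook.VBProject.References
        Debug.Print ref.Name, ref.Guid, ref.Major & "." & ref.Minor
    Next ref
End Sub
```

The two libraries should appear under the names MSHTML and MSXML2, which are the same qualifiers used in the declarations in this answer.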

Then edit the getCAGEValues sub to import your CAGE codes and save the resulting data (along with any other data you want to pull from the page):

Sub getCAGEValues()
    Dim oHTMLDoc As MSHTML.HTMLDocument
    Dim oSpan As MSHTML.IHTMLElement 'getElementById returns an IHTMLElement
    Dim CAGECodes() As Variant
    Dim CAGECode As Variant 'For Each over an array needs a Variant loop variable
    CAGECodes = Array("12345", "12346") 'CAGECodes is an array of your codes
    For Each CAGECode In CAGECodes
        Set oHTMLDoc = getPage(CAGECode)
        Set oSpan = oHTMLDoc.getElementById("ctl00_cphMainPageBody_lblCompNameData") 'The id for the company name
        If Not oSpan Is Nothing Then 'Skip codes whose page lacks the element
            MsgBox oSpan.innerText 'Save the value however you want to.
        End If
    Next CAGECode
End Sub

Function getPage(CAGECode As Variant) As MSHTML.HTMLDocument
    Dim oHttpRequest As MSXML2.XMLHTTP60
    Set oHttpRequest = New MSXML2.XMLHTTP60
    With oHttpRequest
        .Open "GET", "http://www.logisticsinformationservice.dla.mil/BINCS/details.aspx?CAGE=" & CAGECode, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
    End With
    Dim oHTMLDoc As MSHTML.HTMLDocument
    Set oHTMLDoc = New MSHTML.HTMLDocument
    oHTMLDoc.body.innerHTML = oHttpRequest.responseText
    Set getPage = oHTMLDoc
End Function
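
For the roughly 300 codes in the question, one way to save the values is to read the codes from the worksheet and write each result next to its code. The following is only a minimal sketch under stated assumptions: the codes sit in column A of the active worksheet starting at row 2, the results go into column B, and getPage from above is in the same module. The sub name getCAGEValuesFromSheet and this sheet layout are hypothetical, so adjust them to your workbook.

```vba
'Hypothetical variant of getCAGEValues: reads codes from column A (from row 2)
'and writes the company name into column B of the same row.
Sub getCAGEValuesFromSheet()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim code As String
    Dim oHTMLDoc As MSHTML.HTMLDocument
    Dim oSpan As MSHTML.IHTMLElement

    Set ws = ActiveSheet 'assumes the active sheet is a worksheet holding the codes
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow
        code = Trim$(ws.Cells(r, "A").Text) '.Text keeps leading zeros as displayed
        If Len(code) > 0 Then
            Set oHTMLDoc = getPage(code)
            Set oSpan = oHTMLDoc.getElementById("ctl00_cphMainPageBody_lblCompNameData")
            If oSpan Is Nothing Then
                ws.Cells(r, "B").Value = "(not found)" 'element missing from the page
            Else
                ws.Cells(r, "B").Value = oSpan.innerText
            End If
        End If
    Next r
End Sub
```

Because getPage sends each request synchronously, the loop handles one code at a time, so a run over about 300 codes may take a while.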
answered 2012-11-13T18:55:43.017