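I've been trying to write a script that scrapes a certain field/tag from a rendered website. The page is rendered using search parameters that I keep in a column in Excel: about 20 items for now, and the list will grow. After researching how to do web scraping with VBScript, the problem I'm running into is how to run it 20 times without it breaking. Here is my code.
Excel column: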
1492565
1528417
1529041
1530688
1492038
1492319
1492972
1508824
1513351
1514724
1514750
1518526
1520627
1520706
1520979
1523367
1523563
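Script: the main sub. It gets the user/pass from Excel input fields and loops over the rows of a specific column. For now it only spits back a MsgBox until I can get the loop working; after that I'll have it write the output to another column.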
Sub WebScraper()
    'itg on mainWS start row 6, column 5
    'itg status column column 19
    'declare variables
    Dim url As String
    Dim ITGNUMBER As Long
    Dim user As String
    Dim pwd As String
    'set variables
    url = "https://website/itg/web/knta/crt/RequestDetail.jsp?REQUEST_ID="
    Set oMainWS = ActiveWorkbook.Worksheets("MainWS")
    Set oITGStatusWS = ActiveWorkbook.Worksheets("ITGStatus")
    user = ""
    pwd = ""
    user = oITGStatusWS.ITGusername.Value
    pwd = oITGStatusWS.ITGpassword.Value
    If user = "" Or pwd = "" Then
        MsgBox ("You must enter username/password before continuing")
        Exit Sub
    End If
    'log in
    Set objIE = FirstIEConnect(user, pwd)
    'start row is 6
    RowCounter = 58
    ColumnCounter = 5
    ITGStatusColumn = 16
    Do Until IsEmpty(oMainWS.Cells(RowCounter, 5).Value)
        'get ITG number
        currentITGNumber = oMainWS.Cells(RowCounter, 5).Value
        MsgBox (currentITGNumber)
        'get remote status
        'CStr instead of Str: Str puts a leading space in front of positive numbers
        currentITGStatus = getITGStatusFunction(objIE, CStr(currentITGNumber))
        MsgBox (currentITGStatus)
        'paste into column 19
        'oMainWS.Cells(RowCounter, 19).Value = currentITGStatus
        'increment counter
        RowCounter = RowCounter + 1
        currentITGStatus = ""
        currentITGNumber = ""
    Loop
    'no parentheses around the argument, so the object itself is passed
    quitIE objIE
End Sub
Cleanup sub that quits the IE object. The site has a JavaScript function that logs the user out.
Sub quitIE(obj As Object)
    obj.Navigate ("javascript: closeChildWindowsAndLogout();")
    obj.Quit
End Sub
I got this sub from Google; it waits for the IE object to be ready. It actually fails a lot inside the loop: it just hangs on Do While IE.Busy: Loop.
Sub Wait(obj As Object)
    Do While obj.Busy: Loop
    Do While obj.readyState <> 4: Loop
    Application.Wait (Now + TimeValue("0:00:01"))
End Sub
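Since the plain busy-wait above can spin forever, here is a minimal sketch of a wait routine that gives up after a timeout and yields to the message loop while it waits. WaitForIE and timeoutSeconds are hypothetical names that are not part of my script, and the 30 second default is an arbitrary assumption.

```vba
' Hypothetical helper (not in the original script).
' Returns True once the page has finished loading, False if the timeout expires.
Function WaitForIE(ie As Object, Optional timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    Dim elapsed As Double
    startTime = Timer
    Do While ie.Busy Or ie.readyState <> 4      ' 4 = READYSTATE_COMPLETE
        DoEvents                                ' let Excel and IE process pending messages
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer resets at midnight
        If elapsed > timeoutSeconds Then Exit Function  ' leaves the result as False
    Loop
    WaitForIE = True
End Function
```

A caller could then write something like If Not WaitForIE(objIE, 20) Then Exit Sub, so a page that never finishes loading stops the run instead of hanging Excel.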
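The website requires a login; the user/password come from the first sub. This function creates the IE object, navigates to the login page, and inserts the user/password into Document.logon.UserName and Document.logon.Password. Finally it submits the form.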
Function FirstIEConnect(user As String, pwd As String)
    loginURL = "https://website/Logon.jsp"
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = False
    IE.Navigate loginURL
    'no parentheses around the argument, so the object itself is passed
    Wait IE
    With IE.Document.logon
        .UserName.Value = user
        .Password.Value = pwd
        .submit
    End With
    Set FirstIEConnect = IE
End Function
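This is the actual web scraping function. It requires the IE object to already be logged in as above. It puts num into the URL's GET request to render a specific page. Finally, responseText is grabbed by element ID.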
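Function getITGStatusFunction(obj, num)
    'note: this hides every error, including a missing element
    On Error Resume Next
    'set url and num
    url = "https://website/RequestDetail.jsp?REQUEST_ID=" & num
    obj.Navigate url
    'no parentheses around the argument, so the object itself is passed
    Wait obj
    responseText = obj.Document.getElementById("DRIVEN_STATUS_ID").innerHTML
    'the return value must be assigned to the function's own name
    getITGStatusFunction = responseText
End Function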
Again, the problem is that I keep getting object errors when trying to pass the IE object between the different subs and functions.
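One thing that can produce this kind of error in VBA is wrapping an object argument in parentheses when calling a Sub as a statement: the parentheses make VBA evaluate the expression first, which for an object means its default property, so the Sub receives a value rather than the object. I can't say for sure that this is the only cause here, so the following is just a small sketch of the difference. ShowName and Demo are hypothetical names used only for illustration.

```vba
' Hypothetical demo subs (not in the original script).
Sub ShowName(obj As Object)
    Debug.Print obj.Name
End Sub

Sub Demo()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    ShowName ie            ' passes the object itself
    Call ShowName(ie)      ' also passes the object itself
    ' ShowName (ie)        ' evaluates ie's default property first, so the Sub
    '                      ' no longer receives an object and a run-time error follows
    ie.Quit
End Sub
```

Parentheses are fine when the call is part of an expression that uses the return value, for example Set x = SomeFunction(ie).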
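Expected: I want the script to loop through a column in Excel that contains unique numbers, take those numbers one at a time, and append each one to the search URL. Once the page has loaded, it should grab the element with ID DRIVEN_STATUS_ID, then take that value and output it to another column.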
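To make the expected behaviour concrete, here is a rough sketch of the loop writing each status back to the sheet instead of showing a MsgBox. WriteStatuses is a hypothetical name; start row 6, input column 5 and output column 19 are taken from the comments in my main sub, and the sketch assumes getITGStatusFunction returns the status text for a given number.

```vba
' Hypothetical sub (not in the original script).
' ws: the worksheet holding the numbers; ie: the logged-in IE object.
Sub WriteStatuses(ws As Object, ie As Object)
    Dim r As Long
    r = 6                                        ' first data row (assumed from the comments)
    Do Until IsEmpty(ws.Cells(r, 5).Value)       ' column 5 holds the request numbers
        ' look up the status for this number and store it in column 19
        ws.Cells(r, 19).Value = getITGStatusFunction(ie, CStr(ws.Cells(r, 5).Value))
        r = r + 1
    Loop
End Sub
```

It would be called from the main sub as WriteStatuses oMainWS, objIE, without parentheses around the arguments.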