
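I am new to Excel VBA and macros.

I need to scrape one specific part of a page rather than the whole page. The code below works for the full page, but I don't need every part of it.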

Sub GrabOutStandingTable()

With ActiveSheet.QueryTables.Add(Connection:= _
    "URL;http://dsebd.org/displayCompany.php?name=ABBANK", Destination:=Range( _
    "$A$1"))
    .CommandType = 0
    .Name = "displayCompany.php?name=ABBANK"
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .BackgroundQuery = True
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .WebSelectionType = xlSpecifiedTables
    .WebFormatting = xlWebFormattingNone
    .WebTables = """company"""
    .WebPreFormattedTextToColumns = True
    .WebConsecutiveDelimitersAsOne = True
    .WebSingleBlockTextImport = False
    .WebDisableDateRecognition = False
    .WebDisableRedirections = False
    .Refresh BackgroundQuery:=False
End With
Sheets.Add After:=ActiveSheet
End Sub

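The part I mean is the table section titled "Other Information of the Company", in the lower part of the page. The macro should extract only that section.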


2 Answers


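XHR request:

If you gather the elements by their (non-unique) id, company, you can make a faster, browserless XHR request and simply target the table of interest, which is at index 23.

I use the querySelectorAll method to get the matching nodes and then take the table at index 23.

Note the additional sponsor information shown in the code output.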


Web page view:

[image: page]


Sample code output:

[image: worksheet view]


Code:

Option Explicit
Public Sub GetTable()
    Dim sResponse As String, hTable As Object, HTML As New HTMLDocument
    Application.ScreenUpdating = False
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://dsebd.org/displayCompany.php?name=ABBANK", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

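    ' Drop anything before the doctype; leave the response alone if none is found
    If InStr(1, sResponse, "<!DOCTYPE ") > 0 Then
        sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    End If
    With HTML
        .body.innerHTML = sResponse
        ' The id "company" is not unique on this page; the wanted table is at index 23
        Set hTable = .querySelectorAll("#company")(23)
    End With
    WriteTable hTable
    Application.ScreenUpdating = True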
End Sub

Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)

    If ws Is Nothing Then Set ws = ActiveSheet

    Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, R As Long, C As Long, tBody As Object
    R = startRow
    With ws
        Dim headers As Object, header As Object, columnCounter As Long
        Set headers = hTable.getElementsByTagName("th")
        For Each header In headers
            columnCounter = columnCounter + 1
            .Cells(startRow, columnCounter) = header.innerText
        Next header
        startRow = startRow + 1
        Set tBody = hTable.getElementsByTagName("tbody")
        For Each tSection In tBody               'HTMLTableSection
            Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
            For Each tr In tRow
                R = R + 1
                Set tCell = tr.getElementsByTagName("td")
                C = 1
                For Each td In tCell             'DispHTMLElementCollection
                    .Cells(R, C).Value = td.innerText 'HTMLTableCell
                    C = C + 1
                Next td
            Next tr
        Next tSection
    End With
End Sub

References:

VBE > Tools > References > Microsoft HTML Object Library
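
A fixed index such as 23 stops pointing at the right table if the page gains or loses a table before it. As a rough alternative, here is a minimal sketch that picks the table by its heading text instead. `FindTableByHeading` is a hypothetical helper, not part of the code above. The heading string is an assumption taken from the question's description of the section, and the sketch also assumes the heading text sits inside the table itself.

```vba
Option Explicit

' Hypothetical helper: returns the smallest <table> whose text contains
' headingText, or Nothing if no table matches.
Public Function FindTableByHeading(ByVal doc As Object, ByVal headingText As String) As Object
    Dim tbl As Object, best As Object
    Dim bestLen As Long, txt As String

    For Each tbl In doc.getElementsByTagName("table")
        txt = tbl.innerText
        If InStr(1, txt, headingText, vbTextCompare) > 0 Then
            ' Wrapper tables also contain the heading, so prefer the
            ' match with the least text (a heuristic for "innermost").
            If best Is Nothing Or Len(txt) < bestLen Then
                Set best = tbl
                bestLen = Len(txt)
            End If
        End If
    Next tbl

    Set FindTableByHeading = best
End Function
```

In GetTable, the querySelectorAll line could then become `Set hTable = FindTableByHeading(HTML, "Other Information of the Company")`, with a check for `hTable Is Nothing` before calling WriteTable. The shortest-text rule is only a heuristic, so it is worth confirming the result against the page.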

answered 2018-07-05T07:10:17.913

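Because of the way the site is structured, the legacy Data > From Web feature cannot handle this. The data you need is deeply nested inside other tables and is itself made up of several tables.

I suggest using Power Query instead (no VBA required). Here is how to do it in XL2013. On the Excel ribbon, find the POWER QUERY tab.

  1. Choose POWER QUERY > From Web.
  2. The From Web dialog appears. Enter your URL.
  3. Click OK.
  4. The data you need is in Table 30. Find it, click it, and then click Load.

If you have XL2016 (Office 365), Power Query is already built in. If you have XL2010 or XL2013, you can download it from: https://www.microsoft.com/en-us/download/details.aspx?id=39379&CorrelationId=1441491e-917e-43de-8d6a-21f98287c3c2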
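
If the import still has to be triggered from a macro, the same query can in principle be created from VBA. This is only a sketch under several assumptions: it needs Excel 2016 or later (the `Workbook.Queries` collection is not available with the XL2010/XL2013 add-in), the query name `CompanyOtherInfo` is made up for illustration, and it assumes that "Table 30" in the Navigator corresponds to `Source{30}` in the M formula, which should be checked against the page.

```vba
Option Explicit

' Sketch only: registers a Power Query and loads its result into the active sheet.
Public Sub LoadCompanyTableWithPowerQuery()
    Const QUERY_NAME As String = "CompanyOtherInfo" ' hypothetical query name
    Dim mFormula As String

    ' Assumption: "Table 30" in the Navigator is row 30 of the Web.Page result.
    mFormula = "let" & vbCrLf & _
        "    Source = Web.Page(Web.Contents(""http://dsebd.org/displayCompany.php?name=ABBANK""))," & vbCrLf & _
        "    Data = Source{30}[Data]" & vbCrLf & _
        "in" & vbCrLf & _
        "    Data"

    ' Fails if a query with this name already exists in the workbook.
    ActiveWorkbook.Queries.Add Name:=QUERY_NAME, Formula:=mFormula

    ' Load the query result into a table starting at A1.
    With ActiveSheet.ListObjects.Add(SourceType:=xlSrcExternal, _
            Source:="OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=" & _
                    QUERY_NAME & ";Extended Properties=""""", _
            Destination:=ActiveSheet.Range("A1")).QueryTable
        .CommandType = xlCmdSql
        .CommandText = Array("SELECT * FROM [" & QUERY_NAME & "]")
        .Refresh BackgroundQuery:=False
    End With
End Sub
```

Run it on an empty sheet. Because the query is added by name, a second run needs the existing query deleted first or a different name.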

answered 2016-10-27T18:27:04.780