
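I'm not a developer, and my knowledge of XML is very limited beyond what I've picked up researching online over the past 3-4 days, so I apologize in advance for how basic this question is. I'm trying to finish up a one-off task.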

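I have some knowledge of VBA in Excel, and I'm trying to use VBA to extract the SIC code from a given company's page on the SEC filings website. For example, here is Walmart's page: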

http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40&hidefilings=0

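In the blue bar at the top you can see "SIC: 5331". That 5331 is the value I'm trying to return into a VBA variable so that I can populate a spreadsheet. When I right-click in IE and choose View Source, the relevant section of the page source reads: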

<div id="contentDiv">
  <!-- START FILER DIV -->
  <div style="margin: 15px 0 10px 0; padding: 3px; overflow: hidden; background-color: #BCD6F8;">
    <div class="mailer">Mailing Address
      <span class="mailerAddress">702 SOUTHWEST 8TH STREET</span>
      <span class="mailerAddress"> BENTONVILLE AR 72716         </span>
    </div>
    <div class="mailer">Business Address
      <span class="mailerAddress">702 SOUTHWEST 8TH ST</span>
      <span class="mailerAddress">BENTONVILLE AR 72716         </span>
      <span class="mailerAddress">5012734000</span>
    </div>
    <div class="companyInfo">
      <span class="companyName">WAL MART STORES INC <acronym title="Central Index Key">CIK</acronym>#: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0000104169&amp;owner=exclude&amp;count=40">0000104169 (see all company filings)</a></span>
      <p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=5331&amp;owner=exclude&amp;count=40">5331</a> - RETAIL-VARIETY STORES<br />State location: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=AR&amp;owner=exclude&amp;count=40">AR</a> | State of Inc.: <strong>DE</strong> | Fiscal Year End: 0131<br />(Assistant Director Office: 2)<br />Get <a href="/cgi-bin/own-disp?action=getissuer&amp;CIK=0000104169"><b>insider transactions</b></a> for this <b> issuer</b>.
        <br />Get <a href="/cgi-bin/own-disp?action=getowner&amp;CIK=0000104169"><b>insider transactions</b></a> for this <b>reporting owner</b>.
      </p>
    </div>
  </div>
</div>
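For reference, once markup like this is loaded as well-formed XML, the SIC link can be addressed with an XPath such as `//p[@class='identInfo']/a`. The sketch below assumes a trimmed-down copy of the `identInfo` paragraph above stored in a string, uses late-bound MSXML 6 (`MSXML2.DOMDocument.6.0`), and the function name `GetSICFromFragment` is hypothetical. It only illustrates the XPath syntax; it does not fetch the live page.

```vba
' Hypothetical helper: load a trimmed copy of the SIC fragment as XML and
' select the SIC link with XPath.
Function GetSICFromFragment() As String
    Dim d As Object, node As Object, markup As String
    markup = "<div id=""contentDiv""><p class=""identInfo"">" & _
             "<acronym title=""Standard Industrial Code"">SIC</acronym>: " & _
             "<a href=""/cgi-bin/browse-edgar?action=getcompany&amp;SIC=5331&amp;owner=exclude&amp;count=40"">5331</a>" & _
             " - RETAIL-VARIETY STORES</p></div>"

    Set d = CreateObject("MSXML2.DOMDocument.6.0")   ' late-bound MSXML 6
    d.async = False
    If Not d.LoadXML(markup) Then Exit Function      ' not well-formed: return ""

    ' The first <a> inside <p class="identInfo"> is the SIC link
    Set node = d.SelectSingleNode("//p[@class='identInfo']/a")
    If Not node Is Nothing Then GetSICFromFragment = node.Text
End Function
```

Calling `Debug.Print GetSICFromFragment()` from the Immediate window should print 5331.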

While trying to figure out how to extract the SIC with VBA, I found the following post on your site:

Query and parse xml attribute value into XLS using VBA

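I tried to apply barrowc's answer by copying and pasting it into an Excel module and inserting the path to the Walmart filing, but when I step through it, I get the Debug.Print "*****" lines and nothing at all from n.Text.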

Sub test4()
    Dim d As MSXML2.DOMDocument60
    Dim i As IXMLDOMNodeList
    Dim n As IXMLDOMNode

    Set d = New MSXML2.DOMDocument60
    d.async = False
    d.Load ("http://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=886475&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany")

    Debug.Print "*****"
    Set i = d.SelectNodes("//div[@id='contentDiv']")
    For Each n In i
        Debug.Print n.Text
    Next n
    Debug.Print "*****"

    Set d = Nothing
End Sub

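I've tried various strings in d.SelectNodes(), but I don't know enough about the subject to understand where I'm going wrong, so comments on my syntax or pointers to resources would be very helpful.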
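One likely reason nothing prints: MSXML's DOMDocument only accepts well-formed XML, and a full HTML page often is not (unclosed tags, HTML entities such as `&nbsp;`, or a DOCTYPE, which MSXML 6 rejects by default). When that happens `Load` returns False, the document stays empty, and `SelectNodes` finds nothing, whatever XPath string is used. The sketch below, with a hypothetical helper name `TryParse`, checks the return value and reports `parseError.reason`; the same check could go right after the `d.Load (...)` line in the code above.

```vba
' Hypothetical helper: reports whether MSXML 6 accepts a piece of markup.
Function TryParse(Markup As String) As String
    Dim d As Object
    Set d = CreateObject("MSXML2.DOMDocument.6.0")   ' late-bound MSXML 6
    d.async = False
    If d.LoadXML(Markup) Then
        TryParse = "parsed OK"
    Else
        ' parseError describes the first problem the parser hit
        TryParse = "parse failed: " & d.parseError.reason
    End If
End Function

Sub DemoTryParse()
    Debug.Print TryParse("<p>SIC: 5331<br>RETAIL</p>")     ' unclosed <br>: fails
    Debug.Print TryParse("<p>SIC: 5331<br />RETAIL</p>")   ' well-formed: parses
End Sub
```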


2 Answers


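If you're only interested in the SIC, it isn't worth the time to parse the entire DOM structure. Instead, identify a unique set of characters, search for it, and extract the SIC from there.

The function below does exactly that. Pass it the full HTML source of the page and it returns the SIC: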

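Function ExtractSIC(SourceHtml As String) As String
    ' In the page source the SIC sits in a link such as
    ' ...getcompany&amp;SIC=5331&amp;owner=..., so search for that prefix
    Const PrefixChars As String = "&amp;SIC="
    Const SuffixChars As String = "&"
    Dim StartPos As Long, EndPos As Long
    StartPos = InStr(SourceHtml, PrefixChars)
    If StartPos = 0 Then Exit Function            ' no SIC found: return ""

    StartPos = StartPos + Len(PrefixChars)        ' first character of the code
    EndPos = InStr(StartPos, SourceHtml, SuffixChars)
    If EndPos = 0 Then EndPos = Len(SourceHtml) + 1   ' no "&" after it: read to the end
    ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos)
End Function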
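As a quick check, feeding `ExtractSIC` a piece of the Walmart source from the question should return "5331", and a string without the prefix should return an empty string. The routine name `TestExtractSIC` is hypothetical, and it assumes the `ExtractSIC` function above is in the same module.

```vba
' Hypothetical check of ExtractSIC (defined above) against the Walmart markup.
Sub TestExtractSIC()
    Dim html As String
    ' Part of the <p class="identInfo"> line from the question's page source
    html = "<p class=""identInfo""><acronym title=""Standard Industrial Code"">SIC</acronym>: " & _
           "<a href=""/cgi-bin/browse-edgar?action=getcompany&amp;SIC=5331&amp;owner=exclude&amp;count=40"">5331</a>"
    Debug.Print ExtractSIC(html)                    ' prints 5331
    Debug.Print "[" & ExtractSIC("<p>no code</p>") & "]"   ' prints [] (no prefix found)
End Sub
```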
answered 2013-05-08T15:26:17.257

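Thanks again, mwolfe. I've posted my code below, but the code you provided is much more elegant. I knew the SIC is only 4 digits, so I got lazy and built that assumption into my code, which could cause errors in the future. You can see how I did it in the commented-out section.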

Sub GetSICs()
    Application.ScreenUpdating = False

    Dim AWBN As String
    Dim ASN As String
    Dim CIK As String
    Dim NUM_FILES_TO_GET As Long
    Dim COUNTER As Long
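    Dim SICTagPos As Integer
    Dim SIC As String
    Dim IEbrowser As Object   ' late-bound InternetExplorer.Application
    Dim frm As Object         ' search form on the EDGAR page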

    Set IEbrowser = CreateObject("InternetExplorer.application")
    IEbrowser.Visible = False
    AWBN = ActiveWorkbook.Name
    ASN = ActiveSheet.Name
    Workbooks(AWBN).Sheets(ASN).Range("A1").Select
    ActiveCell.Offset(0, 11) = "SIC"
    NUM_FILES_TO_GET = Application.WorksheetFunction.CountA(Range("A:A"))
    For COUNTER = 1 To 3 'NUM_FILES_TO_GET
        Application.StatusBar = "Counter = " & COUNTER
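        'SICTagPos = 0
        ' The CIK is read from column C of the current row
        CIK = ActiveCell.Offset(COUNTER, 2)
        ' Open EDGAR's company search page and wait until it has finished loading
        IEbrowser.Navigate URL:="http://www.sec.gov/edgar/searchedgar/companysearch.html"
        Do
            DoEvents
        Loop Until IEbrowser.readyState = 4
        ' Fill in the CIK field of the first form on the page and submit it
        Set frm = IEbrowser.Document.forms(0)
        frm("CIK").Value = CIK
        frm.submit
        ' Wait for the results page, then pull the SIC out of its HTML
        While IEbrowser.Busy Or IEbrowser.readyState <> 4: DoEvents: Wend
        SIC = ExtractSIC(IEbrowser.Document.body.innerhtml)
        'SICTagPos = InStr(1, IEbrowser.Document.body.innerhtml, "SIC=")
        'SIC = Right(Left(IEbrowser.Document.body.innerhtml, SICTagPos + 7), 4)
        ' Store the SIC as text in column L so leading zeros are kept
        ActiveCell.Offset(COUNTER, 11).NumberFormat = "@"
        ActiveCell.Offset(COUNTER, 11) = SIC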

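    Next

    ' Close the hidden browser so it does not linger as a background process
    IEbrowser.Quit
    Set IEbrowser = Nothing

    Application.StatusBar = False
    Application.ScreenUpdating = True

End Sub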


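Function ExtractSIC(SourceHtml As String) As String
    ' In the page source the SIC sits in a link such as
    ' ...getcompany&amp;SIC=5331&amp;owner=..., so search for that prefix
    Const PrefixChars As String = "&amp;SIC="
    Const SuffixChars As String = "&"
    Dim StartPos As Long, EndPos As Long
    StartPos = InStr(SourceHtml, PrefixChars)
    If StartPos = 0 Then Exit Function            ' no SIC found: return ""

    StartPos = StartPos + Len(PrefixChars)        ' first character of the code
    EndPos = InStr(StartPos, SourceHtml, SuffixChars)
    If EndPos = 0 Then EndPos = Len(SourceHtml) + 1   ' no "&" after it: read to the end
    ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos)
End Function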
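The commented-out lines assume the SIC is always four digits. A minimal sketch that drops that assumption swaps in a regular expression instead of fixed offsets. It relies on the `VBScript.RegExp` object (assumed available, i.e. Excel on Windows; late-bound, so no library reference is needed), and `ExtractSICRegex` is a hypothetical name. It returns the digits after the first `SIC=` that is followed by at least one digit, or an empty string if there is none.

```vba
' Hypothetical alternative to ExtractSIC: capture the digits after "SIC="
' with a regular expression, so the length of the code is not hard-coded.
Function ExtractSICRegex(SourceHtml As String) As String
    Dim re As Object, found As Object
    Set re = CreateObject("VBScript.RegExp")   ' Windows-only, late-bound
    re.Pattern = "SIC=(\d+)"                   ' group 1 = one or more digits
    re.Global = False                          ' the first match is enough
    Set found = re.Execute(SourceHtml)
    If found.Count > 0 Then ExtractSICRegex = found(0).SubMatches(0)
End Function
```

In the loop above it would be used as `SIC = ExtractSICRegex(IEbrowser.Document.body.innerhtml)`.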
answered 2013-05-09T13:19:39.113