
I have grabbed the following HTML from a TV listings page:

<div class="channel_row">
   <span class="channel">                            
    <div class="logo"><img src ="/images/channel_logos/WGNAMER.png" /></div>                            
    <p><strong>2</strong><br />
    WGNAMER
    </p>
   </span>                          

   <span class="time" style="width:0.0px;padding:0;height:42px;">
     <div style="margin:10px">
       <a class="thickbox" style="" href="/tv/info/?program_id=49909&height=260&width=612" title="WGN News at Nine">WGN News at Nine</a>                                     
       <p class="schedule_flags"><strong class="new_flag">New</strong>, <strong class="cc_flag">CC</strong>, <strong class="stereo_flag">Stereo</strong></p>
     </div>
   </span>                          
   <span class="time" style="width:245.6px;padding:0;height:42px;">
     <div style="margin:10px">
       <a class="thickbox" style="" href="/tv/info/?program_id=49910&height=260&width=612" title="America&#39;s Funniest Home Videos">America&#39;s Funniest Home Videos</a>                                     
       <p class="schedule_flags"><strong class="cc_flag">CC</strong>, <strong class="stereo_flag">Stereo</strong></p>
     </div>
   </span>                          
</div>

The page simply repeats this channel_row block over and over, once per channel...

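I have set up some VB code with the help of HtmlAgilityPack, and I was hoping there is a quick and easy way to loop through all of these classes and get the logo image, the channel number, the station name, the HREF of the program details link, and the program title.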

So for the example above, the parsed output would look like:

/images/channel_logos/WGNAMER.png
2
WGNAMER
/tv/info/?program_id=49909&height=260&width=612
WGN News at Nine

/tv/info/?program_id=49910&height=260&width=612
America&#39;s Funniest Home Videos

My VB code is:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim htmlString As String = "<div class=""channel_row"">" & _
                   "<span class=""channel"">" & _
                           "<div class=""logo""><img src =""/images/channel_logos/WELF.png"" /></div>" & _
                       "<p><strong>13</strong><br />" & _
  "WELF" & _
                       "</p>" & _
                   "</span>" & _
                       "<span class=""time"" style=""width:245.6px;padding:0;height:42px;"">" & _
                           "<div style=""margin:10px"">" & _
                               "<a class=""thickbox"" style="""" href=""/tv/info/?program_id=35424&height=260&width=612"" title=""Praise the Lord"">Praise the Lord</a>" & _
                               "<p class=""schedule_flags""><strong class=""cc_flag"">CC</strong></p>" & _
                           "</div>" & _
                       "</span>" & _
                       "<span class=""time"" style=""width:122.8px;padding:0;height:42px;"">" & _
                           "<div style=""margin:10px"">" & _
                               "<a class=""thickbox"" style="""" href=""/tv/info/?program_id=35425&height=260&width=612"" title=""ACLJ This Week"">ACLJ This Week</a> " & _
                               "<p class=""schedule_flags""><strong class=""cc_flag"">CC</strong></p>" & _
                           "</div>" & _
                       "</span>" & _
                       "<span class=""time"" style=""width:122.8px;padding:0;height:42px;"">" & _
                           "<div style=""margin:10px"">" & _
                               "<a class=""thickbox"" style="""" href=""/tv/info/?program_id=35426&height=260&width=612"" title=""Full Flame"">Full Flame</a>  " & _
                               "<p class=""schedule_flags""><strong class=""cc_flag"">CC</strong></p>" & _
                           "</div>" & _
                       "</span>" & _
                       "<span class=""time"" style=""width:0.0px;padding:0;height:42px;"">" & _
                           "<div style=""margin:10px"">" & _
                               "<a class=""thickbox"" style="""" href=""/tv/info/?program_id=35427&height=260&width=612"" title=""Secrets: Kim Clement"">Secrets: Kim Clement</a>                                     " & _
                               "<p class=""schedule_flags""></p>" & _
                           "</div>" & _
                       "</span>" & _
               "</div>"


    Dim doc = New HtmlAgilityPack.HtmlDocument()
    Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
    htmlDocument.write(htmlString)
    htmlDocument.close()

    doc.LoadHtml(String.Format(htmlString))
    Dim res = doc.DocumentNode.SelectNodes("//div[@class='channel_row']")

    For Each item In res
        Dim firstDiv = item.SelectSingleNode(".//div[@class='channel']")
        Dim content1 = firstDiv.ChildNodes(0).InnerText.Trim()
        Dim content2 = firstDiv.ChildNodes(1).InnerText.Trim()
        Dim content4 = item.SelectSingleNode(".//div[@class='myclass2']")
    Next
End Sub

Currently the error is on the line Dim content1 = firstDiv.ChildNodes(0).InnerText.Trim(), and it says:

Object reference not set to an instance of an object.
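The exception comes from the XPath rather than from ChildNodes: in this markup the channel block is <span class="channel">, not a <div>, so item.SelectSingleNode(".//div[@class='channel']") finds nothing and returns Nothing, and the next line then dereferences Nothing. A minimal sketch of a guarded loop, assuming the HtmlAgilityPack NuGet package is referenced; NullReferenceFix and ChannelNumbers are hypothetical names used only for illustration:

```vbnet
' Sketch only: assumes the HtmlAgilityPack NuGet package is referenced.
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module NullReferenceFix

    ' Hypothetical helper: returns the channel number of each channel_row,
    ' skipping rows that do not match instead of throwing NullReferenceException.
    Public Function ChannelNumbers(ByVal htmlString As String) As List(Of String)
        Dim numbers As New List(Of String)()
        Dim doc As New HtmlDocument()
        doc.LoadHtml(htmlString)   ' String.Format is not needed here

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='channel_row']")
        If rows Is Nothing Then Return numbers

        For Each item As HtmlNode In rows
            ' The channel block is <span class="channel">; the original
            ' ".//div[@class='channel']" matched nothing and returned Nothing.
            Dim channelSpan As HtmlNode = item.SelectSingleNode(".//span[@class='channel']")
            If channelSpan Is Nothing Then Continue For

            ' Select by element rather than ChildNodes(0), which can be a
            ' whitespace text node when the markup is indented.
            Dim number As HtmlNode = channelSpan.SelectSingleNode(".//strong")
            If number IsNot Nothing Then numbers.Add(number.InnerText.Trim())
        Next
        Return numbers
    End Function

    Sub Main()
        Dim html As String = "<div class=""channel_row""><span class=""channel"">" & _
            "<div class=""logo""><img src=""/images/channel_logos/WELF.png"" /></div>" & _
            "<p><strong>13</strong><br />WELF</p></span></div>"
        For Each n As String In ChannelNumbers(html)
            Console.WriteLine(n)
        Next
    End Sub
End Module
```

Checking for Nothing after each SelectNodes/SelectSingleNode call turns a missing element into a skipped row instead of a crash.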

Any help would be great!

UPDATE

Using the latest suggested code:

Dim doc = New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(htmlString)

Dim all = new Dictionary(of String, Object)()
For Each channel In doc.DocumentNode.SelectNodes(".//div[@class='channel_row']") 
    Dim info = new Dictionary(of String, Object)()

    With channel

        info!Logo    = .SelectSingleNode(".//img").Attributes("src").Value
        info!Channel = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(0).InnerText
        info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(2).InnerText

        info!Shows = From tag In .SelectNodes(".//a[@class='thickbox']")
                     Select New With {.Show = tag.Attributes("title").Value, .Link = tag.Attributes("href").Value}

    End With

    all.Add(info!Station, info)
Next 

all.Dump()

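There are 3 errors:

1) On the line Select New With {.Show = Tag.Attributes("title").Value, .Link = Tag.Attributes("href").Value}

The error is: 'Select Case' must end with a matching 'End Select'.

2) On the line all.Add(info!Station, info)

The error is: Statements and labels are not valid between 'Select Case' and first 'Case'.

3) On the line all.Dump()

The error is: 'Dump' is not a member of 'System.Collections.Generic.Dictionary(Of String, Object)'.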
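All three errors look like tooling issues rather than parsing issues. Error 1 is most likely a compiler without implicit line continuation (Visual Basic before VB 2010): a line that starts with Select is read as the start of a Select Case block, so the query needs an explicit line-continuation character (a space followed by _) at the end of its From line. Error 2 is a follow-on from error 1. Error 3 happens because Dump() is an extension method that LINQPad adds; it is not part of .NET. A sketch of the suggested code with only those points changed, assuming the HtmlAgilityPack NuGet package is referenced; the module name, BuildIndex and SampleHtml are hypothetical:

```vbnet
' Sketch only: assumes the HtmlAgilityPack NuGet package is referenced.
Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module UpdateCompileFixes

    ' Hypothetical whitespace-free sample, like the question's htmlString (one show).
    Public Const SampleHtml As String = "<div class=""channel_row"">" & _
        "<span class=""channel""><div class=""logo""><img src=""/images/channel_logos/WELF.png"" /></div>" & _
        "<p><strong>13</strong><br />WELF</p></span>" & _
        "<span class=""time""><div style=""margin:10px"">" & _
        "<a class=""thickbox"" href=""/tv/info/?program_id=35424&height=260&width=612"" title=""Praise the Lord"">Praise the Lord</a>" & _
        "</div></span></div>"

    Public Function BuildIndex(ByVal htmlString As String) As Dictionary(Of String, Object)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(htmlString)

        Dim all As New Dictionary(Of String, Object)()
        For Each channel As HtmlNode In doc.DocumentNode.SelectNodes(".//div[@class='channel_row']")
            Dim info As New Dictionary(Of String, Object)()
            With channel
                info!Logo = .SelectSingleNode(".//img").Attributes("src").Value
                ' Kept from the suggestion: these indexes assume no whitespace between tags.
                info!Channel = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(0).InnerText
                info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(2).InnerText

                ' Fix for error 1: the trailing " _" keeps the query on one logical line,
                ' so older compilers do not read the next line as "Select Case".
                info!Shows = From tag In .SelectNodes(".//a[@class='thickbox']") _
                             Select New With {.Show = tag.Attributes("title").Value, .Link = tag.Attributes("href").Value}
            End With

            ' Error 2 goes away with fix 1. CStr makes the Object-to-String key
            ' conversion explicit (needed under Option Strict On).
            ' Add throws ArgumentException if two rows share a station name.
            all.Add(CStr(info!Station), info)
        Next
        Return all
    End Function

    ' Fix for error 3: Dump() comes from LINQPad, so print the results manually.
    Sub Main()
        For Each kv As KeyValuePair(Of String, Object) In BuildIndex(SampleHtml)
            Dim info As Dictionary(Of String, Object) = DirectCast(kv.Value, Dictionary(Of String, Object))
            Console.WriteLine("{0} (channel {1}) {2}", kv.Key, info!Channel, info!Logo)
            ' VB anonymous types override ToString, so each line lists Show and Link.
            For Each show As Object In DirectCast(info!Shows, IEnumerable)
                Console.WriteLine("  " & show.ToString())
            Next
        Next
    End Sub
End Module
```

The ChildNodes(1).ChildNodes(2) lookups only line up when there is no whitespace between the tags, as in the concatenated test string; on the indented page markup they point at different nodes.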


1 Answer


I'm not an HtmlAgilityPack expert, but how about this:

Dim htmlString As String = "<div class=""channel_row"">" &  _ ...

Dim doc = New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(htmlString)

Dim all = new Dictionary(of String, Object)()
For Each channel In doc.DocumentNode.SelectNodes(".//div[@class='channel_row']") 
    Dim info = new Dictionary(of String, Object)()

    With channel

        info!Logo    = .SelectSingleNode(".//img").Attributes("src").Value
        info!Channel = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(0).InnerText
        info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(2).InnerText

        info!Shows = From tag In .SelectNodes(".//a[@class='thickbox']")
                     Select New With {.Show = tag.Attributes("title").Value, .Link = tag.Attributes("href").Value}

    End With

    all.Add(info!Station, info)
Next 

all.Dump()
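The code above picks the channel number and station name by ChildNodes position. That lines up for the concatenated test string, but not for the indented markup from the actual page, where the whitespace between tags is also counted as text-node children. A variant that selects by element instead, sketched under the assumption that the HtmlAgilityPack NuGet package is referenced; the module, the ShowInfo and ChannelInfo classes, ParseChannelRows and SampleHtml are hypothetical names:

```vbnet
' Sketch only: assumes the HtmlAgilityPack NuGet package is referenced.
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module ChannelRowParser

    ' Hypothetical containers for one show and one channel_row.
    Public Class ShowInfo
        Public Title As String
        Public Link As String
    End Class

    Public Class ChannelInfo
        Public Logo As String
        Public Number As String
        Public Station As String
        Public Shows As New List(Of ShowInfo)()
    End Class

    ' One row from the page, trimmed to one show, with indentation kept.
    Public Const SampleHtml As String = _
        "<div class=""channel_row"">" & _
        "  <span class=""channel"">" & _
        "    <div class=""logo""><img src=""/images/channel_logos/WGNAMER.png"" /></div>" & _
        "    <p><strong>2</strong><br />  WGNAMER  </p>" & _
        "  </span>" & _
        "  <span class=""time""><div style=""margin:10px"">" & _
        "    <a class=""thickbox"" href=""/tv/info/?program_id=49910&height=260&width=612""" & _
        " title=""America&#39;s Funniest Home Videos"">America&#39;s Funniest Home Videos</a>" & _
        "  </div></span>" & _
        "</div>"

    Public Function ParseChannelRows(ByVal html As String) As List(Of ChannelInfo)
        Dim result As New List(Of ChannelInfo)()
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='channel_row']")
        If rows Is Nothing Then Return result

        For Each row As HtmlNode In rows
            Dim info As New ChannelInfo()

            Dim channel As HtmlNode = row.SelectSingleNode(".//span[@class='channel']")
            If channel IsNot Nothing Then
                Dim img As HtmlNode = channel.SelectSingleNode(".//img")
                If img IsNot Nothing Then info.Logo = img.GetAttributeValue("src", "")

                Dim number As HtmlNode = channel.SelectSingleNode(".//strong")
                If number IsNot Nothing Then info.Number = number.InnerText.Trim()

                ' Station name = all text in the span that is not inside <strong>,
                ' so whitespace text nodes cannot shift it the way ChildNodes(i) can.
                Dim stationText As IEnumerable(Of String) = _
                    From n In channel.Descendants() _
                    Where n.NodeType = HtmlNodeType.Text AndAlso n.ParentNode.Name <> "strong" _
                    Select n.InnerText
                info.Station = String.Concat(stationText.ToArray()).Trim()
            End If

            Dim links As HtmlNodeCollection = row.SelectNodes(".//a[@class='thickbox']")
            If links IsNot Nothing Then
                For Each a As HtmlNode In links
                    Dim show As New ShowInfo()
                    show.Link = a.GetAttributeValue("href", "")
                    ' DeEntitize turns &#39; into an apostrophe.
                    show.Title = HtmlEntity.DeEntitize(a.GetAttributeValue("title", ""))
                    info.Shows.Add(show)
                Next
            End If

            result.Add(info)
        Next
        Return result
    End Function

    Sub Main()
        For Each ch As ChannelInfo In ParseChannelRows(SampleHtml)
            Console.WriteLine(ch.Logo)
            Console.WriteLine(ch.Number)
            Console.WriteLine(ch.Station)
            For Each s As ShowInfo In ch.Shows
                Console.WriteLine(s.Link)
                Console.WriteLine(s.Title)
            Next
            Console.WriteLine()
        Next
    End Sub
End Module
```

Main prints the fields in the order listed in the question (logo, number, station, then link and title per show); the one difference is that the title comes out with a real apostrophe because DeEntitize decodes &#39;.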


answered 2012-10-25T09:20:07.603