
I'm using a StreamReader to get the HTML of some pages, but there are some lines I want to ignore, for example a line that starts with <span>.

Any suggestions? Here is my function:

Public Function GetPageHTMLReaderNoPrx(ByVal address As Uri) As StreamReader
  Dim request As HttpWebRequest
  Dim response As HttpWebResponse = Nothing
  Dim reader As StreamReader = Nothing ' stays Nothing if the request fails

  Try
    request = DirectCast(WebRequest.Create(address), HttpWebRequest)
    response = DirectCast(request.GetResponse(), HttpWebResponse)

    Select Case response.StatusCode
      Case HttpStatusCode.OK
        ' The caller reads from (and must dispose) this reader.
        reader = New StreamReader(response.GetResponseStream(), Encoding.Default)

      Case Else
        MsgBox(response.StatusCode)
        response.Close()
    End Select
  Catch
    If Not response Is Nothing Then response.Close()
  End Try
  Return reader
End Function

This is what the HTML looks like:

<tr>Text
<span>show all</span>
</tr>

1 Answer


If you insist on working with strings, you can do something like this:

Do
  Dim line As String = reader.ReadLine()
  If line Is Nothing Then Exit Do 'end of stream
  If line.StartsWith("<span>") Then Continue Do 'ignore this line, move on to the next one
  'otherwise do some processing here
  '...
Loop

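But this approach is fragile: any small change in the input HTML (extra indentation, a different case, an attribute on the tag, the span sharing a line with other markup) will break your processing.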
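If you do stay with line-based filtering, one way to soften (not eliminate) that fragility is to ignore leading whitespace and compare case-insensitively. Below is a minimal sketch of the loop above wrapped in a self-contained function; the names `LineFilterDemo` and `ReadLinesSkippingSpans` are hypothetical, and the sample input is just the HTML from the question. It still misses a `<span>` that shares a line with other markup or carries attributes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module LineFilterDemo
    ' Hypothetical helper: returns every line from the reader except those
    ' whose first non-whitespace characters are "<span>" (any letter case).
    Function ReadLinesSkippingSpans(reader As TextReader) As List(Of String)
        Dim kept As New List(Of String)
        Do
            Dim line As String = reader.ReadLine()
            If line Is Nothing Then Exit Do ' end of stream
            If line.TrimStart().StartsWith("<span>", StringComparison.OrdinalIgnoreCase) Then
                Continue Do ' skip this line, keep reading
            End If
            kept.Add(line)
        Loop
        Return kept
    End Function

    Sub Main()
        ' The HTML from the question, one element per line.
        Dim html As String = String.Join(Environment.NewLine,
            New String() {"<tr>Text", "<span>show all</span>", "</tr>"})
        Using reader As New StringReader(html)
            For Each line As String In ReadLinesSkippingSpans(reader)
                Console.WriteLine(line)
            Next
        End Using
    End Sub
End Module
```

Because the function takes a `TextReader`, it accepts either the `StreamReader` returned by `GetPageHTMLReaderNoPrx` or a `StringReader` for testing.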

A more elegant solution is to use XElement:

Dim xml = <tr>Text
            <span>show all</span>
          </tr>
xml.<span>.Remove()
MsgBox(xml.Value.Trim)
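The snippet above uses an XML literal; with a page fetched over HTTP, the markup arrives as text from the reader instead. A minimal sketch, assuming the fragment is well-formed XML (real-world HTML often is not, e.g. unclosed `<br>` or `&nbsp;` entities, in which case a dedicated HTML parser is the safer route), is to parse the text with `XElement.Parse` and remove the spans. The names `SpanRemovalDemo` and `TextWithoutSpans` are hypothetical, and `Descendants("span")` is used instead of the child axis `xml.<span>` so that nested spans are removed too.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.Linq

Module SpanRemovalDemo
    ' Hypothetical helper: parses a well-formed fragment read from the reader,
    ' removes every <span> element at any depth, and returns the remaining text.
    ' Throws System.Xml.XmlException if the markup is not well-formed XML.
    Function TextWithoutSpans(reader As TextReader) As String
        Dim element As XElement = XElement.Parse(reader.ReadToEnd())
        element.Descendants("span").Remove() ' Extensions.Remove from System.Xml.Linq
        Return element.Value.Trim()
    End Function

    Sub Main()
        ' The HTML from the question.
        Dim html As String = "<tr>Text" & Environment.NewLine &
                             "<span>show all</span>" & Environment.NewLine & "</tr>"
        Using reader As New StringReader(html)
            Console.WriteLine(TextWithoutSpans(reader)) ' prints "Text"
        End Using
    End Sub
End Module
```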
answered 2012-11-07T01:39:13.617