
I'm new to VB.NET 2008, and I want to extract some elements from the HTML below:

<TABLE>
  <TR>
    <TD></TD>
    <TH scope="col">PAT. NO.</TH><TD></TD><TH scope="col">Title</TH>
  </TR>
  <TR>
    <TD valign=top>
      10
    </TD>
    <TD valign=top>
      <A  HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=10&p=1&f=G&l=50&d=PTXT&S1=*a&OS=*a&RS=*a>8,519,110</A>
    </TD>
    <TD valign=baseline>
      <IMG border=0 src="/netaicon/PTO/ftext.gif" alt="Full-Text">
    </TD>
    <TD valign=top>
      <A  HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=10&p=1&f=G&l=50&d=PTXT&S1=*a&OS=*a&RS=*a>mRNA cap analogs</A>
    </TD>

I want my text box to show the following:

/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=10&p=1&f=G&l=50&d=PTXT&S1=*a&OS=*a&RS=*a

8,519,110

mRNA cap analogs

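The HTML markup above repeats for more table rows, and I want to get all of those rows. I have read that we can use "GetAttribute" to get HTML elements, but I only want to extract the specific parts mentioned above.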


2 Answers


Without knowing why you want to do this, it is hard to give you a good solution.

I'll offer two options:

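1) VB.NET - It is not clear how you are setting the attributes in the HTML. You should be able to do something like the following (note: this is VB.NET from memory, hand-coded here rather than written in Visual Studio):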

HTML view:

<asp:HyperLink id="FirstLink" runat="server" />
...

Code-behind:

FirstLink.NavigateUrl = yourUrlVariableHere
...
YourInputBox.Text = String.Concat(yourUrlVariableHere, yourOtherVariablesHere ...)

2) jQuery -

Essentially, you want to read the attributes and then display them:

$(function(){
    var anchor1 = $("#firstAnchor").attr("href");
    var imageSrc = $("#my-image").attr("src");

    $("#my-display").html(anchor1+ "<br/>" + imageSrc );
});
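The snippet above reads a single anchor by id, while the question asks for the href, patent number, and title of every row. As a rough sketch only, the same idea can be applied to the whole markup at once. The helpers `extractLinks` and `groupByHref` below are hypothetical names, not part of jQuery, and the regex assumes simple, non-nested anchors like the ones in the question (including its unquoted HREF values).

```javascript
// Sketch: collect the href and link text of every <a> in an HTML string.
// extractLinks and groupByHref are hypothetical helpers, not library functions.
function extractLinks(html) {
  // The href value may be double-quoted, single-quoted, or unquoted.
  const anchorPattern =
    /<a\s[^>]*?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>([\s\S]*?)<\/a>/gi;
  const links = [];
  for (const m of html.matchAll(anchorPattern)) {
    // Exactly one of the three href groups takes part in each match.
    const href = [m[1], m[2], m[3]].find((v) => v !== undefined);
    // Drop any nested tags and surrounding whitespace from the link text.
    const text = m[4].replace(/<[^>]*>/g, "").trim();
    links.push({ href, text });
  }
  return links;
}

// Group link texts by href. In the question's table, each row holds two
// anchors with the same href: the patent number, then the title.
function groupByHref(links) {
  const byHref = new Map();
  for (const { href, text } of links) {
    if (!byHref.has(href)) byHref.set(href, []);
    byHref.get(href).push(text);
  }
  return [...byHref].map(([href, texts]) => ({ href, texts }));
}
```

Calling `groupByHref(extractLinks(html))` on the question's markup should give one entry per table row, holding the href and the two link texts. In a browser with jQuery loaded, iterating with `$("table a").each(...)` and reading `.attr("href")` and `.text()` would be the more robust route, since it relies on the browser's HTML parser rather than a regex.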


Answered 2013-09-16T17:22:52.047

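I have a routine that I use to extract data from HTML tables (apologies for not crediting the original author; I found this code and don't know where it came from). It parses the HTML held in a string and loads each table's cells into a DataSet, one DataTable per table.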

' Requires Imports System.Data and Imports System.Text.RegularExpressions at the top of the file.
Public Shared Function ConvertHtmlTablesToDataSet(html As String) As DataSet
    Dim dt As DataTable
    Dim ds As New DataSet()
    dt = New DataTable()
    ' \b keeps each pattern from matching longer tag names (for example, <thead> for <th>).
    Dim tableExpression As String = "<table\b[^>]*>(.*?)</table>"
    Dim headerExpression As String = "<th\b[^>]*>(.*?)</th>"
    Dim rowExpression As String = "<tr\b[^>]*>(.*?)</tr>"
    Dim columnExpression As String = "<td\b[^>]*>(.*?)</td>"
    Dim headersExist As Boolean = False
    Dim iCurrentColumn As Integer = 0
    Dim iCurrentRow As Integer = 0

    Dim tables As MatchCollection = Regex.Matches(html, tableExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase)


    For Each table As Match In tables
        iCurrentRow = 0
        headersExist = False
        dt = New DataTable()

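        ' Case-insensitive check so that <TH ...> is detected as well; [\s>] avoids matching <thead>.
        If Regex.IsMatch(table.Value, "<th[\s>]", RegexOptions.IgnoreCase) Then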
            headersExist = True

            Dim headers As MatchCollection = Regex.Matches(table.Value, headerExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase)

            For Each header As Match In headers
                dt.Columns.Add(header.Groups(1).ToString())
            Next
        Else

            ' No header cells: name the columns generically, sized from the <td> count of the first row.
            Dim firstRow As String = Regex.Match(table.Value, rowExpression, RegexOptions.Singleline Or RegexOptions.IgnoreCase).Value
            Dim columnCount As Integer = Regex.Matches(firstRow, columnExpression, RegexOptions.Singleline Or RegexOptions.IgnoreCase).Count

            For iColumns As Integer = 1 To columnCount
                dt.Columns.Add("Column " + System.Convert.ToString(iColumns))
            Next
        End If

        Dim rows As MatchCollection = Regex.Matches(table.Value, rowExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase)
        Try

            For Each row As Match In rows
                If Not ((iCurrentRow = 0) And headersExist) Then
                    Dim dr As DataRow = dt.NewRow()
                    iCurrentColumn = 0

                    Dim columns As MatchCollection = Regex.Matches(row.Value, columnExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase)

                    For Each column As Match In columns
                        dr(iCurrentColumn) = column.Groups(1).ToString()
                        iCurrentColumn += 1
                        If iCurrentColumn = dt.Columns.Count Then Exit For
                    Next

                    dt.Rows.Add(dr)
                End If
                iCurrentRow += 1
            Next

            ds.Tables.Add(dt)
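        Catch ex As Exception
            ' Surface parsing failures to the caller; the original Stop statement only suspends execution for debugging.
            Throw New InvalidOperationException("Failed to parse an HTML table.", ex)
        End Try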
    Next

    Return ds
End Function
Answered 2013-09-17T08:01:44.427