5

I have HTML stored in a string variable in my C# .NET 2.0 code. Here is an example:

<div class="track">
    <img alt="" src="http://hits.guardian.co.uk/b/ss/guardiangu-feeds/1/H.20.3/30561?ns=guardian&pageName=Hundreds+feared+dead+in+Haiti+quake%3AArticle%3A1336252&ch=World+news&c3=GU.co.uk&c4=Haiti+%28News%29%2CDominican+Republic+%28News%29%2CCuba+%28News%29%2CBahamas+%28News%29%2CNatural+disasters+and+extreme+weather+%28News%29%2CEnvironment%2CWorld+news&c6=Rory+Carroll%2CHaroon+Siddique&c7=10-Jan-13&c8=1336252&c9=Article&c10=News&c11=World+news&c13=&c25=&c30=content&h2=GU%2FWorld+news%2FHaiti" width="1" height="1" />
</div>
<p class="standfirst">
    • Tens of thousands lose homes in 7.0 magnitude quake<br />
    • UN headquarters, schools and hospitals collapse
</p>
<p>
    René Préval, the president of Haiti, has described the devastation after last night's earthquake as "unimaginable" as governments and aid agencies around the world rushed into action.
</p>
<p>
    Préval described how he had been forced to step over dead bodies and heard the cries of those trapped under the rubble of the national parliament. "Parliament has collapsed. The tax office has collapsed. Schools have collapsed. Hospitals have collapsed," <a href="http://www.miamiherald.com/582/story/1422279.html" title="he told the Miami Herald">he told the Miami Herald</a>. "There are a lot of schools that have a lot of dead people in them." Préval said he thought thousands of people had died in the quake.
</p>

I would like to output just the first two paragraphs as a raw substring.

Can anyone help?

4 Answers

4

Take a look at the Html Agility Pack.

It exposes a very powerful API for parsing HTML, which you can use to extract the data you want.
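For illustration, here is a minimal sketch of how that could look, assuming the HtmlAgilityPack assembly is referenced in the project. The class name `AgilityPackExample` and the method `GetFirstParagraphs` are made up for this example; `HtmlDocument.LoadHtml`, `DocumentNode.SelectNodes` and `OuterHtml` come from the library.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

public static class AgilityPackExample
{
    // Hypothetical helper: returns the markup of the first `count` <p> elements.
    public static string GetFirstParagraphs(string html, int count)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath "//p" selects every <p> element in document order.
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            // SelectNodes returns null rather than an empty collection when nothing matches.
            return string.Empty;
        }

        StringBuilder result = new StringBuilder();
        for (int i = 0; i < paragraphs.Count && i < count; i++)
        {
            result.AppendLine(paragraphs[i].OuterHtml);
        }
        return result.ToString();
    }
}
```

Calling `AgilityPackExample.GetFirstParagraphs(html, 2)` on the markup in the question should give the standfirst paragraph followed by the first body paragraph, since the tracking `<div>` contains no `<p>`.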

answered 2010-01-13T18:16:37.177
4

In the end I used this function...

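    // Requires: using System.Text.RegularExpressions;
    private string GetFirstParagraph(string htmltext)
    {
        // Lazily capture the text between the first attribute-less <p> and its </p>.
        // Singleline lets '.' match line breaks, so multi-line paragraphs are captured too.
        // Note: a tag with attributes, such as <p class="standfirst">, is not matched.
        Match m = Regex.Match(htmltext, @"<p>\s*(.+?)\s*</p>", RegexOptions.Singleline);
        if (m.Success)
        {
            return m.Groups[1].Value;
        }
        else
        {
            // No paragraph found: fall back to the original text.
            return htmltext;
        }
    }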
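
The question asked for the first two paragraphs as a raw substring, and the first `<p>` in the sample carries a `class` attribute. A rough sketch of one way to extend the same regex idea to cover that is below. The class `ParagraphExtractor` and the method `GetFirstParagraphs` are names invented for this example, and it assumes `<p>` elements are not nested and are always closed, which a regex cannot verify.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParagraphExtractor
{
    // <p\b[^>]*> matches <p> with or without attributes, but not <pre> or <param>.
    // Singleline lets '.' cross line breaks; .*? stops at the nearest </p>.
    private static readonly Regex ParagraphPattern = new Regex(
        @"<p\b[^>]*>.*?</p>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the slice of `html` running from the start of the first paragraph
    // to the end of the `count`-th one (or the last one found, if there are fewer).
    public static string GetFirstParagraphs(string html, int count)
    {
        if (string.IsNullOrEmpty(html) || count <= 0)
        {
            return string.Empty;
        }

        Match m = ParagraphPattern.Match(html);
        if (!m.Success)
        {
            return string.Empty;
        }

        int start = m.Index;
        int end = m.Index + m.Length;
        int found = 1;

        while (found < count)
        {
            m = m.NextMatch();
            if (!m.Success)
            {
                break;
            }
            end = m.Index + m.Length;
            found++;
        }

        // Anything between the paragraphs (whitespace, stray markup) is kept as-is.
        return html.Substring(start, end - start);
    }
}
```

Because the result is cut out of the input with `Substring`, the paragraphs come back exactly as they appear in the source string, tags included. It only uses APIs available in .NET 2.0.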
answered 2010-01-14T11:04:28.660
0

Are you using JavaScript? You could use explode on the p tags to get the div plus the first paragraph in one element of the array, and each remaining p tag in its own element.

answered 2010-01-13T17:44:05.417
-1

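You could write a method that loads the HTML into a WebBrowser variable, then use the DOM to walk the nodes and extract whatever you want with your own custom logic. Take a look at this tutorial.

Here is a snippet showing how to create the WebBrowser in the code-behind, rather than the way the tutorial tells you to do it: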

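using System.Windows.Forms;

// WebBrowser wraps an ActiveX control, so this code must run on an STA thread.
WebBrowser _Browser = null;
string _Source = "Your HTML goes here";

_Browser = new WebBrowser();
_Browser.Navigate("about:Blank");   // start from an empty page
_Browser.Document.OpenNew(true);    // replace it with a fresh, writable document
_Browser.Document.Write(_Source);   // write the HTML so it can be walked through the DOM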
answered 2010-01-13T17:58:30.643