4

I can easily remove an element with node.Remove(), as shown below:

HtmlDocument html = new HtmlDocument();

html.Load(Server.MapPath(@"~\Site\themes\default\index.cshtml"));

foreach (var item in html.DocumentNode.SelectNodes("//removeMe"))
{
    item.Remove();
}

But this also removes the innerHtml. What if I only want to remove the tag and keep its innerHtml?

Example:

<ul>
    <removeMe>
        <li>
            <a href="#">Keep me</a>
        </li>
    </removeMe>
</ul>

Any help would be greatly appreciated :)

10 Answers

22
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

var node = doc.DocumentNode.SelectSingleNode("//removeme");
node.ParentNode.RemoveChild(node, true);
answered 2012-08-23T13:58:54.297
3

This should work:

foreach (var item in doc.DocumentNode.SelectNodes("//removeMe"))
{
    if (item.PreviousSibling == null)
    {
        //First element -> so add it at beginning of the parent's innerhtml
        item.ParentNode.InnerHtml = item.InnerHtml + item.ParentNode.InnerHtml;
    }
    else
    {
        //There is an element before itemToRemove -> add the innerhtml after the previous item
        foreach(HtmlNode node in item.ChildNodes){
            item.PreviousSibling.ParentNode.InsertAfter(node, item.PreviousSibling);
        }
    }
    item.Remove();
}
answered 2012-08-23T13:55:01.787
3

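The bool keepGrandChildren implementation has a problem for anyone whose element to remove directly contains text. If the removeme tag has text in it, that text is removed as well. For example, <removeme>text<p>more text</p></removeme> becomes <p>more text</p>.

Try this: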

// Requires: using System.Linq;
private static void RemoveElementKeepText(HtmlNode node)
{
    //node.ParentNode.RemoveChild(node, true);
    HtmlNode parent = node.ParentNode;

    // Insert each child immediately before the node being removed. Because the
    // node stays in place until the end, the children keep their original order
    // (inserting repeatedly after a cached previous sibling would reverse them).
    foreach (HtmlNode child in node.ChildNodes.ToList())
    {
        parent.InsertBefore(child, node);
    }
    node.Remove();
}
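
For anyone who wants to try the idea end to end, here is a minimal sketch of a console program that unwraps every removeme element, including direct text. It assumes the HtmlAgilityPack NuGet package is referenced; the names UnwrapDemo, Unwrap and Run are made up for illustration and are not part of the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class UnwrapDemo // hypothetical class name, for illustration only
{
    // Moves every child of 'node' in front of it, then removes the emptied wrapper.
    public static void Unwrap(HtmlNode node)
    {
        HtmlNode parent = node.ParentNode;
        if (parent == null) return; // detached node: nothing to unwrap into

        // Snapshot the children so the loop does not depend on how the library
        // updates its collections while nodes are being moved.
        foreach (HtmlNode child in node.ChildNodes.ToList())
        {
            parent.InsertBefore(child, node);
        }
        node.Remove();
    }

    // Loads the markup, unwraps all <removeme> elements and returns the result.
    public static string Run(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var matches = doc.DocumentNode.SelectNodes("//removeme");
        if (matches != null)
        {
            foreach (HtmlNode node in matches.ToList())
            {
                Unwrap(node);
            }
        }
        return doc.DocumentNode.WriteContentTo();
    }

    public static void Main()
    {
        Console.WriteLine(Run("<div>before<removeme>text<p>more text</p></removeme>after</div>"));
    }
}
```

The XPath uses the lowercase name because Html Agility Pack normalizes element names to lowercase when parsing.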
answered 2012-09-18T16:53:57.160
1

There is a simple way:

 element.InnerHtml = element.InnerHtml.Replace("<br>", "{1}"); 
 var innerTextWithBR = element.InnerText.Replace("{1}", "<br>");
answered 2013-03-07T19:42:08.793
1

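Adding my two cents, because none of these approaches handled what I wanted (removing a given set of tags, such as p and div, and handling nesting correctly while preserving the inner tags).

Here is what I came up with. It passes all of my unit tests and, I believe, covers most of the cases I need to handle: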

var htmlDoc = new HtmlDocument();

// load html
htmlDoc.LoadHtml(html);

var tags = (from tag in htmlDoc.DocumentNode.Descendants()
           where tagNames.Contains(tag.Name)
           select tag).Reverse();

// find formatting tags
foreach (var item in tags)
{
    if (item.PreviousSibling == null)
    {
        // Prepend children to parent node in reverse order
        foreach (HtmlNode node in item.ChildNodes.Reverse())
        {
            item.ParentNode.PrependChild(node);
        }                        
    }
    else
    {
        // Insert children after previous sibling
        foreach (HtmlNode node in item.ChildNodes)
        {
            item.ParentNode.InsertAfter(node, item.PreviousSibling);
        }
    }

    // remove from tree
    item.Remove();
}

// return transformed doc
html = htmlDoc.DocumentNode.WriteContentTo().Trim();

Here are the cases I used for testing:

[TestMethod]
public void StripTags_CanStripSingleTag()
{
    var input = "<p>tag</p>";
    var expected = "tag";
    var actual = HtmlUtilities.StripTags(input, "p");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripNestedTag()
{
    var input = "<p>tag <p>inner</p></p>";
    var expected = "tag inner";
    var actual = HtmlUtilities.StripTags(input, "p");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripTwoTopLevelTags()
{
    var input = "<p>tag</p> <div>block</div>";
    var expected = "tag block";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripMultipleNestedTags_2LevelsDeep()
{
    var input = "<p>tag <div>inner</div></p>";
    var expected = "tag inner";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripMultipleNestedTags_3LevelsDeep()
{
    var input = "<p>tag <div>inner <p>superinner</p></div></p>";
    var expected = "tag inner superinner";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripTwoTopLevelMultipleNestedTags_3LevelsDeep()
{
    var input = "<p>tag <div>inner <p>superinner</p></div></p> <div><p>inner</p> toplevel</div>";
    var expected = "tag inner superinner inner toplevel";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_IgnoresTagsThatArentSpecified()
{
    var input = "<p>tag <div>inner <a>superinner</a></div></p>";
    var expected = "tag inner <a>superinner</a>";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);

    input = "<wrapper><p>tag <div>inner</div></p></wrapper>";
    expected = "<wrapper>tag inner</wrapper>";
    actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripSelfClosingAndUnclosedTagsLikeBr()
{
    var input = "<p>tag</p><br><br/>";
    var expected = "tag";
    var actual = HtmlUtilities.StripTags(input, "p", "br");

    Assert.AreEqual(expected, actual);
}

It may not handle everything, but it does what I need.

answered 2014-03-04T20:00:05.060
0

Maybe this is what you are looking for?

foreach (HtmlNode node in html.DocumentNode.SelectNodes("//removeme"))
{
    HtmlNodeCollection children = node.ChildNodes; //get <removeme>'s children
    HtmlNode parent = node.ParentNode; //get <removeme>'s parent
    node.Remove(); //remove <removeme>
    parent.AppendChildren(children); //append the children to the parent
}

Edit: LB's answer is cleaner. Go with his!

answered 2012-08-23T13:59:55.973
0

How about this?

var removedNodes = document.DocumentNode.SelectNodes("//removeme");
if (removedNodes != null)
{
    foreach (var rn in removedNodes)
    {
        // Replace <removeme> with a text node holding its inner markup as raw text.
        HtmlTextNode innerNodes = document.CreateTextNode(rn.InnerHtml);
        rn.ParentNode.ReplaceChild(innerNodes, rn);
    }
}
answered 2013-04-13T10:43:41.667
0

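Normally the correct expression would be node.ParentNode.RemoveChild(node, true).

Because of an ordering bug in HtmlNode.RemoveChild() (http://htmlagilitypack.codeplex.com/discussions/79587), I created a similar method. Sorry, it is in VB. If anyone wants a translation, I will write one.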

'The HTML Agility Pack (1.4.9) includes the HtmlNode.RemoveChild() method but it has an ordering bug with preserving child nodes.  
'The below implementation orders children correctly.
Private Shared Sub RemoveNode(node As HtmlAgilityPack.HtmlNode, keepChildren As Boolean)
    Dim parent = node.ParentNode
    If keepChildren Then
        For i = node.ChildNodes.Count - 1 To 0 Step -1
            parent.InsertAfter(node.ChildNodes(i), node)
        Next
    End If
    node.Remove()
End Sub

I tested this code with the following test markup:

<removeme>
    outertextbegin
    <p>innertext1</p>
    <p>innertext2</p>
    outertextend
</removeme>

The output is:

outertextbegin
<p>innertext1</p>
<p>innertext2</p>
outertextend
answered 2014-12-03T17:57:05.693
0

Here is a C# version of the answer posted on 2014-12-03 at 17:57 by pseudocoder.

The site won't let me comment on or add to the original post. Maybe it will help someone.

private void removeNode(HtmlAgilityPack.HtmlNode node, bool keepChildren)
{
    var parent = node.ParentNode;
    if (keepChildren)
    {
        for ( int i = node.ChildNodes.Count - 1; i >= 0; i--)
        {
            parent.InsertAfter(node.ChildNodes[i], node);
        }            
    }
    node.Remove(); 
}
answered 2020-09-10T08:29:07.237
-3

Can you use a regular expression, or do you have to use HtmlAgilityPack?

string html = "<ul><removeMe><li><a href=\"#\">Keep me</a></li></removeMe></ul>";

html = Regex.Replace(html, "<removeMe.*?>", "", RegexOptions.Compiled);
html = Regex.Replace(html, "</removeMe>", "", RegexOptions.Compiled);
answered 2012-08-23T13:32:38.180