I have a problem with Nokogiri. Suppose I have this HTML:

<html> 
<p>
        This is just an example, how to remove the next sentence using nokogiri in Ruby.
        Thank you for your help.
        <strong> XXXX </strong>
            <br/> 
            <br />
        I want to remove all the HTML after the strong XXXX
            <br />
            <br />
        <strong> YYY </strong>
</p>

How can I get "This is just an example, how to remove the next sentence using nokogiri ... Thank you for your help."? I do not want to include anything from the <strong> XXXX element through the rest of the HTML.

3 Answers

To exclude that content explicitly, you might try:

doc.search('//p/text()[not(preceding-sibling::strong)]').text

This says: get all text nodes directly under the p that do not have a strong element as a preceding sibling.

Given your input, this extracts the following:

        This is just an example, how to remove the next sentence using nokogiri in Ruby.
        Thank you for your help.
answered 2013-07-31T02:02:51.250

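If you just want the text (which is what I think you are asking), you can call the text method on the Nokogiri element. That returns "... Thank you for your help. XXXX I want to remove all the HTML after the strong XXXX YYY". If it helps, the Nokogiri documentation covers the text method. Or are you asking how to avoid getting any text/HTML after the tag?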

answered 2013-07-30T05:37:15.903

Hopefully this is what you are looking for:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-_HTML_
<p>
        This is just an example, how to remove the next sentence using nokogiri in Ruby.
        Thank you for your help.
        <strong> XXXX </strong>
            <br/> 
            <br />
        I want to remove all the HTML after the strong XXXX
            <br />
            <br />
        <strong> YYY </strong>
</p>
_HTML_

puts doc.at('//p/text()[1]').to_s.strip
# >> This is just an example, how to remove the next sentence using nokogiri in Ruby.
# >>         Thank you for your help.

Now, if you want to remove the unwanted HTML content from the source HTML itself, you can try the following:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-_HTML_
<p>
        This is just an example, how to remove the next sentence using nokogiri in Ruby.
        Thank you for your help.
        <strong> XXXX </strong>
            <br/> 
            <br />
        I want to remove all the HTML after the strong XXXX
            <br />
            <br />
        <strong> YYY </strong>
</p>
_HTML_


doc.xpath('//p/* | //p/text()').count # => 10
ndst = doc.search('//p/* | //p/text()')[1..-1]
ndst.remove


puts doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><p>
# >>         This is just an example, how to remove the next sentence using nokogiri in Ruby.
# >>         Thank you for your help.
# >>         </p></body></html>
answered 2013-07-30T06:12:45.767