I wrote the following code:

require 'nokogiri'
require 'pp'

html = <<-END
<html>

    <head>

    <title> A Dirge </title>

    <link rel     = "schema.DC"
          href    = "http://purl.org/DC/elements/1.0/">

    <meta name    = "DC.Title"
          content = "A Dirge">

    <meta name    = "DC.Creator"
          content = "Shelley, Percy Bysshe">

    <meta name    = "DC.Type"
          content = "poem">

    <meta name    = "DC.Date"
          content = "1820">

    <meta name    = "DC.Format"
          content = "text/html">

    <meta name    = "DC.Language"
          content = "en">

    </head>

    <body><pre>

            Rough wind, that moanest loud
              Grief too sad for song;
            Wild wind, when sullen cloud
              Knells all the night long;
            Sad storm, whose tears are vain,
            Bare woods, whose branches strain,
            Deep caves and dreary main, -
              Wail, for the world's wrong!

    </pre></body>

    </html>
 END

doc = Nokogiri::HTML::DocumentFragment.parse(html)
pp doc 
doc.children.each do |ch|
    p ch.text if ch.text?
end

But it outputs:

"\n\n    \n\n    "
"\n\n    "

Now my question is: why are the lines inside <pre>...</pre> not printed?

Can anyone help me solve this?


1 Answer


For me, the doc.children.each block outputs a bit more:

"\n\n \n\n "
"\n\n"
"\n\n"
"\n\n"
"\n\n"
"\n\n"
"\n\n"
"\n\n"
"\n\n \n\n "
"\n\n\n"

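This is the correct output; these are the text nodes that are direct children of <html>. The loop prints only nodes for which text? is true, and <pre> is an element node, not a text node, so it is skipped.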

I'm not sure which "lines" you want that you aren't seeing. For example, if you want the contents of <pre>, you can do

doc.xpath("pre").text

to get it. If that doesn't answer your question, you will have to clarify it.

Answered 2013-04-14T18:47:35.003