I'm trying to parse an HTML file using Hpricot and Ruby, but I'm having trouble extracting "loose" text that isn't enclosed in tags such as <p></p>.
require 'hpricot'
text = <<SOME_TEXT
<a href="http://www.somelink.com/foo/bar.html">Testing:</a><br />
line 1<br />
line 2<br />
line 3<br />
line 4<br />
line 5<br />
<b>Here's some more text</b>
SOME_TEXT
parsed = Hpricot(text)
parsed = parsed.search('//a[@href="http://www.somelink.com/foo/bar.html"]').first.following_siblings
puts parsed
I'd like the result to be:
<br />
line 1<br />
line 2<br />
line 3<br />
line 4<br />
line 5<br />
<b>Here's some more text</b>
But I'm getting:
<br />
<br />
<br />
<br />
<br />
<br />
<b>Here's some more text</b>
How can I get Hpricot to return line 1, line 2, and so on?
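
For what it's worth, the output above looks like what you'd see if following_siblings only returned element siblings and skipped the bare text nodes between the <br /> tags. I haven't confirmed that against the Hpricot source, so treat it as a guess. The sketch below illustrates the distinction using REXML from Ruby's standard library instead of Hpricot (a deliberate swap, since REXML's behaviour here is easy to check). The wrapping <div> and the names html, link, after_link and texts are made up for this example. It walks all child nodes of the link's parent rather than only the elements, so the text survives:

```ruby
require 'rexml/document'

# Same markup as in the question, wrapped in a <div> (an assumption made
# here so that the fragment is a well-formed XML document for REXML).
html = <<~HTML
  <div><a href="http://www.somelink.com/foo/bar.html">Testing:</a><br />
  line 1<br />
  line 2<br />
  line 3<br />
  line 4<br />
  line 5<br />
  <b>Here's some more text</b></div>
HTML

doc  = REXML::Document.new(html.strip)
link = REXML::XPath.first(doc, '//a')

# children returns every child node of the parent, text nodes included,
# whereas an element-only view would contain just <a>, <br /> and <b>.
nodes = link.parent.children
index = nodes.index { |node| node.equal?(link) }

# Everything that follows the link, both elements and text.
after_link = nodes.drop(index + 1)

# Keep only the non-blank text nodes.
texts = after_link
        .select { |node| node.is_a?(REXML::Text) }
        .map    { |node| node.value.strip }
        .reject(&:empty?)

puts texts
```

If Hpricot exposes a comparable list of all child nodes on the parent, the same idea (find the link's position, then take everything after it) would presumably apply there too.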