ruby-on-rails - Why does Nokogiri ignore everything after the first attribute because of a backslash?

Question

Why does Nokogiri ignore everything after the first attribute because of a backslash?

I'm not quite sure why it does this:

[12] pry(Template)> b
=> "<td style=\\\"color:#fff; padding:3px; font-size:11px; text-align:center;\\\">Home Improvement Agreement: Electrical Services & Standby Generators</td>"
[13] pry(Template)> Nokogiri::HTML.parse(b).to_html
=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><td style='\\\"color:#fff;' padding:3px font-size:11px text-align:center>Home Improvement Agreement: Electrical Services &amp; Standby Generators</td></body></html>\n"

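Notice how it produces bad HTML: everything after the color declaration in the <td> element's style attribute is mangled. It closes the attribute value after the first declaration and, I'm guessing, treats the remaining declarations as attribute names on the tag.

I'm curious whether anyone knows why Nokogiri does this, and what I can do to work around it?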

score 2 · Accepted Answer

You are asking it to parse this:

<td style=\"color:#fff; ...\">

That is invalid, because the backslashes are literally part of the string. This is valid:

<td style="color:#fff; ...">
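If the string arrives with the backslashes already in it, one way to get to the valid form is to strip them before parsing. The following is a minimal sketch, assuming every backslash-quote pair in the input should become a plain quote; the variable names `escaped` and `cleaned` and the shortened sample markup are illustrative, not from the original post.

```ruby
# A shortened version of the string from the question. In a single-quoted
# Ruby literal, \" stays as two characters: a backslash and a double quote.
escaped = '<td style=\"color:#fff; padding:3px;\">Home</td>'

# Replace each literal backslash-quote pair with a bare double quote.
# '\\"' in single quotes is exactly those two characters.
cleaned = escaped.gsub('\\"', '"')

puts cleaned
```

The `cleaned` string can then be passed to Nokogiri::HTML.parse in place of the original. If the backslashes come from an earlier escaping step (for example, a value that was serialized twice), fixing that step is likely the better long-term remedy.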

score 0 · Accepted Answer

Try:

'<td style="color:#fff; padding:3px; font-size:11px; text-align:center;">Home Improvement Agreement: Electrical Services & Standby Generators</td>'

score 0 · Accepted Answer

Nokogiri makes it easy to tell whether there were problems parsing an HTML or XML document:

require 'nokogiri'

html = '<td style=\"color:#fff; padding:3px; font-size:11px; text-align:center;\">Home Improvement Agreement: Electrical Services & Standby Generators</td>'
doc = Nokogiri::HTML.parse(html)
doc.errors
=> [#<Nokogiri::XML::SyntaxError: error parsing attribute name>, #<Nokogiri::XML::SyntaxError: error parsing attribute name>, #<Nokogiri::XML::SyntaxError: error parsing attribute name>, #<Nokogiri::XML::SyntaxError: htmlParseEntityRef: no name>]
