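I'm having some trouble with this example of creating a transformer lambda with Ruby's Sanitize library.
I've put together a simple script that tries to sanitize whatever is in my options[:content] variable. Even though execution reaches the part that returns a hash containing an array of nodes under the key :node_whitelist, my nodes somehow don't end up whitelisted.
Here is my code: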
#!/usr/bin/ruby
require 'rubygems'
require 'sanitize'
options = { :content => "<p>Here is my content. It has a video: <object width='480' height='390'><param name='movie' value='http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US'></param><param name='allowFullScreen' value='true'></param><param name='allowscriptaccess' value='always'></param><embed src='http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US' type='application/x-shockwave-flash' allowscriptaccess='always' allowfullscreen='true' width='480' height='390'></embed></object></p>" }
# adapted from example at https://github.com/rgrove/sanitize/
video_embed_sanitizer = lambda do |env|
  node = env[:node]
  node_name = env[:node_name]
  puts "[video_embed_sanitizer] Starting up"
  puts "[video_embed_sanitizer] node is #{node}"
  puts "[video_embed_sanitizer] node.name.to_s.downcase is #{node.name.to_s.downcase}"
  # Don't continue if this node is already whitelisted or is not an element.
  if env[:is_whitelisted] then
    puts "[video_embed_sanitizer] Already whitelisted"
  end
  return nil if env[:is_whitelisted] || !node.element?
  parent = node.parent
  # Since the transformer receives the deepest nodes first, we look for a
  # <param> element or an <embed> element whose parent is an <object>.
  return nil unless (node.name.to_s.downcase == 'param' || node.name.to_s.downcase == 'embed') &&
    parent.name.to_s.downcase == 'object'
  if node.name.to_s.downcase == 'param'
    # Quick XPath search to find the <param> node that contains the video URL.
    return nil unless movie_node = parent.search('param[@name="movie"]')[0]
    url = movie_node['value']
  else
    # Since this is an <embed>, the video URL is in the "src" attribute. No
    # extra work needed.
    url = node['src']
  end
  # Verify that the video URL is actually a valid YouTube video URL.
  puts "[video_embed_sanitizer] URL is #{url}"
  return nil unless url =~ /^http:\/\/(?:www\.)?youtube\.com\/v\//
  # We're now certain that this is a YouTube embed, but we still need to run
  # it through a special Sanitize step to ensure that no unwanted elements or
  # attributes that don't belong in a YouTube embed can sneak in.
  puts "[video_embed_sanitizer] Node before cleaning is #{node}"
  Sanitize.clean_node!(parent, {
    :elements => %w[embed object param],
    :attributes => {
      'embed' => %w[allowfullscreen allowscriptaccess height src type width],
      'object' => %w[height width],
      'param' => %w[name value]
    }
  })
  puts "[video_embed_sanitizer] Node after cleaning is #{node}"
  # Now that we're sure that this is a valid YouTube embed and that there are
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
  # to whitelist the current node (<param> or <embed>) and its parent
  # (<object>).
  puts "[video_embed_sanitizer] Marking node as whitelisted and returning"
  {:node_whitelist => [node, parent]}
end
options[:content] = Sanitize.clean(options[:content],
  :elements => ['a', 'b', 'blockquote', 'br', 'em', 'i', 'img', 'li', 'ol', 'p', 'span', 'strong', 'ul'],
  :attributes => {'a' => ['href', 'title'], 'span' => ['class', 'style'], 'img' => ['src', 'alt']},
  :protocols => {'a' => {'href' => ['http', 'https', :relative]}},
  :add_attributes => { 'a' => {'rel' => 'nofollow'}},
  :transformers => [video_embed_sanitizer])
puts options[:content]
Here is the output being generated:
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <param name="movie" value="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US">
[video_embed_sanitizer] node.name.to_s.downcase is param
[video_embed_sanitizer] URL is http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US
[video_embed_sanitizer] Node before cleaning is <param name="movie" value="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US">
[video_embed_sanitizer] Node after cleaning is <param name="movie" value="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US">
[video_embed_sanitizer] Marking node as whitelisted and returning
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <param name="allowFullScreen" value="true">
[video_embed_sanitizer] node.name.to_s.downcase is param
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <param name="allowscriptaccess" value="always">
[video_embed_sanitizer] node.name.to_s.downcase is param
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <embed src="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="390"></embed>
[video_embed_sanitizer] node.name.to_s.downcase is embed
[video_embed_sanitizer] URL is http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US
[video_embed_sanitizer] Node before cleaning is <embed src="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="390"></embed>
[video_embed_sanitizer] Node after cleaning is <embed src="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="390"></embed>
[video_embed_sanitizer] Marking node as whitelisted and returning
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <object width="480" height="390"></object>
[video_embed_sanitizer] node.name.to_s.downcase is object
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <p>Here is my content. It has a video: </p>
[video_embed_sanitizer] node.name.to_s.downcase is p
<p>Here is my content. It has a video: </p>
What am I doing wrong?
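One thing that may be worth ruling out: the transformer was adapted from the README on GitHub, which may describe a different release than the gem that is actually installed. Below is a minimal sketch for printing the installed version, assuming RubyGems is available. The helper name installed_gem_version is hypothetical and not part of Sanitize.

```ruby
require 'rubygems'

# Hypothetical helper (not part of Sanitize): returns the highest installed
# version of the named gem as a String, or nil if the gem is not installed.
def installed_gem_version(gem_name)
  specs = Gem::Specification.find_all_by_name(gem_name)
  return nil if specs.empty?
  specs.map(&:version).max.to_s
end

version = installed_gem_version('sanitize')
puts "sanitize gem version: #{version || 'not installed'}"
```

The printed version can then be compared against the release that the README example was written for.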