It turns out the solution was simple, if not entirely clean. Caching the result of Mechanize#get() is straightforward:
class CachingMechanize < Mechanize
  def get(uri, parameters = [], referer = nil, headers = {})
    WebCache.with_web_cache(uri.to_s) { super }
  end
end
... where with_web_cache() uses YAML to serialize and cache the object returned by super.
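To see the caching pattern in isolation, here is a minimal, self-contained sketch. It assumes a hypothetical `Fetcher` class in place of Mechanize and a hypothetical in-memory `TinyCache` module in place of WebCache (neither is part of the original code). The point it illustrates is that a bare `super` inside the block forwards the original arguments to the parent's `get`, so the parent only runs on a cache miss.

```ruby
require 'yaml'

# Hypothetical in-memory stand-in for WebCache.with_web_cache: keys and
# values are stored as YAML strings, mirroring the YAML round trip the
# real cache performs.
module TinyCache
  STORE = {}

  def self.with_cache(key)
    serialized_key = YAML.dump(key)
    if STORE.key?(serialized_key)
      # cache hit: rebuild the object from its YAML text
      load_yaml(STORE[serialized_key])
    else
      # cache miss: run the block, store its YAML form, return the value
      yield.tap { |value| STORE[serialized_key] = YAML.dump(value) }
    end
  end

  # Psych 4 (Ruby 3.1+) made YAML.load safe-only; prefer unsafe_load
  # when it exists so arbitrary objects can be revived.
  def self.load_yaml(text)
    YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
  end
end

# Hypothetical fetcher standing in for Mechanize; counts real fetches.
class Fetcher
  attr_reader :fetch_count

  def initialize
    @fetch_count = 0
  end

  def get(uri)
    @fetch_count += 1
    "<html>contents of #{uri}</html>"
  end
end

# Same shape as CachingMechanize: the bare `super` inside the block
# calls Fetcher#get with the same uri argument.
class CachingFetcher < Fetcher
  def get(uri)
    TinyCache.with_cache(uri.to_s) { super }
  end
end

fetcher = CachingFetcher.new
fetcher.get("http://example.com/")
fetcher.get("http://example.com/")
puts fetcher.fetch_count # prints 1
```

Because the cache is keyed on the URI string, the second call is served from `TinyCache` and `Fetcher#get` is never reached.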
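My problem was that, by default, Mechanize#get() returns a Mechanize::Page object that contains some lambda objects, and YAML cannot dump and load those. The fix was to eliminate the lambdas, which turned out to be simple. Here is the complete code: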
class CachingMechanize < Mechanize
  def initialize(*args)
    super
    sanitize_scheme_handlers
  end

  def get(uri, parameters = [], referer = nil, headers = {})
    WebCache.with_web_cache(uri.to_s) { super }
  end

  # private

  # Replace the default scheme handlers (lambdas, which YAML cannot
  # round-trip) with a plain object that YAML can dump and load.
  def sanitize_scheme_handlers
    scheme_handlers['http'] = SchemeHandler.new
    scheme_handlers['https'] = scheme_handlers['http']
    scheme_handlers['relative'] = scheme_handlers['http']
    scheme_handlers['file'] = scheme_handlers['http']
  end

  # Stand-in handler: returns the link unchanged.
  class SchemeHandler
    def call(link, page) ; link ; end
  end
end
Moral: don't try to YAML.dump and YAML.load objects that contain lambdas or procs.
This isn't specific to this example. If you see a YAML error that reads:
TypeError: allocator undefined for Proc
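check whether the object you are trying to serialize and deserialize contains a lambda or proc. If you can replace the lambda with a method call on an object (as I could in this case), you should be able to work around the problem.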
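A minimal sketch of both halves of that advice, assuming a hypothetical `Holder` class that stores a callable (much as Mechanize stores its scheme handlers) and a hypothetical `IdentityHandler` that plays the role of `SchemeHandler`. The lambda version should fail with a `TypeError` during the round trip; the plain-object version survives it because its behavior lives in a method, not in the serialized data.

```ruby
require 'yaml'

# Dump, then load with full object support (Psych 4 made YAML.load safe-only).
def yaml_round_trip(obj)
  text = YAML.dump(obj)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

# Hypothetical container holding a callable.
class Holder
  attr_reader :handler

  def initialize(handler)
    @handler = handler
  end
end

# Plain-object replacement for the lambda: nothing to serialize but the class name.
class IdentityHandler
  def call(link, _page)
    link
  end
end

begin
  yaml_round_trip(Holder.new(->(link, _page) { link }))
rescue TypeError => e
  puts "lambda failed: #{e.message}"
end

copy = yaml_round_trip(Holder.new(IdentityHandler.new))
puts copy.handler.call("http://example.com/", nil)
```

Note that `IdentityHandler` must be defined in the process that loads the YAML, since the YAML only records the class name.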
Hope this helps someone else.
Update
In response to @Martin's request for the definition of WebCache, here it is:
# Simple model for caching pages fetched from the web. Assumes
# a schema like this:
#
# create_table "web_caches", :force => true do |t|
# t.text "key"
# t.text "value"
# t.datetime "expires_at"
# t.datetime "created_at", :null => false
# t.datetime "updated_at", :null => false
# end
# add_index "web_caches", ["key"], :name => "index_web_caches_on_key", :unique => true
#
class WebCache < ActiveRecord::Base
  serialize :value

  # WebCache.with_web_cache(key) {
  #   ...body...
  # }
  #
  # Searches the web_caches table for an entry with a matching key. If
  # found, and if the entry has not expired, returns the value for that
  # entry. If not found, or if the entry has expired, yields to the body
  # and caches the value the body returns before returning it.
  #
  # Options:
  #   :expires_at sets the expiration date for this entry upon creation.
  #     Defaults to one year from now.
  #   :expired_prior_to overrides the value of 'now' when checking for
  #     expired entries. Mostly useful for unit testing.
  #
  def self.with_web_cache(key, opts = {})
    serialized_key = YAML.dump(key)
    expires_at = opts[:expires_at] || 1.year.from_now
    expired_prior_to = opts[:expired_prior_to] || Time.zone.now
    entry = self.where(:key => serialized_key).first
    if entry && entry.expires_at && entry.expires_at > expired_prior_to
      # cache hit
      entry.value
    else
      # cache miss (or expired entry): reuse any stale row instead of
      # inserting a second one, which the unique index on key would reject
      yield.tap do |value|
        entry ||= self.new(:key => serialized_key)
        entry.value = value
        entry.expires_at = expires_at
        entry.save!
      end
    end
  end

  # Prune expired entries. Typically called by a cron job.
  def self.delete_expired_entries(expired_prior_to = Time.zone.now)
    self.where("expires_at < ?", expired_prior_to).destroy_all
  end
end
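For trying the expiry logic without Rails, here is a hypothetical, dependency-free sketch of the same semantics. `MemoryWebCache` is not part of the original code: a Hash stands in for the web_caches table, `Time.now` for `Time.zone.now`, and a fixed 365-day interval approximates `1.year`. Like the table version, it overwrites a stale entry rather than adding a second one under the same key.

```ruby
require 'yaml'

# Hypothetical in-memory analogue of WebCache (no ActiveRecord/ActiveSupport).
class MemoryWebCache
  ONE_YEAR = 365 * 24 * 60 * 60 # seconds; approximates ActiveSupport's 1.year
  Entry = Struct.new(:value_yaml, :expires_at)

  def initialize
    @entries = {}
  end

  def with_web_cache(key, opts = {})
    serialized_key = YAML.dump(key)
    expires_at = opts[:expires_at] || Time.now + ONE_YEAR
    expired_prior_to = opts[:expired_prior_to] || Time.now
    entry = @entries[serialized_key]
    if entry && entry.expires_at > expired_prior_to
      # cache hit: deserialize the stored value
      load_yaml(entry.value_yaml)
    else
      # miss or expired: compute, then store (replacing any stale entry)
      yield.tap { |value| @entries[serialized_key] = Entry.new(YAML.dump(value), expires_at) }
    end
  end

  # Drop entries that expired before the given time (cron-job equivalent).
  def delete_expired_entries(expired_prior_to = Time.now)
    @entries.delete_if { |_key, entry| entry.expires_at < expired_prior_to }
  end

  def size
    @entries.size
  end

  private

  def load_yaml(text)
    YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
  end
end

cache = MemoryWebCache.new
calls = 0
cache.with_web_cache("http://example.com/") { calls += 1; "page" }
cache.with_web_cache("http://example.com/") { calls += 1; "page" }
puts calls # prints 1
```

Passing a later `:expired_prior_to` makes the stored entry count as expired, which is how the original option supports unit testing.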