
I'm currently looking at improving the performance of Rails.cache.write when using dalli to write items to the memcachier cloud.

The cache-related stack is currently:

heroku, memcachier heroku addon, dalli 2.6.4, rails 3.0.19

I'm using newrelic for performance monitoring.

For a given logged-in user, represented by a BusinessUser instance, I currently fetch the "active students" when its active_students method is called from a controller handling a request that needs the list of "active students":

class BusinessUser < ActiveRecord::Base
  ...
  def active_students
    Rails.cache.fetch("/studio/#{self.id}/students") do
      customer_users.active_by_name
    end
  end
  ...
end

After looking at newrelic, I've basically narrowed one big performance hit in the application down to setting key values on memcachier. It takes 225ms on average each time. Moreover, setting memcache key values appears to block the main thread and eventually wreck the request queue. Obviously this is undesirable, especially when the whole point of the caching strategy is to reduce performance bottlenecks.

Additionally, I benchmarked the cache store with 1000 cache sets of the same value, using both plain dalli and Rails.cache.write:

heroku run console -a {app-name-redacted}
irb(main):001:0> require 'dalli'
=> false
irb(main):002:0> cache = Dalli::Client.new(ENV["MEMCACHIER_SERVERS"].split(","),
irb(main):003:1*                     {:username => ENV["MEMCACHIER_USERNAME"],
irb(main):004:2*                      :password => ENV["MEMCACHIER_PASSWORD"],
irb(main):005:2*                      :failover => true,
irb(main):006:2*                      :socket_timeout => 1.5,
irb(main):007:2*                      :socket_failure_delay => 0.2
irb(main):008:2>                     })
=> #<Dalli::Client:0x00000006686ce8 @servers=["server-redacted:11211"], @options={:username=>"username-redacted", :password=>"password-redacted", :failover=>true, :socket_timeout=>1.5, :socket_failure_delay=>0.2}, @ring=nil>
irb(main):009:0> require 'benchmark'
=> false
irb(main):010:0> n = 1000
=> 1000
irb(main):011:0> Benchmark.bm do |x|
irb(main):012:1*   x.report { n.times do ; cache.set("foo", "bar") ; end }
irb(main):013:1>   x.report { n.times do ; Rails.cache.write("foo", "bar") ; end }
irb(main):014:1> end
       user     system      total        real
 Dalli::Server#connect server-redacted:11211
Dalli/SASL authenticating as username-redacted
Dalli/SASL: username-redacted
  0.090000   0.050000   0.140000 (  2.066113)

Dalli::Server#connect server-redacted:11211
Dalli/SASL authenticating as username-redacted
Dalli/SASL: username-redacted

  0.100000   0.070000   0.170000 (  2.108364)

Using plain dalli cache.set, we wrote 1000 cache entries in 2.066113 seconds, for an average cache.set time of 2.07 ms.

Using Rails.cache.write, we wrote 1000 cache entries in 2.108364 seconds, for an average Rails.cache.write time of 2.11 ms.

⇒ The problem doesn't seem to be memcachier, but rather the amount of data we're trying to store.

Judging from the documentation for the #fetch method, it doesn't seem like the way to go if I want to push the cache set into a separate thread or worker, since I can't separate the write from the read; and it goes without saying that I don't want to read asynchronously.

Is it possible to reduce the bottleneck of Rails.cache.write by handing the key-value set off to a worker? Or, more generally, is there a better pattern for this, so that I don't block the main thread every time I want to perform a Rails.cache.write?

2 Answers

There are two factors that would contribute to overall latency under normal circumstances: client side marshalling/compression and network bandwidth.

Dalli marshals and optionally compresses the data, which can be quite expensive. Here are some benchmarks of marshalling and compressing a list of random characters (an artificial stand-in for a list of user ids or something like that). In both cases the resulting value is around 200KB. Both benchmarks were run on a Heroku dyno; performance will obviously depend on the CPU and load of the machine:

irb> val = (1..50000).to_a.map! {rand(255).chr}; nil
# a list of 50000 single character strings
irb> Marshal.dump(val).size
275832
# OK, so roughly 200K. How long does it take to perform this operation
# before even starting to talk to MemCachier?
irb> Benchmark.measure { Marshal.dump(val) }
=>   0.040000   0.000000   0.040000 (  0.044568)
# so about 45ms, and this scales roughly linearly with the length of the list.


irb> val = (1..100000).to_a; nil # a list of 100000 integers
irb> Zlib::Deflate.deflate(Marshal.dump(val)).size
177535
# OK, so roughly 200K. How long does it take to perform this operation
irb>  Benchmark.measure { Zlib::Deflate.deflate(Marshal.dump(val)) }
=>   0.140000   0.000000   0.140000 (  0.145672)

So we're basically seeing anywhere from a 40ms to 150ms performance hit just for marshalling and/or zipping the data. Marshalling a String will be much cheaper, while marshalling something like a complex object will be more expensive. Zipping depends on the size of the data, but also on its redundancy. For example, zipping a 1MB string of all "a" characters takes merely about 10ms.
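
The redundancy point can be seen directly with a quick sketch (illustrative only; timings depend on the machine):

```ruby
require 'zlib'
require 'benchmark'

redundant = "a" * (1024 * 1024)                            # 1MB of identical bytes
random    = Array.new(1024 * 1024) { rand(255).chr }.join  # ~1MB of noise

# Highly redundant data compresses quickly and to almost nothing
puts Benchmark.measure { Zlib::Deflate.deflate(redundant) }
puts Zlib::Deflate.deflate(redundant).size   # a few KB at most

# Random data costs more CPU and barely shrinks
puts Benchmark.measure { Zlib::Deflate.deflate(random) }
puts Zlib::Deflate.deflate(random).size      # close to the original 1MB
```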

Network bandwidth plays some role here, but not a very significant one. MemCachier has a 1MB limit on values, and a 1MB value takes approximately 20ms to transfer to/from MemCachier:

irb(main):036:0> Benchmark.measure { 1000.times { c.set("h", val, 0, :raw => true) } }
=>   0.250000  11.620000  11.870000 ( 21.284664)

This amounts to about 400 Mbps (1 MB is 8 Mb; 8 Mb / 0.02 s = 400 Mbps), which makes sense. However, for a value that is still relatively large but smaller, 200KB, we'd expect roughly a 5x speedup:

irb(main):039:0> val = "a" * (1024 * 200); val.size
=> 204800
irb(main):040:0> Benchmark.measure { 1000.times { c.set("h", val, 0, :raw => true) } }
=>   0.160000   2.890000   3.050000 (  5.954258)

So, there are several things you might be able to do to get some speedup:

  1. Use a faster marshalling mechanism. For example, using Array#pack("L*") to encode a list of 50,000 32-bit unsigned integers (like in the very first benchmark) into a string of length 200,000 (4 bytes per integer) takes only 2ms rather than 40ms. Using compression on top of the same marshalling scheme to get a similar-sized value is also very fast (about 2ms as well), but compression no longer does anything useful on random data (Ruby's Marshal produces a fairly redundant string even for a list of random integers).

  2. Use smaller values. This would probably require deep application changes, but if you don't really need the whole list, you shouldn't be setting it. For example, the memcache protocol has append and prepend operations. If you are only ever adding new things to a long list, you could use those operations instead.
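
As a sketch of what point 1 looks like in practice (the 200,000-byte figure follows directly from 4 bytes per integer):

```ruby
require 'benchmark'

ids = (1..50_000).to_a   # stand-in for a list of user ids

# Marshal: generic, but tags and boxes every element
puts Benchmark.measure { Marshal.dump(ids) }

# Array#pack("L*"): 4 bytes per 32-bit unsigned integer, nothing else
packed = ids.pack("L*")
puts Benchmark.measure { ids.pack("L*") }

puts packed.size                  # => 200000 (50_000 * 4 bytes)
puts ids == packed.unpack("L*")   # => true, lossless round trip
```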

Finally, as suggested, removing the sets/gets from the critical path would prevent any delays from affecting HTTP request latency. You still have to get the data to the worker, so if you're using something like a work queue, the message you send to the worker should contain only instructions on which data to construct, rather than the data itself (or you're in the same hole again, just with a different system). A very lightweight approach (in terms of coding effort) would be to simply fork a process:

mylist = Student.where(...).all.map!(&:id)
# ... need to update memcache with the new list of students ...
fork do
  # Have to create a new Dalli client; the parent's connection
  # cannot be shared across the fork
  client = Dalli::Client.new
  client.set("mylistkey", mylist)
  # This blocks for the same time as before, but in a separate process
end

I haven't benchmarked a full example, but since you're not execing, and Linux fork is copy-on-write, the overhead of the fork call itself should be minimal. On my machine, it's about 500us (that's micro-seconds not milliseconds).
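
A rough way to measure the fork overhead yourself (this counts fork plus reaping the child, so it slightly overstates the cost the request path actually pays; numbers vary by machine and Ruby version):

```ruby
require 'benchmark'

# exit! skips at_exit handlers so the child quits immediately;
# Process.wait reaps each child so no zombies accumulate
n = 100
t = Benchmark.realtime do
  n.times { Process.wait(fork { exit! }) }
end
puts "avg fork + reap: #{(t / n * 1_000_000).round} microseconds"
```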

answered 2013-12-24T19:20:21.787

Prefetching the data and storing it in the cache with Rails.cache.write from a worker process (e.g. Sidekiq) is what I've seen most often at high volume. Of course, there is a trade-off between speed and the money you want to spend. Think about:

  • the most heavily used paths in your application (is active_students accessed frequently?);
  • what to store (just the IDs, the whole objects, or a further chain of associations);
  • whether you can optimize the underlying query (N+1?).

Also, if you really need speed, consider using a dedicated memcache service instead of a Heroku add-on.
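
Sidekiq itself isn't shown here, but the key property (enqueue only an id and let the worker rebuild and write the value) can be sketched in-process with a plain Thread and Queue; the cache and the student query below are toy stand-ins:

```ruby
# Toy stand-ins for the cache and the student query (hypothetical)
CACHE = {}
def fetch_active_students(id)
  ["student-a-#{id}", "student-b-#{id}"]  # pretend this is the DB query
end

jobs = Queue.new

worker = Thread.new do
  # Each job carries only an id; the worker rebuilds the data itself,
  # so the queue never holds large payloads
  while (id = jobs.pop)
    CACHE["/studio/#{id}/students"] = fetch_active_students(id)
  end
end

jobs << 42    # request path: enqueue and return immediately
jobs << nil   # sentinel: shut the worker down
worker.join

puts CACHE["/studio/42/students"].inspect  # => ["student-a-42", "student-b-42"]
```

A real Sidekiq job would follow the same shape: perform(business_user_id) looks the record up and writes the cache, so the enqueued message stays tiny.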

answered 2013-12-22T07:25:30.230