python - Highly variable performance for datastore and memcache operations (GAE)

Question

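I'm trying to optimize performance on GAE, but once I deploy I get very unstable results. It is hard to tell whether each optimization actually helps, because datastore and memcache operations take a highly variable amount of time: anywhere from a few milliseconds to several seconds for the same operation.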

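For these tests I am the only user, making a single request to the application by refreshing the home page. There is no other traffic, apart from my own browser requesting the page's image/CSS/JS files.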

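Edit: To make sure the performance drops were not caused by concurrent requests from the browser (images/CSS/JS), I redid the test by requesting only the page with urllib2.urlopen(). The problem persists.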
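Because single timings vary this much, comparing one request before and after an optimization says little. A minimal sketch of a repeatable measurement, assuming the request is wrapped in a callable; the helper names `time_calls`, `percentile`, and `summarize` are hypothetical and not part of any GAE API:

```python
import math
import time


def time_calls(fn, runs=20):
    """Call fn() `runs` times and return each call's wall-clock duration in ms."""
    durations = []
    for _ in range(runs):
        start = time.time()
        fn()
        durations.append((time.time() - start) * 1000.0)
    return durations


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = int(math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[max(rank, 1) - 1]


def summarize(durations):
    """Reduce raw timings to min / median / 95th percentile / max."""
    ordered = sorted(durations)
    return {
        "min": ordered[0],
        "median": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "max": ordered[-1],
    }
```

For the page test described above, `fn` could be something like `lambda: urllib2.urlopen(url).read()`, with `url` pointing at the deployed home page. Comparing medians and 95th percentiles across many runs gives a steadier picture than comparing two individual requests.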

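My questions are:

1) Is this to be expected, given that machines/resources are shared?
2) What are the most common cases in which this behavior occurs?
3) Where can I go from here?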

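Here is a very slow datastore get (memcache was just flushed): [screenshot: very slow datastore get]

Here is a very slow memcache get (things are cached because of the previous request): [screenshot: slow memcache get]

Here is a slow but faster memcache get (same repro steps as the previous one, but different calls are slow): [screenshot]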

score 0 · Accepted Answer

To answer your questions,

1) yes, you can expect variance in remote calls because of the shared network;

2) the most common place you will see variance is in datastore requests: the larger the request, or the further away the data, the more variance you will see;

3) here are some options for you:

It looks like you are trying to fetch large amounts of data from the datastore/memcache. You may want to re-think the queries and caches so they retrieve smaller chunks of data. Does your app need all that data for a single request?

If the app really needs to process all that data on every request, another option is to preprocess it with a background task (cron, task queue, etc.) and put the results into memcache. The request that serves up the page should simply pick the right pieces out of the memcache and assemble the page.
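A rough sketch of that split between a background job and a thin request handler. Everything here is illustrative: `DictCache` is an in-process stand-in so the example runs anywhere, and `load_all_items`, `refresh_homepage_cache`, `render_homepage`, and the cache key are made-up names. On App Engine the two cache calls would go to `memcache.set` and `memcache.get` instead.

```python
class DictCache(object):
    """In-process stand-in for a memcache client (assumption, not the GAE API)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Like memcache, return None on a miss.
        return self._data.get(key)


def load_all_items():
    """Hypothetical expensive step, e.g. a large datastore query."""
    return [{"id": i, "title": "item %d" % i} for i in range(1000)]


def refresh_homepage_cache(cache):
    """Background job (cron / task queue): do the heavy work off the request path."""
    items = load_all_items()
    # Store only the small, ready-to-render piece the page needs.
    cache.set("homepage:top_titles", [item["title"] for item in items[:10]])


def render_homepage(cache):
    """Request handler: read the precomputed fragment, never the full data set."""
    titles = cache.get("homepage:top_titles")
    if titles is None:
        # Cache evicted or not filled yet; this sketch returns an empty list.
        # A real handler might recompute or enqueue a refresh instead.
        return "<ul></ul>"
    return "<ul>%s</ul>" % "".join("<li>%s</li>" % t for t in titles)
```

The handler still has to cope with a miss, since memcache entries can be evicted at any time.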

@proppy's suggestion to use NDB is a good one. It takes some work to rewrite serial queries into parallel ones, but the savings from async calls can be huge. If you can benefit from parallel tasks (using map), all the better.
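To see why overlapping remote calls pays off, here is a small illustration that runs outside App Engine. It is not NDB code: a thread pool stands in for NDB's async machinery, and `slow_lookup` with its 50 ms sleep is an assumed placeholder for a single datastore or memcache round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(key):
    """Stand-in for one remote call (assumption: about 50 ms of latency each)."""
    time.sleep(0.05)
    return key.upper()


def fetch_serial(keys):
    # Each call waits for the previous one: total time is about the sum of latencies.
    return [slow_lookup(k) for k in keys]


def fetch_parallel(keys):
    # All calls are in flight at once: total time is about the slowest single call.
    with ThreadPoolExecutor(max_workers=max(1, len(keys))) as pool:
        futures = [pool.submit(slow_lookup, k) for k in keys]
        # Results are collected in submission order, so output order matches `keys`.
        return [f.result() for f in futures]
```

In NDB the same shape appears as starting the calls first (for example `ndb.get_multi_async(keys)` or `key.get_async()`) and only then calling `get_result()` on the returned futures, so the waiting overlaps instead of adding up.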
