python - App Engine bulk loader performance

Question

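I'm using the App Engine bulk loader (Python runtime) to bulk-upload entities to the datastore. The data I'm uploading is stored in a proprietary format, so I implemented my own connector (registered in bulkload_config.py) to convert it into an intermediate Python dictionary.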

from google.appengine.ext.bulkload import connector_interface

class MyCustomConnector(connector_interface.ConnectorInterface):
   ....
   # Overridden method: yields one intermediate dict per record in the input file
   def generate_import_record(self, filename, bulkload_state=None):
      ....
      yield my_custom_dict

To convert this neutral Python dictionary into datastore entities, I use a custom post-import function that I defined in my YAML.

def feature_post_import(input_dict, entity_instance, bulkload_state):
    ....
    return [all_entities_to_put]

Note: I don't use entity_instance or bulkload_state in my feature_post_import function. I just create new datastore entities (based on my input_dict) and return them.

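Now, everything works fine. However, the bulk-load process seems to take far too long. For example, 1 GB of data (~1,000,000 entities) takes ~20 hours. How can I improve the performance of the bulk-load process? Am I missing something?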

Some of the parameters I use with appcfg.py are: 10 threads, with a batch size of 10 entities per thread.

Linked Google App Engine Python group post: http://groups.google.com/group/google-appengine-python/browse_thread/thread/4c8def071a86c840

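Update: To test the performance of the bulk-load process, I loaded entities of a 'Test' Kind. Even though this entity has only a single, very simple FloatProperty, bulk-loading those entities still took the same amount of time.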

I'm still going to try varying the bulk loader parameters rps_limit, bandwidth_limit and http_limit to see if I can get more throughput.

score 4 · Accepted Answer

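There is a parameter called rps_limit that determines the number of entities uploaded per second. This was the major bottleneck. Its default value is 20.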

Also increase bandwidth_limit to something reasonable.
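As a rough sketch of where these limits go, the snippet below assembles an appcfg.py upload_data command line in Python. The config file name, data file name, kind, app URL, and the numeric limit values are placeholders made up for illustration; only the flag names are bulk loader options, so adjust everything else to your own setup.

```python
def build_upload_command(rps_limit, bandwidth_limit, num_threads=10, batch_size=10):
    """Return an argv list for `appcfg.py upload_data` with throughput flags set."""
    return [
        "appcfg.py", "upload_data",
        "--config_file=bulkloader.yaml",   # hypothetical config file name
        "--filename=features.dat",         # hypothetical input file
        "--kind=Feature",                  # hypothetical kind
        "--url=http://your-app-id.appspot.com/_ah/remote_api",  # placeholder app URL
        "--num_threads=%d" % num_threads,
        "--batch_size=%d" % batch_size,
        "--rps_limit=%d" % rps_limit,              # max records per second
        "--bandwidth_limit=%d" % bandwidth_limit,  # max bytes per second
    ]


# Illustrative values only; pick limits that suit your app and quota.
command = build_upload_command(rps_limit=500, bandwidth_limit=1000000)
print(" ".join(command))
```

Building the command as a list keeps each flag separate, so it can be passed straight to subprocess.call(command) if you drive uploads from a script.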

I increased rps_limit to 500 and everything improved. I achieved 5.5 - 6 seconds per 1000 entities, which is a major improvement over 50 seconds per 1000 entities.
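The numbers above are consistent with a simple rate calculation. Here is a small sketch of that arithmetic, assuming rps_limit acts as a hard cap on entities per second; the helper names are mine and not part of the bulk loader.

```python
def seconds_per_thousand(entities_per_second):
    """Time to upload 1000 entities at a given sustained rate."""
    return 1000.0 / entities_per_second


def hours_for(total_entities, entities_per_second):
    """Wall-clock hours to upload total_entities at a given sustained rate."""
    return total_entities / float(entities_per_second) / 3600.0


TOTAL_ENTITIES = 1000000  # dataset size from the question

# With the default cap of 20 entities/s, 1000 entities cannot take less than 50 s.
default_secs = seconds_per_thousand(20)
default_hours = hours_for(TOTAL_ENTITIES, 20)
print("default cap: %.1f s per 1000, %.1f h total" % (default_secs, default_hours))

# Measured after raising rps_limit to 500: 5.5 to 6 s per 1000 entities.
for measured_secs in (5.5, 6.0):
    rate = 1000.0 / measured_secs
    print("measured: %.0f entities/s, %.2f h total"
          % (rate, hours_for(TOTAL_ENTITIES, rate)))
```

The measured rate (roughly 170 to 180 entities per second) is still well below the new cap of 500, which suggests that some other limit, such as bandwidth, HTTP requests, or datastore write latency, becomes the bottleneck at that point.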
