google-app-engine - Merging multiple columns in bulkloader

Question

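I'm using App Engine's bulkloader to import a CSV file into my datastore. I have a number of columns that I want to merge into one. For example, they are all URLs, but not all of them are supplied, and there is an order of precedence, e.g.: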

url_main
url_temp
url_test

I want to say: "OK, if url_main exists, use it; otherwise use url_test, and failing that use url_temp."

So, is it possible to create a custom import transform that references several columns and merges them into one based on conditions?

score 2 · Accepted Answer

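OK, so after reading https://developers.google.com/appengine/docs/python/tools/uploadingdata#Configuring_the_Bulk_Loader I learned that import_transform can use a custom function.

With that in mind, this passage pointed me in the right direction:

... a two-argument function with the keyword argument bulkload_state, which on return contains useful information about the entity: bulkload_state.current_entity, which is the current entity being processed; bulkload_state.current_dictionary, the current export dictionary...

So I created a function that takes two arguments: value, the value being transformed for the current entity, and bulkload_state, which lets me get at the current row, like so: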

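def check_url(value, bulkload_state):
    # The whole CSV row currently being imported, keyed by column name.
    row = bulkload_state.current_dictionary
    # Candidate columns, highest precedence first.
    fields = ['Final URL', 'URL', 'Temporary URL']

    for field in fields:
        if field in row:
            return row[field]

    # None of the URL columns are present in this row.
    return None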

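All this does is grab the current row (bulkload_state.current_dictionary) and return the first URL field that is present, in order of precedence; if none is present, it returns None.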
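One case the presence check may not cover: if the CSV header lists every URL column but some cells are left blank, the key would still be in the row and an empty string could be returned. The following is only a sketch of a variant that skips blank values, assuming blank cells arrive as empty strings (I have not verified this against the bulkloader itself). The names first_non_empty_url and FakeState are hypothetical; FakeState is just a local stand-in for the object the bulkloader passes in and models only the current_dictionary attribute.

```python
# Column names in precedence order, taken from the function above.
URL_FIELDS = ['Final URL', 'URL', 'Temporary URL']


def first_non_empty_url(value, bulkload_state):
    """Return the first non-blank URL column of the current row, else None."""
    row = bulkload_state.current_dictionary
    for field in URL_FIELDS:
        # Treat a missing key, None, and whitespace-only text the same way.
        candidate = (row.get(field) or '').strip()
        if candidate:
            return candidate
    return None


class FakeState(object):
    """Hypothetical stand-in for bulkload_state, for local testing only."""

    def __init__(self, row):
        self.current_dictionary = row


# 'Final URL' is present but blank, so the next column in line is used.
state = FakeState({
    'Final URL': '',
    'URL': 'http://example.com/live',
    'Temporary URL': 'http://example.com/tmp',
})
result = first_non_empty_url('ignored', state)
print(result)
```

Whether this is needed depends on how the CSV is produced; if absent URLs mean the column is missing entirely, the simpler presence check is enough.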

In my bulkloader.yaml I call this function simply by setting:

- property: business_url
  external_name: URL
  import_transform: bulkloader_helper.check_url

Note: the external_name doesn't matter, as long as it exists, because I'm not actually using it; I'm reading from multiple columns.

Simple!
