python-3.6 - Simple Salesforce query_more never ends

Question

I can't understand what is going on with my code:

import json
from simple_salesforce import Salesforce, SalesforceLogin

fileCount = 1
saveFilesPath ='<path>/'
fileName = saveFilesPath+'test_file'+str(fileCount)+'.json'
sf = Salesforce(username='<username>', password='<password>', security_token='<token>', domain='test' )

initialQuery = sf.query("SELECT id, name, createddate, lastmodifieddate FROM surveyquestionresponse__c")
nextChunk = initialQuery['nextRecordsUrl']
nextQuery = sf.query_more(nextChunk, True)

print(nextChunk)
print(nextQuery['nextRecordsUrl'])



#with open(fileName, 'w') as outfile :
#    json.dump(initialQuery['records'],outfile)

#while nextQuery['nextRecordsUrl'] is not None :
#    fileCount += 1
#    fileName = saveFilesPath+'test_file'+str(fileCount)+'.json'
#    print(nextQuery['nextRecordsUrl'])
#    with open(fileName, 'w') as outfile :
#        json.dump(nextQuery['records'], outfile)

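Two things are happening. First, the initial query gives /services/data/v38.0/query/01gf000000gFYRwAAO-2000 as the next records URL, but the following nextQuery gives /services/data/v38.0/query/01gf000000gFYRwAAO-4000, which is strange: it looks as if the chunk size is changing.

The other thing is that the next chunk never ends. The object listed has about 95K rows, so in theory it should spit out about 25 files @ 4000 or 48 files @ 2000. Because of the memory limits of Lambda on AWS and the size of some of my objects, I can't use query_all, so I have to write the files in sections. How can I get this code to run properly?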

score 2 · Accepted Answer

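The change from ...AAO-2000 to ...AAO-4000 that you noticed is not a change in chunk size. Each nextRecordsUrl points at the next batch of 2000 records in the query, and the numeric suffix marks where that batch starts. The initial query returns records 1-2000 (the first chunk) and, at the end of the json object, the ...AAO-2000 URL for fetching records 2001-4000 (the next chunk). Following that URL returns the second chunk together with the ...AAO-4000 URL for the chunk after it, and so on.

I use the following code to walk through a series of queries in my own org and capture all of the data in the query (about 62500 records in total). I have not run into the never-ending chunk problem.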
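To make the suffix arithmetic concrete, here is a minimal sketch that reads the trailing number off the two URLs quoted in the question. The helper name locator_offset is hypothetical, and the "ends in -<number>" shape is only what those two URLs show, not something to rely on as a stable format.

```python
def locator_offset(next_records_url):
    """Return the numeric suffix of a nextRecordsUrl (hypothetical helper).

    Assumes the URL ends in '-<number>', as the URLs in the question do.
    """
    return int(next_records_url.rsplit('-', 1)[1])


first_url = '/services/data/v38.0/query/01gf000000gFYRwAAO-2000'
second_url = '/services/data/v38.0/query/01gf000000gFYRwAAO-4000'

# The suffix advances by one batch per page; the batch size itself is unchanged.
batch_size = locator_offset(second_url) - locator_offset(first_url)
print(locator_offset(first_url), locator_offset(second_url), batch_size)
```

In other words, the suffix moves forward by one batch each time while the batch itself stays at 2000 records.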

# Initiate list for returned data
pulls = []

# Pull initial Query
initialQuery = sf.query("SELECT id, createddate, lastmodifieddate FROM Task")

# Append initial query data to pulls
pulls.append({'len':len(initialQuery['records']),'url':initialQuery['nextRecordsUrl']})

# Assign nextChunk with 'nextRecordsUrl' value and re-query with new parameters
nextChunk = initialQuery['nextRecordsUrl']
nextQuery = sf.query_more(nextChunk,True)

# Append nextQuery data to pulls
pulls.append({'len':len(nextQuery['records']),'url':nextQuery['nextRecordsUrl']})

# Set up a while loop that re-queries Salesforce until the returned
# page no longer has a 'nextRecordsUrl' key
x = True
while x:
    try:
        # Query the new 'nextRecordsUrl'
        nextQuery = sf.query_more(nextQuery['nextRecordsUrl'],True)

        # Append new query to pulls
        pulls.append({'len':len(nextQuery['records']),'url':nextQuery['nextRecordsUrl']})
    except KeyError: # Raised when the last page has no 'nextRecordsUrl' key
        # Append final data to pulls
        pulls.append({'len':len(nextQuery['records']),'url':None})

        # Set x to False to end the loop
        x = False

# return pulls to view data
pulls

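This is proof-of-concept code that should work for your situation with some modification. I would suggest changing pulls.append({'len':len(nextQuery['records']),'url':None}) to append whatever relevant data you need from the query, or just append the entire json object. You can then combine the various json objects within the Python script and export them into a single json file. Let me know if you need additional support adapting the code to your situation.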
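If memory is the constraint, as on Lambda, one possible adaptation is to write each page to its own file as soon as it arrives instead of collecting everything in a list. The sketch below is not tested against a real org. The function name dump_query_in_chunks and its parameters are made up for illustration, and it assumes sf is a simple_salesforce Salesforce instance whose last page simply omits the 'nextRecordsUrl' key.

```python
import json
import os


def dump_query_in_chunks(sf, soql, save_dir, prefix='test_file'):
    """Write each page of a SOQL query to its own JSON file.

    Hypothetical helper: `sf` only needs query() and query_more() methods
    that behave like simple_salesforce's. Returns the list of file paths.
    """
    paths = []
    page = sf.query(soql)
    while True:
        # File names follow the question's pattern: test_file1.json, test_file2.json, ...
        path = os.path.join(save_dir, '{}{}.json'.format(prefix, len(paths) + 1))
        with open(path, 'w') as outfile:
            json.dump(page['records'], outfile)
        paths.append(path)

        # Assumption: the last page has no 'nextRecordsUrl' key, so .get() returns None.
        next_url = page.get('nextRecordsUrl')
        if not next_url:
            break

        # Fetch the next page; True tells query_more the identifier is a full URL.
        page = sf.query_more(next_url, True)
    return paths
```

The point that matters for the question is that query_more is called inside the loop on every pass, so the URL being checked actually advances. Only one page is held in memory at a time.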
