python - Unable to loop through paged API responses with Python

Question

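So, I'm scratching my head over this one. Using HubSpot's API, I need to get a list of ALL the companies in my client's "portal" (account). Sadly, the standard API call only returns 100 companies at a time. When it does return a response, it includes two parameters that make paging through the responses possible.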

One of them is "has-more": True (which tells you whether you can expect any more pages), and the other is "offset": 12345678 (the timestamp to offset the request by).

These two parameters are what you pass back into the next API call to get the next page. For example, the initial API call might look like this:

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)

While a subsequent call might look like this:

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)

So this is what I've tried so far:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import os.path
import requests
import json
import csv
import glob2
import shutil
import time
import time as howLong
from time import sleep
from time import gmtime, strftime

HubSpot_Customer_Portal_ID = "XXXXXX"

wta_hubspot_api_key = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

findCSV = glob2.glob('*contact*.csv')

theDate = strftime("%Y-%m-%d", gmtime())
theTime = strftime("%H:%M:%S", gmtime())

try:
    testData = findCSV[0]
except IndexError:
    print("\nSynchronisation attempted on {date} at {time}: There are no \"contact\" CSVs, please upload one and try again.\n".format(date=theDate, time=theTime))
    print("====================================================================================================================\n")
    sys.exit()

for theCSV in findCSV:

    def get_companies():
        create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
        headers = {'content-type': 'application/json'}
        create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
        if create_get_recent_companies_response.status_code == 200:

            offset = create_get_recent_companies_response.json()[u'offset']
            hasMore = create_get_recent_companies_response.json()[u'has-more']

            while hasMore == True:
                for i in create_get_recent_companies_response.json()[u'companies']:
                    get_more_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                    get_more_companies_call_response = requests.get(get_more_companies_call, headers=headers)
                    companyName = i[u'properties'][u'name'][u'value']
                    print("{companyName}".format(companyName=companyName))


        else:
            print("Something went wrong, check the supplied field values.\n")
            print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

    if __name__ == "__main__":
        get_companies()
        sys.exit()

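The problem is that it just keeps returning the same initial 100 results; this happens because the parameter "has-more": True is true on the initial call, so the loop keeps going and keeps returning the same ones...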

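My ideal scenario is to be able to parse ALL the companies across approximately 120 response pages (there are around 12,000 companies). As I pass through each page, I'd like to append its JSON content to a list, so that eventually I have one list containing the JSON responses of all 120 pages, which I can then parse for use in a different function.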

I am in desperate need of a solution :(

This is the function I am replacing in my main script:

            def get_companies():

                create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/recent/modified?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
                headers = {'content-type': 'application/json'}
                create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
                if create_get_recent_companies_response.status_code == 200:

                    for i in create_get_recent_companies_response.json()[u'results']:
                        company_name = i[u'properties'][u'name'][u'value']
                        #print(company_name)
                        if row[0].lower() == str(company_name).lower():
                            contact_company_id = i[u'companyId']
                            #print(contact_company_id)
                            return contact_company_id
                else:
                    print("Something went wrong, check the supplied field values.\n")
                    #print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

score 1 · Accepted Answer

The problem seems to be that:

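You get the offset in your first call, but don't do anything with the actual company data that the call returns.
You then use this same offset in your while loop; you never use the new one from subsequent calls. This is why you get the same companies every time.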

I think this code for get_companies() should work for you. I can't test it, obviously, but hopefully it is OK:

def get_companies():
        create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
        headers = {'content-type': 'application/json'}
        create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
        if create_get_recent_companies_response.status_code == 200:

            while True:
                for i in create_get_recent_companies_response.json()[u'companies']:
                    companyName = i[u'properties'][u'name'][u'value']
                    print("{companyName}".format(companyName=companyName))
                offset = create_get_recent_companies_response.json()[u'offset']
                hasMore = create_get_recent_companies_response.json()[u'has-more']
                if not hasMore:
                    break
                else:
                    create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                    create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)


        else:
            print("Something went wrong, check the supplied field values.\n")
            print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

Strictly speaking, the else after break isn't required, but it is in keeping with the Zen of Python: "Explicit is better than implicit."

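Note that you only check for a 200 response code once; if something goes wrong inside your loop, you will miss it. You should probably put all your calls inside the loop and check for a proper response each time.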
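As a rough, untested-against-HubSpot sketch of that last suggestion, every request can go through the same loop, with the status checked on each page and the companies appended to one list. The names get_all_companies, BASE_URL, get and max_pages are my own assumptions for illustration, and the page cap is an extra safety net rather than anything the API requires; the "companies", "has-more" and "offset" keys are the ones described in the question.

```python
BASE_URL = "https://api.hubapi.com/companies/v2/companies/"


def get_all_companies(api_key, get=None, max_pages=1000):
    """Collect the companies from every page into a single list.

    `get` is a hypothetical hook so a stub can stand in for requests.get;
    `max_pages` is an assumed safety cap, not part of the API.
    """
    if get is None:
        import requests  # imported lazily so a stub can be used without it
        get = requests.get

    companies = []
    params = {"hapikey": api_key}
    headers = {"content-type": "application/json"}

    for _ in range(max_pages):
        response = get(BASE_URL, params=params, headers=headers)
        # Check every page, not only the first one.
        if response.status_code != 200:
            raise RuntimeError(
                "Request failed with status {0}".format(response.status_code)
            )
        page = response.json()
        companies.extend(page.get("companies", []))
        if not page.get("has-more"):
            break
        # The next request must use the offset from the page just received.
        params["offset"] = page["offset"]

    return companies
```

Calling get_all_companies(wta_hubspot_api_key) would then return one list that you can hand to whatever function needs it, for example to look up a companyId by name.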
