python - Python - How to loop through a paginated API to extract data (Harvest)

Question

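First off, I have only been using Python for a few days, so I don't necessarily know best practices or all the terminology. I learn best by reverse engineering, and the code below is based on Harvest's official documentation and other bits I found through google-fu.

My requirement is to download all time entry records from Harvest and save them as JSON (or ideally as a CSV file).

Here is the code I adapted (including all the output, which isn't needed in the final code but is handy for my learning):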

import json
import urllib.request

#Set variables for authorisation
AUTH = "REDACTED"
ACCOUNT = "REDACTED"

URL = "https://api.harvestapp.com/v2/time_entries"
HEADERS = { "Authorization": AUTH,
            "Harvest-Account-ID": ACCOUNT}
PAGENO = "5"

request = urllib.request.Request(url=URL+"?page="+PAGENO, headers=HEADERS)
response = urllib.request.urlopen(request, timeout=5)
responseBody = response.read().decode("utf-8")
jsonResponse = json.loads(responseBody)

# Find the values for pagination (reuse the response parsed above)
parsed = jsonResponse
links_first = parsed["links"]["first"]
links_last = parsed["links"]["last"]
links_next = parsed["links"]["next"]
links_previous = parsed["links"]["previous"]
nextpage = parsed["next_page"]
page = parsed["page"]
perpage = parsed["per_page"]
prevpage = parsed["previous_page"]
totalentries = parsed["total_entries"]
totalpages = parsed["total_pages"]

#Print the output
print(json.dumps(jsonResponse, sort_keys=True, indent=4))
print("first link : " + links_first)
print("last link : " + links_last)
print("next page : " + str(nextpage))
print("page : " + str(page))
print("per page : " + str(perpage))
print("total records : " + str(totalentries))
print("total pages : " + str(totalpages))

The output is:
"Squeezed text (5816 lines)"
first link : https://api.harvestapp.com/v2/time_entries?page=1&per_page=100
last link : https://api.harvestapp.com/v2/time_entries?page=379&per_page=100
next page : 6
page : 5
per page : 100
total records : 37874
total pages : 379

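Could someone please suggest the best way to loop through the pages to build a single JSON file? If you could also suggest the best way to then output that JSON file, I would be grateful.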

score 4 · Accepted Answer

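I have been using the following code to retrieve all time entries. It could probably be more efficient, but it works. The get_all_time_entries function loops through all pages, appends each JSON response to the all_time_entries list, and finally returns that list as a formatted JSON string.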

import requests
import json

def get_all_time_entries():

    url_address = "https://api.harvestapp.com/v2/time_entries"  
    headers = {
        "Authorization": "Bearer " + "xxxxxxxxxx",
        "Harvest-Account-ID": "xxxxxx"
    }

    # find out total number of pages
    r = requests.get(url=url_address, headers=headers).json()
    total_pages = int(r['total_pages'])

    # results will be appended to this list
    all_time_entries = []

    # loop through all pages; range() excludes its end value, hence the + 1
    for page in range(1, total_pages + 1):

        url = "https://api.harvestapp.com/v2/time_entries?page="+str(page)
        response = requests.get(url=url, headers=headers).json()
        all_time_entries.append(response)

    # prettify JSON
    data = json.dumps(all_time_entries, sort_keys=True, indent=4)

    return data

print(get_all_time_entries())
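As an alternative to counting pages up front, the loop can follow the next_page field shown in the question's output until it comes back empty. The following is a minimal sketch, not a definitive implementation: iter_time_entries and fetch_from_harvest are hypothetical names, the credentials are the same placeholders as above, and it assumes each page stores its records under a "time_entries" key and reports next_page as null on the last page.

```python
from typing import Callable, Iterator, Optional


def iter_time_entries(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every time entry, following next_page until it is None."""
    page: Optional[int] = 1
    while page is not None:
        body = fetch_page(page)
        # Assumption: the records of a page live under "time_entries".
        yield from body["time_entries"]
        # Assumption: next_page is None (JSON null) on the last page.
        page = body["next_page"]


def fetch_from_harvest(page: int) -> dict:
    """Hypothetical fetcher for one page; credentials are placeholders."""
    import requests  # imported lazily so iter_time_entries has no dependency

    response = requests.get(
        "https://api.harvestapp.com/v2/time_entries",
        headers={
            "Authorization": "Bearer " + "xxxxxxxxxx",
            "Harvest-Account-ID": "xxxxxx",
        },
        params={"page": page},
        timeout=30,
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


# Usage (performs real HTTP requests, so it is left commented out):
# entries = list(iter_time_entries(fetch_from_harvest))
```

Passing the fetcher in as an argument keeps the paging logic separate from the HTTP call, and the result is one flat list of entries rather than a list of page responses.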

When running from PowerShell or a similar shell, you can easily redirect the script's output to a local file with ">".

For example:

Python.exe example.py > C:\temp\all_time_entries.json
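Since the question also asks for CSV, the file can be written from Python itself instead of redirecting standard output. This is a rough sketch under stated assumptions: save_json and save_csv are hypothetical helpers, and entries is assumed to be a flat list of time entry dictionaries (for the function above, that would mean collecting each page's "time_entries" list rather than the whole page response). Nested objects are stored as JSON text in a single CSV cell, which is only one of several reasonable choices.

```python
import csv
import json


def save_json(entries, path):
    """Write the entries to a pretty-printed JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(entries, handle, sort_keys=True, indent=4)


def save_csv(entries, path):
    """Write the entries to CSV, one row per entry."""
    # Collect the union of keys in first-seen order for the header row.
    fieldnames = []
    for entry in entries:
        for key in entry:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for entry in entries:
            # Nested dicts and lists become JSON text; missing keys stay blank.
            writer.writerow({
                key: json.dumps(value) if isinstance(value, (dict, list)) else value
                for key, value in entry.items()
            })
```

Calling save_csv(entries, r"C:\temp\all_time_entries.csv") would then produce a file that opens directly in a spreadsheet.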

Hope this helps!

score 1 · Accepted Answer

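There is a Python library that supports the Harvest API v2.

The library supports all authentication methods, request rate limiting, and response codes, and provides a data class for each response object.

The library is well tested, so the tests give you a usage example for every endpoint. The tests use the official Harvest examples.

In addition, there is an example of a detailed time report that inherits from the Harvest object. The tests for the detailed time report show how to use it.

The library is listed in the Harvest software directory: https://www.getharvest.com/integrations/python-library

Project URL: https://github.com/bradbase/python-harvest_apiv2

I own this project.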


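from harvest import Harvest
# Absolute import (assumes the package layout harvest/harvestdataclasses.py);
# the relative form ".harvestdataclasses" fails in a standalone script.
from harvest.harvestdataclasses import *

class MyTimeEntries(Harvest):

    def __init__(self, uri, auth):
        super().__init__(uri, auth)

    # Named all_time_entries so it does not override Harvest.time_entries;
    # overriding it would make the calls below recurse forever.
    def all_time_entries(self):
        time_entry_results = []

        # First page, which also reports the total number of pages
        time_entries = self.time_entries()
        time_entry_results.extend(time_entries.time_entries)
        if time_entries.total_pages > 1:
            for page in range(2, time_entries.total_pages + 1):
                time_entries = self.time_entries(page=page)
                time_entry_results.extend(time_entries.time_entries)

        return time_entry_results

personal_access_token = PersonalAccessToken('ACCOUNT_NUMBER', 'PERSONAL_ACCESS_TOKEN')
my_report = MyTimeEntries('https://api.harvestapp.com/api/v2', personal_access_token)
time_entries = my_report.all_time_entries()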
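For anyone calling the API directly rather than through the library, the rate limiting mentioned in this answer can be approximated with a small retry wrapper. This is only a sketch and not the library's implementation: get_with_retry is a hypothetical helper, and it assumes a throttled request comes back with HTTP status 429 and, optionally, a Retry-After header giving the wait in seconds.

```python
import time


def get_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until the response is not HTTP 429 or attempts run out.

    send is any zero-argument callable returning an object with
    .status_code and .headers, such as a requests.Response.
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Assumption: Retry-After holds seconds; default to 1 if absent.
            sleep(float(response.headers.get("Retry-After", 1)))
    raise RuntimeError("still throttled after %d attempts" % max_attempts)


# Usage with requests (performs real HTTP requests, so left commented out):
# import requests
# response = get_with_retry(lambda: requests.get(url, headers=headers))
```

With tens of thousands of entries spread over hundreds of pages, a wrapper like this lets a long download pause and continue instead of failing partway through.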
