python - Inexplicable formatting magic on an API endpoint

Question

I am writing a wrapper for Deutsche Bahn's Fahrplan OpenData API.

However, I can't seem to reproduce the result of a simple curl request. Here is what I tried:

>>> import requests
>>> header = {'Authorization': 'Bearer 36e39957ace6f405a82cfb09522d0a8d'}
>>> departure_data = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/departureBoard/8011160?date=2017-06-30', headers=header)

# Now, using a journey's details id, let's request some journey details from the endpoint
>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header)
<Response [404]>
>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header).request.url
'https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'

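Okay, so far, so bad. As you can see, I'm using the data exactly as it was given to me. Now, when I call the endpoint through the website, it tells me that it runs this curl command: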

curl -X GET --header "Accept: application/json" --header "Authorization: Bearer 36e39957ace6f405a82cfb09522d0a8d" "https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/782334%252F275830%252F795514%252F136979%252F80%253fstation_evaId%253D8098160"

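This is where the magic happens.

The original journey ID

'782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160'

becomes:

'782334%252F275830%252F795514%252F136979%252F80%253fstation_evaId%253D8098160'

and the request returns status 200.

Out of nowhere, extra characters were added to the journey ID. All I did was copy and paste it into the given field, so I know it wasn't me.

I believe some kind of encoding/decoding is going on, but I have never seen this before and, honestly, I don't know how to deal with it.

How do I handle this in my code? Apparently I need to do something more than simply parse the result of the departures endpoint? Or, better yet, am I just missing something obvious?

I have sent several emails to the DB developers, but have not received a reply so far.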

score 1 · Accepted Answer

What you are seeing is double URL encoding. The percent sign % is itself URL-encoded as the sequence %25:

/ -> %2F -> %252F
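To make the chain above concrete, here is a small sketch using only the standard library's urllib.parse (Python 3). The variable names are mine and purely illustrative; the point is that each encoding pass adds one layer and each decoding pass removes one.

```python
from urllib.parse import quote, unquote

# safe="" makes quote() escape "/" as well (by default "/" is left alone).
once = quote("/", safe="")
twice = quote(once, safe="")

print(once)   # %2F
print(twice)  # %252F

# Decoding peels the layers off again, one at a time.
print(unquote(twice))           # %2F
print(unquote(unquote(twice)))  # /
```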

Try URL-decoding departure_data.json()[0]['detailsId'] before doing the following:

>>> requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + departure_data.json()[0]['detailsId'], headers=header)

For example, like this (urllib.unquote is the Python 2 name; in Python 3 the function is urllib.parse.unquote):

requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + urllib.unquote(urllib.unquote(departure_data.json()[0]['detailsId'])), headers=header)

score 1 · Accepted Answer

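API v1 defines four endpoints:

GET /location/{name}
GET /arrivalBoard/{id}
GET /departureBoard/{id}
GET /journeyDetails/{id}

Each of them takes a path parameter ({name} or {id}). The value you pass for this parameter must be URL-encoded, and that is the step you missed.

/departureBoard/{id} gives you a list of Board items, which are defined as follows: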

Board {
    name (string): ,
    type (string): ,
    boardId (string): ,
    stopId (string): ,
    stopName (string): ,
    dateTime (string): ,
    origin (string): ,
    track (string): ,
    detailsId (string):
}

This detailsId is what you use to reach the /journeyDetails/{id} endpoint. So minimal working code looks like this (note the call to urllib.parse.quote):

import requests
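import urllib.parse  # import the submodule explicitly; a bare "import urllib" is not guaranteed to load urllib.parse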

header = {'Authorization': 'Bearer 36e39957ace6f405a82cfb09522d0a8d'}
departure_data = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/departureBoard/8011160?date=2017-06-30', headers=header)

journey_id = departure_data.json()[0]['detailsId']
journey_details = requests.get('https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/' + urllib.parse.quote(journey_id), headers=header)

The value of journey_id is itself URL-encoded, and it decodes to something that looks like a URL fragment:

urllib.parse.unquote(journey_id)
# -> '564552/203236/867650/245641/80?station_evaId=8098160'

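So it looks a bit as if you could simply use the raw value to make further requests, but that is a misconception.

Treat the ID as an opaque plain-text value that needs encoding, just as you would encode any other arbitrary value before using it in a URL.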
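One way to see why the decoded form is misleading: a rough sketch, using only urllib.parse, of how a URL parser would split the request if the decoded ID were appended as-is. BASE and the variable names are just for illustration. The slashes become extra path segments and everything after the "?" becomes a query string, so the server would no longer receive the ID as one value.

```python
from urllib.parse import unquote, urlsplit

BASE = "https://api.deutschebahn.com/fahrplan-plus/v1/journeyDetails/"
journey_id = "564552%2F203236%2F867650%2F245641%2F80%3fstation_evaId%3D8098160"

# Decoding exposes "/" and "?", which have structural meaning in a URL.
decoded = unquote(journey_id)
parts = urlsplit(BASE + decoded)

print(parts.path)   # /fahrplan-plus/v1/journeyDetails/564552/203236/867650/245641/80
print(parts.query)  # station_evaId=8098160
```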

When you quote the value, each percent sign is escaped as %25, which results in a longer value:

'564552%2F203236%2F867650%2F245641%2F80%3fstation_evaId%3D8098160'
'564552%252F203236%252F867650%252F245641%252F80%253fstation_evaId%253D8098160'
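If you want this in one place in your wrapper, a minimal sketch could look like the following. The helper name journey_details_url and the BASE_URL constant are hypothetical, not part of the API. It passes safe="" so that every reserved character in the opaque ID is escaped, and for the ID from the question it produces exactly the URL that the curl command uses.

```python
from urllib.parse import quote

# Assumed base URL, taken from the requests shown in the question.
BASE_URL = "https://api.deutschebahn.com/fahrplan-plus/v1"


def journey_details_url(details_id: str) -> str:
    """Build the journeyDetails URL, encoding the ID as an opaque value."""
    # safe="" escapes everything that is not an unreserved character,
    # so each "%" already present in the ID becomes "%25".
    return f"{BASE_URL}/journeyDetails/{quote(details_id, safe='')}"


url = journey_details_url(
    "782334%2F275830%2F795514%2F136979%2F80%3fstation_evaId%3D8098160"
)
print(url)
```

The resulting string can then be passed to requests.get together with the Authorization header, as in the code above.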

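Since the Deutsche Bahn API is self-documented through Swagger, it may be easiest to install a Swagger client and let it create an API wrapper for you (see their swagger.json). pyswagger looks usable, but there are others you could try.

That way you can focus on making API requests and getting data, while low-level plumbing such as URL encoding and even authorization happens transparently in the background.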
