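I am trying to scrape a website for information about vehicles. I want to fetch every vehicle on the site, and I want to repeat the process daily because new cars are added every day.
There are a lot of cars, more than 100,000, so doing it all at once (in a single process) would take too long and is not feasible.
I therefore need to split the work into smaller processes instead of one big one.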
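If I understand correctly, this can be done with IBM Cloud Functions.
For example, I could invoke one action for each make, and for each model of that make, to fetch the list of cars.
That way, instead of one big process, I would have many smaller ones, each taking much less time.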
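The idea is as follows:
- Invoke an action that fetches all makes and loops over them. For each make, first create an action and then invoke it.
The code is as follows: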
import sys
import os
import json
import requests
import http.client
import uuid

API_URL = "https://url.com"
APIHOST = os.environ.get('__OW_API_HOST')
NAMESPACE = os.environ.get('__OW_NAMESPACE')
USER_PASS = os.environ.get('__OW_API_KEY').split(':')

code = "New function code"

makes = [
    {"id": 9, "name": "Audi"},
    {"id": 74, "name": "Volkswagen"}
]

def main(dict):
    conn = http.client.HTTPSConnection("openwhisk.eu-gb.bluemix.net")
    payload = json.dumps({"exec": {"kind": "python-jessie:3", "code": code}})
    headers = {
        'accept': "application/json",
        'content-type': "application/json",
        'Authorization': "Basic my-base64key"
    }
    for make in makes:
        action = 'models-{0}'.format(make['name'])
        url = APIHOST + '/api/v1/namespaces/' + NAMESPACE + '/actions/' + action + "?overwrite=true"
        conn.request("PUT", url, payload, headers)  # Create new action
        # Execute the new action
    return {"Success": "Main executed correctly."}
The problem is in the for loop. With only one make it works fine, but with two or more it fails, and I get the following error:
[
"2018-07-11T08:53:06.322665342Z stderr: Traceback (most recent call last):",
"2018-07-11T08:53:06.322685254Z stderr: File \"pythonrunner.py\", line 88, in run",
"2018-07-11T08:53:06.322692936Z stderr: exec('fun = %s(param)' % self.mainFn, self.global_context)",
"2018-07-11T08:53:06.322699124Z stderr: File \"<string>\", line 1, in <module>",
"2018-07-11T08:53:06.322705761Z stderr: File \"__main__.py\", line 71, in main",
"2018-07-11T08:53:06.322712082Z stderr: File \"/usr/local/lib/python3.6/http/client.py\", line 1239, in request",
"2018-07-11T08:53:06.322718524Z stderr: self._send_request(method, url, body, headers, encode_chunked)",
"2018-07-11T08:53:06.322724518Z stderr: File \"/usr/local/lib/python3.6/http/client.py\", line 1250, in _send_request",
"2018-07-11T08:53:06.322730924Z stderr: self.putrequest(method, url, **skips)",
"2018-07-11T08:53:06.322736931Z stderr: File \"/usr/local/lib/python3.6/http/client.py\", line 1108, in putrequest",
"2018-07-11T08:53:06.322742876Z stderr: raise CannotSendRequest(self.__state)",
"2018-07-11T08:53:06.322748626Z stderr: http.client.CannotSendRequest: Request-sent"
]
Any idea how to execute these requests inside the for loop when there are two or more records?
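For reference, here is a minimal sketch that seems to reproduce the same error outside Cloud Functions. As far as I can tell, an http.client connection refuses a second request() until the previous response has been fetched with getresponse() and read, which would explain why one make works and two do not. Everything in it is an assumption made for illustration: the local test server, the EchoHandler and QuietServer classes, and the put_all and demo helpers are hypothetical stand-ins, not part of the IBM Cloud Functions API.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real API: answers every PUT with 200."""
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


class QuietServer(HTTPServer):
    def handle_error(self, request, client_address):
        pass  # ignore errors caused by the client hanging up early


def put_all(port, names, read_response):
    """Send one PUT per name over a single connection; return status codes."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    statuses = []
    try:
        for name in names:
            conn.request("PUT", "/actions/models-" + name, body="{}")
            if read_response:
                resp = conn.getresponse()
                resp.read()  # drain the body so the connection can be reused
                statuses.append(resp.status)
    finally:
        conn.close()
    return statuses


def demo():
    server = QuietServer(("127.0.0.1", 0), EchoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        # Reading each response before the next request works.
        ok = put_all(port, ["Audi", "Volkswagen"], read_response=True)
        # Skipping getresponse() fails on the second request.
        try:
            put_all(port, ["Audi", "Volkswagen"], read_response=False)
            error = None
        except http.client.CannotSendRequest as exc:
            error = str(exc)
    finally:
        server.shutdown()
        server.server_close()
    return ok, error


if __name__ == "__main__":
    print(demo())
```

If this diagnosis is right, calling conn.getresponse() and reading the body after each conn.request() in my loop, or opening a new connection per make, might be enough, but I have not confirmed this against the real service.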