python - Why do I get a bad HTTP status with ProtocolError('Connection aborted.', BadStatusLine("''",)) when fetching data from the Zendesk API?

Question

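I am trying to fetch the user identities for a few hundred thousand user ids from the Zendesk API, using Python 3.4.3 and the requests library. It works for many user ids, and then my program receives a bad response from the Zendesk API.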

Here is the relevant Python function:

def get_user_identities(user_id):
  url = config.zendesk_api_url + '/api/v2/users/' + user_id + '/identities.json'

  session = requests.Session()
  session.auth = config.credentials

  response = ''

  while True:
    try:
      response = session.get(url)
    except requests.ConnectionError as error:
      logger.error("ConnectionError: {0}".format(error))
      num_seconds = 30
      logger.info("Sleeping for {} seconds...".format(num_seconds))
      time.sleep(num_seconds)
    else:
      break

  while True:
    response = session.get(url)
    if response.status_code == 429:
      logger.info('Rate limited! Waiting for {} seconds'.format(response.headers['retry-after']))
      time.sleep(int(response.headers['retry-after']))
    else:
      break

  if response.status_code != 200:
    logger.error('Error with status code {}'.format(response.status_code))
    exit()

  data = response.json()

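This function is called in a loop and retrieves the user identities of thousands of users without any problem, but then it exits because of a bad HTTP response status: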

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 393, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 389, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1171, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 330, in send
    timeout=timeout
  File "/usr/local/lib/python3.4/dist-packages/urllib3/connectionpool.py", line 640, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.4/dist-packages/urllib3/util/retry.py", line 287, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='companyname.zendesk.com', port=443): Max retries exceeded with url: /api/v2/users/1608220001/identities.json (Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/emre.sevinc/code/company-zendesk/get_user_identities.py", line 72, in <module>
    get_user_identities(user_id)
  File "/home/emre.sevinc/code/company-zendesk/get_user_identities.py", line 42, in get_user_identities
    response = session.get(url)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 467, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 455, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 558, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='companyname.zendesk.com', port=443): Max retries exceeded with url: /api/v2/users/1608220001/identities.json (Caused by ProtocolError('Connection aborted.', BadStatusLine("''",)))

But when I test the same URL with HTTPie to get the user identities, it works just fine:

$ http -a user@company.com:password https://companyname.zendesk.com/api/v2/users/1608220001/identities.json

HTTP/1.1 200 OK
Cache-Control: must-revalidate, private, max-age=0
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json; charset=UTF-8
Date: Tue, 12 Sep 2017 15:11:39 GMT
ETag: W/"8135d41f9068e1c2b45d0f307c6431d4"
Last-Modified: Mon, 09 Nov 2015 20:55:44 GMT
Server: nginx
Strict-Transport-Security: max-age=31536000;
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Rack-Cache: miss
X-Rate-Limit: 700
X-Rate-Limit-Remaining: 416
X-Request-Id: f1320883-caf0-4d33-cd94-a0369f4368f8
X-Runtime: 0.381444
X-UA-Compatible: IE=Edge,chrome=1
X-Zendesk-API-Version: v2
X-Zendesk-Application-Version: v40.20
X-Zendesk-Origin-Server: app15.pod3.dub1.zdsys.com
X-Zendesk-Request-Id: a0606a3ae1d043968f53

{
    "count": 1, 
    "identities": [
        {
            "created_at": "2015-11-09T20:55:44Z", 
            "deliverable_state": "deliverable", 
            "id": 1020870341, 
            "primary": true, 
            "type": "email", 
...

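Does the Zendesk REST API endpoint "think" that I am trying to "scrape" it and deliberately drop the connection, as suggested at https://stackoverflow.com/a/33226080/236007?

Or is it something else, and do you have any suggestions for making it work (other than faking the user agent)?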

score 0 · Accepted Answer

Apparently the code had to catch one more exception, urllib3.exceptions.MaxRetryError, and handle one more HTTP status code (BAD_GATEWAY_ERROR = 502) in order to cope with what the Zendesk REST API endpoint throws at it:

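import time

import requests
import urllib3

# config, logger and get_script_path are project-specific names
# that are not shown in this excerpt.

BAD_GATEWAY_ERROR = 502
RATE_LIMITED_ERROR = 429
MAX_NUM_SECONDS_TO_SLEEP = 30
MAX_NUM_OF_ALLOWED_RETRIES = 10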


def get_user_identities(user_id):
  url = config.zendesk_api_url + '/api/v2/users/' + user_id + '/identities.json'

  session = requests.Session()
  session.auth = config.credentials

  script_path = get_script_path()

  num_retries = 0
  response = ''

  while True:
    if num_retries > MAX_NUM_OF_ALLOWED_RETRIES:
      logger.error('Tried more than {} times without success. Skipping the user id {} .'
                   .format(MAX_NUM_OF_ALLOWED_RETRIES, user_id))
      return

    try:
      response = session.get(url)

      if response.status_code == RATE_LIMITED_ERROR:
        logger.info('Rate limited! Waiting for {} seconds and will try again.'
                    .format(response.headers['retry-after']))
        time.sleep(int(response.headers['retry-after']))
        num_retries += 1
        continue

      if response.status_code == BAD_GATEWAY_ERROR:
        logger.info('Bad Gateway Error. Waiting for {} seconds and will try again.'
                    .format(str(MAX_NUM_SECONDS_TO_SLEEP)))
        time.sleep(MAX_NUM_SECONDS_TO_SLEEP)
        num_retries += 1
        continue

      if response.status_code != 200:
        logger.error('Error with status code {}. Skipping the user id {}'
                     .format(response.status_code, user_id))
        return

    except (requests.ConnectionError, urllib3.exceptions.MaxRetryError) as error:
      logger.error("ConnectionError: {0}".format(error))
      logger.info("Sleeping for {} seconds...".format(MAX_NUM_SECONDS_TO_SLEEP))
      time.sleep(MAX_NUM_SECONDS_TO_SLEEP)
      num_retries += 1
    else:
      break

  data = response.json()

After the changes above, it was able to successfully retrieve more than 700,000 records from the Zendesk REST API endpoint.
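
For what it's worth, the retry policy in the function above can also be pulled out into a small helper, so the same bounded wait-and-retry logic covers both connection errors and retryable status codes. This is only a sketch: fetch_with_retries and its parameters are hypothetical names, not part of requests or of my original script, and fetch is assumed to be any zero-argument callable that returns an object with status_code and headers.

```python
import time


def fetch_with_retries(fetch, max_retries=10, wait_seconds=30,
                       retry_statuses=(429, 502),
                       retry_exceptions=(OSError,), sleep=time.sleep):
    """Call fetch() until it returns a response that should not be retried.

    Returns that response, or None once max_retries retries are used up.
    (Hypothetical helper; all names and defaults are assumptions.)
    """
    retries = 0
    while True:
        try:
            response = fetch()
        except retry_exceptions:
            response = None  # connection-level failure: wait, then retry
        if response is not None and response.status_code not in retry_statuses:
            return response  # success, or an error that retrying won't fix
        if retries >= max_retries:
            return None
        # Prefer the server's Retry-After hint when it is a plain integer.
        wait = wait_seconds
        if response is not None:
            hint = response.headers.get('retry-after')
            if hint is not None and hint.isdigit():
                wait = int(hint)
        sleep(wait)
        retries += 1
```

With requests it could be called as fetch_with_retries(lambda: session.get(url), retry_exceptions=(requests.ConnectionError,)). The caller still has to check for None and for a status code other than 200 before calling response.json().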

The problem I ran into resembles the way the Zendesk servers behave in this kind of situation.
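
One more defensive touch, given that the server's behavior is not fully under your control: the rate-limit branch in the code above reads response.headers['retry-after'] and passes it straight to int(), which fails if the header is missing or carries an HTTP date instead of a number of seconds (both forms are allowed by the HTTP specification). A minimal sketch of a more tolerant parser follows; retry_after_seconds, its default of 30 seconds, and the now parameter are hypothetical and only there for illustration. I have not checked which form Zendesk actually sends beyond the integer case my code relied on.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=30, now=None):
    """Return how many seconds to wait according to a Retry-After header.

    Falls back to `default` when the header is absent or unparseable.
    (Hypothetical helper; the name and default are assumptions.)
    """
    value = headers.get('retry-after')
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return int(value)  # delay-seconds form, e.g. "12"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative wait if the date is already in the past.
    return max(0, int((when - now).total_seconds()))
```

The call time.sleep(int(response.headers['retry-after'])) would then become time.sleep(retry_after_seconds(response.headers)). The header mapping returned by requests is case-insensitive, so the lowercase key works there; a plain dict would need the exact key.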
