python - Python sock.recv doesn't get all the data from the page

Question

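This is a really hard step for me in learning low-level socket communication, but I really want to learn it. I'm stuck and can't seem to find the right approach.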

How can I get all of the data? I've tried several approaches, but I only ever get part of the response.

The URL I'm currently trying is:

http://steamcommunity.com/market/search/render/?query=&start=0&count=100&search_descriptions=0&sort_column=price&sort_dir=asc&appid=730&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Rarity%5B%5D=tag_Rarity_Ancient_Weapon

After some research I tried the approach below, but I still can't print the complete JSON page from the URL above. Am I doing something wrong?

        sock.send(request)
        response = ""
        first = True
        length = 0
        while True:
            partialResponse = sock.recv(65536)
            if len(partialResponse) != 0:
                #print("all %s" % partialResponse)
                # Extract content length from the first chunk
                if first:
                    startPosition = partialResponse.find("Content-Length")
                    if startPosition != -1:
                        endPosition = partialResponse.find("\r\n", startPosition+1)
                        length = int(partialResponse[startPosition:endPosition].split(" ")[1])
                    first = False
                # add current chunk to entire content
                response += partialResponse
                # remove chunk-size lines from the response
                startPosition = response.find("\n0000")
                if startPosition != -1:
                    endPosition = response.find("\n", startPosition+1)
                    response = response[0:startPosition-1] + response[endPosition+1:]
                if len(response[response.find("\r\n\r\n")+2:]) > length:
                    break
            else:
                break
        print response
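The "remove chunk-size lines" step above searches for "\n0000", which only catches size lines that happen to start with 0000; in chunked transfer encoding each chunk is preceded by its size as an arbitrary hex number. For comparison, here is a minimal Python 3 sketch of a chunked decoder. It assumes the whole body has already been read into memory, and decode_chunked is a hypothetical helper name, not part of any library.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode a complete Transfer-Encoding: chunked body (trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, optionally followed by ";extensions".
        # index() raises ValueError if the body is truncated mid size line.
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            # A zero-size chunk marks the end of the body.
            return bytes(out)
        start = line_end + 2
        out += body[start:start + size]
        # Skip the chunk data and the CRLF that follows it.
        pos = start + size + 2
```

Because the size line is parsed as a number, "1a", "001a" and "0000" are all handled the same way, with no string matching on particular digit patterns.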

score 3 · Accepted Answer

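I was able to reproduce the problem. It looks like the server doesn't return a Content-Length header, so length stays 0 and the check if len(response[..]) > length is true as soon as the first chunk arrives, which breaks out of the loop early. Changing that statement to if length > 0 and ... seems to fix it.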
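A minimal Python 3 sketch of the same idea: return None rather than 0 when the header is missing, so "no Content-Length" cannot be confused with "zero-length body". The function names content_length and body_complete are hypothetical, and the sketch assumes the header block has already been separated from the body.

```python
from typing import Optional


def content_length(head: bytes) -> Optional[int]:
    """Return the Content-Length value from a raw header block, or None if absent."""
    # Skip the status line; each remaining line is "Name: value".
    for line in head.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        # Header names are case-insensitive, so compare in lower case.
        if sep and name.strip().lower() == b"content-length":
            return int(value.strip())
    return None


def body_complete(body: bytes, length: Optional[int]) -> bool:
    # Only trust the length check when the server actually sent a length;
    # otherwise the caller needs another signal, such as the connection closing.
    return length is not None and len(body) >= length
```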

I also had to raise the timeout I had set from 0.3 s to 0.5 s to get the response consistently.
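A sketch of what such a timeout-based read loop might look like in Python 3, assuming socket.settimeout is the mechanism (the answer doesn't say how its timeout was set). recv_until_close is a hypothetical name. Stopping on a timeout is a heuristic: a server that pauses longer than the timeout mid-response will be cut off, which is consistent with 0.3 s not being enough here.

```python
import socket


def recv_until_close(sock: socket.socket, timeout: float = 0.5) -> bytes:
    """Collect data until the peer closes the socket or it is quiet for `timeout` seconds."""
    sock.settimeout(timeout)
    chunks = []
    while True:
        try:
            chunk = sock.recv(65536)
        except socket.timeout:
            # No data for `timeout` seconds: assume the response is finished.
            break
        if not chunk:
            # b"" means the peer closed the connection.
            break
        chunks.append(chunk)
    return b"".join(chunks)
```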

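I do get a Content-Length in Chrome, but that may be because the content encoding there is gzip. My guess is that they don't send Content-Length for uncompressed responses.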
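Servers normally only gzip a response when the request advertises support (for example Accept-Encoding: gzip, as browsers send). In that case the Content-Length counts the compressed bytes, and the body has to be decompressed before it is usable JSON. A minimal Python 3 sketch, assuming the Content-Encoding value has already been pulled out of the headers; decode_body is a hypothetical helper.

```python
import gzip


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo a gzip Content-Encoding; return any other body unchanged."""
    if content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body
```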

The Content-Length section of this document lists the header as a SHOULD, not a MUST.
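Since Content-Length is optional, a client has to decide how the body is delimited from whichever headers are present. The sketch below is a simplified take on the message-length rules in RFC 7230 §3.3.3: it ignores responses that never have a body (HEAD requests, 1xx, 204, 304) and malformed or duplicate Content-Length values. It assumes the headers are already in a dict keyed by lower-case name; body_framing is a hypothetical name.

```python
from typing import Dict


def body_framing(headers: Dict[str, str]) -> str:
    """Classify how a response body's length is signalled."""
    # Transfer-Encoding: chunked takes precedence over Content-Length.
    if "chunked" in headers.get("transfer-encoding", "").lower():
        return "chunked"
    if "content-length" in headers:
        return "content-length"
    # Neither header: the body runs until the server closes the connection.
    return "until-close"
```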

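Other general advice: I wouldn't assume the first chunk will always contain all of the headers, so you really shouldn't be switching on first. You should probably read until you hit \r\n\r\n, which marks the end of the headers, and then handle everything after that separately as the response body.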
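A minimal Python 3 sketch of reading up to the end of the headers. It searches the accumulated buffer rather than only the newest chunk, and returns any body bytes that arrived in the same recv() calls. read_head is a hypothetical name; the sketch doesn't cap header size, which a robust client should.

```python
import socket
from typing import Tuple


def read_head(sock: socket.socket) -> Tuple[bytes, bytes]:
    """Read until the blank line that ends the headers.

    Returns (header block, body bytes that arrived along with it).
    """
    buf = b""
    while True:
        # Search the whole buffer, not just the newest chunk, because
        # "\r\n\r\n" may be split across two recv() calls.
        end = buf.find(b"\r\n\r\n")
        if end != -1:
            return buf[:end], buf[end + 4:]
        chunk = sock.recv(65536)
        if not chunk:
            raise ConnectionError("connection closed before headers were complete")
        buf += chunk
```

The leftover bytes are the start of the body, so the body-reading code should begin from them rather than from the next recv().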

Edit based on the comments:

For something quick and dirty, I'd probably do this:

    # Keep reading until recv() returns an empty string,
    # which means the server has closed the connection.
    response = ''
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            break
        response += chunk

    # The headers end at the first blank line; everything after it is the body.
    headers, _, body = response.partition('\r\n\r\n')

    print response
    print body
    print headers

    print len(response), len(body), len(headers)

Just rip everything the socket receives into one string and don't try to interpret it at all. That gives you the best chance of getting everything.
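The question doesn't show how request is built, but reading everything until recv() returns an empty string only terminates once the server closes the connection. HTTP/1.1 connections are persistent by default, so one assumption worth making explicit is to send Connection: close. A minimal Python 3 sketch; build_get is a hypothetical helper.

```python
from urllib.parse import urlsplit


def build_get(url: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request that asks the server to close afterwards."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        # Without this, an HTTP/1.1 server may keep the connection open,
        # so recv() never returns b"" and a read-until-close loop hangs.
        "Connection: close",
        "",  # blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```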

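I definitely think playing at this level is a great way to learn and well worth every moment. That said, there's a reason libraries are usually the first choice for this kind of thing.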
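For comparison, the standard library's http.client already handles header parsing and chunked decoding. Normally you would go through http.client.HTTPConnection or urllib.request.urlopen; the sketch below instead feeds a raw response to http.client.HTTPResponse through a fake socket so it runs offline. That trick (and the _FakeSock class) is an illustration, not a documented way to use HTTPResponse.

```python
import http.client
import io


class _FakeSock:
    """Stand-in for a socket: HTTPResponse only calls makefile() on it."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._raw)


def parse_raw_response(raw: bytes):
    resp = http.client.HTTPResponse(_FakeSock(raw))
    resp.begin()  # parse the status line and headers
    # read() removes chunked transfer encoding if the headers ask for it.
    return resp.status, resp.getheader("Content-Type"), resp.read()
```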

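HTTP really doesn't guarantee much; it's very flexible and has a lot of variables. So you need to start with essentially no expectations or requirements and build up carefully, constantly asking "what if this/that?". One thing to watch for is that the data can be split across recv() calls anywhere. A chunk may end before the headers are finished; it can even end between a \r and a \n, which means you need to parse across chunks to detect boundaries. For common use, reading the whole response into memory probably isn't a problem, but of course some responses or other requirements may make that impractical or impossible.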
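A minimal Python 3 sketch of finding a delimiter that may straddle recv() boundaries. Each new search restarts len(delim) - 1 bytes before the new data, so a match split across chunks is still found without rescanning the whole buffer. find_across_chunks is a hypothetical name, and the chunks are passed in as a list here instead of coming from a socket.

```python
from typing import Iterable, Optional


def find_across_chunks(chunks: Iterable[bytes], delim: bytes = b"\r\n\r\n") -> Optional[int]:
    """Return the offset of `delim` in the concatenated stream, or None if it never appears."""
    buf = bytearray()
    for chunk in chunks:
        # Back up far enough to catch a delimiter that began in earlier data.
        start = max(0, len(buf) - (len(delim) - 1))
        buf += chunk
        pos = buf.find(delim, start)
        if pos != -1:
            return pos
    return None
```

For example, the header terminator below is split as "\r" | "\n\r" | "\n" across three chunks and is still found at offset 21.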
