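Note: At the time this question was asked, the correct way to fetch only the headers and stream the body was prefetch=False. That option has since been renamed to stream and its boolean value inverted, so you want stream=True.
The original answer follows.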
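Once you use iter_content(), you have to keep using it; .text uses the same interface indirectly under the hood (via .content).
In other words, by using iter_content() at all, you have to do the work that .text does by hand: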
import magic  # python-magic
import requests
from requests.compat import chardet

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = next(r.iter_content(256))  # Python 3 generators have no .next() method
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    contents = peek + b''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = str(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = str(contents, errors='replace')
    print(textcontent)
This assumes you are using Python 3.
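To see why the peeked bytes have to be stitched back together by hand, here is an offline sketch using requests' own Response class. It assumes requests 2.x, and `make_response` is a hypothetical helper that fakes the raw stream with io.BytesIO instead of making a real request. In that setup, .content (which .text is built on) reads from the same underlying stream as iter_content(): after a peek it returns only the remainder, and once the stream has been read to the end it raises RuntimeError.

```python
import io

from requests.models import Response


def make_response(body: bytes) -> Response:
    """Hypothetical helper: an offline Response backed by an in-memory buffer.

    It fills in the attributes a real request would set, so the example
    runs without touching the network.
    """
    r = Response()
    r.status_code = 200
    # BytesIO has no .stream() method, so iter_content() falls back to .read().
    r.raw = io.BytesIO(body)
    return r


# Peeking with iter_content() pulls bytes off the raw stream...
r = make_response(b"<html>hello</html>")
peek = next(r.iter_content(6))
# ...so .content only sees what is left.
rest = r.content
print(peek, rest)  # b'<html>' b'hello</html>'

# Once the stream has been read to the end, the body cannot be read again.
r2 = make_response(b"abc")
body = b"".join(r2.iter_content(2))
error = None
try:
    _ = r2.content
except RuntimeError as exc:
    error = exc
print(body, type(error).__name__)
```

That is why the code above prepends peek to the joined remainder instead of reading r.text.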
The alternative is to make 2 requests:
import magic  # python-magic
import requests

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
mime = magic.from_buffer(next(r.iter_content(256)), mime=True)
r.close()  # only the first bytes were needed; release this connection
if mime == "text/html":
    print(requests.get("http://www.december.com/html/demo/hello.html").text)
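If you would rather not pay for a second request, one option is to keep the peeked chunk and replay it in front of the rest of the same iter_content() generator. This is a swapped-in technique rather than part of the original answer; `peek_stream` is a hypothetical helper, and because it accepts any iterable of bytes it can be shown without a network call:

```python
from itertools import chain
from typing import Iterable, Iterator, Tuple


def peek_stream(chunks: Iterable[bytes]) -> Tuple[bytes, Iterator[bytes]]:
    """Return the first chunk and an iterator that yields it again, then the rest.

    `chunks` would typically be r.iter_content(256) on a response fetched
    with stream=True; any iterable of bytes works, which keeps this testable.
    """
    it = iter(chunks)
    first = next(it, b"")  # b"" if the body is empty
    return first, chain([first], it)


head, rest = peek_stream(iter([b"<ht", b"ml>", b"..."]))
print(head)  # b'<ht'
print(b"".join(rest))  # b'<html>...'
```

With a live response, `head, chunks = peek_stream(r.iter_content(256))` would let you pass `head` to `magic.from_buffer(head, mime=True)` and, if it is HTML, build the whole body with `b''.join(chunks)` from the single request. It still sticks with iter_content() after the first read, so the body is never read twice.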
Python 2 version:
import magic  # python-magic
import requests
from requests.compat import chardet

r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = next(r.iter_content(256))
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    contents = peek + ''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = unicode(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = unicode(contents, errors='replace')
    print(textcontent)
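In either version, the decoding step is the part that reproduces what .text would have done. A minimal Python 3 sketch of that step as a standalone function follows. `decode_body` is a hypothetical name, and the optional `detect` callable stands in for `chardet.detect(contents)['encoding']`. Making it optional is an assumption here so the function also runs without chardet installed.

```python
from typing import Callable, Optional


def decode_body(
    contents: bytes,
    encoding: Optional[str],
    detect: Optional[Callable[[bytes], Optional[str]]] = None,
) -> str:
    """Turn raw body bytes into text, following the same fallbacks as the answer."""
    if encoding is None and detect is not None:
        # e.g. detect=lambda b: chardet.detect(b)['encoding'] (assumed wiring)
        encoding = detect(contents)
    try:
        # An unknown codec name raises LookupError; encoding=None raises TypeError.
        return str(contents, encoding, errors="replace")
    except (LookupError, TypeError):
        # Fall back to Python's default codec (UTF-8), replacing bad bytes.
        return str(contents, errors="replace")


print(decode_body(b"caf\xc3\xa9", "utf-8"))  # café
print(decode_body(b"caf\xe9", None, detect=lambda b: "latin-1"))  # café
```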