python - Why does my code display garbled text when the original page does not?

Question

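from google.appengine.api import urlfetch
from google.appengine.ext import webapp

class sss(webapp.RequestHandler):
    def get(self):
        url = "http://www.google.com/"
        result = urlfetch.fetch(url)
        if result.status_code == 200:
            # Writes the fetched bytes as-is, without decoding them
            self.response.out.write(result.content)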

When I change the code to this:

if result.status_code == 200:
    self.response.out.write(result.content.decode('utf-8').encode('gb2312'))

it displays something strange. What should I do?

When I use this:

self.response.out.write(result.content.decode('big5'))

the page is different from the one I see at Google.com.

How do I get the Google.com page that I see in my browser?

score 3 · Accepted Answer

Google is probably serving you ISO-8859-1. At least, that is what they serve me for the user agent "AppEngine-Google; (+http://code.google.com/appengine)" (which urlfetch uses). The Content-Type header value is:

text/html; charset=ISO-8859-1

So you would use:

result.content.decode('ISO-8859-1')

If you check result.headers["Content-Type"], your code can adapt to changes on the other end. You can generally pass the charset name (ISO-8859-1 in this case) directly to Python's decode method.
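As a rough illustration of reading the charset from the header instead of hard-coding it, here is a minimal sketch. The helper names charset_from_content_type and decode_body are hypothetical, the ISO-8859-1 fallback is an assumption based on the header shown above, and the sketch works on a plain header string and byte string rather than on a real urlfetch result.

```python
from email.message import Message


def charset_from_content_type(content_type, default="ISO-8859-1"):
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lower-cased, or None if absent
    return msg.get_content_charset() or default


def decode_body(body, content_type, default="ISO-8859-1"):
    """Decode raw response bytes using the charset named in the header."""
    charset = charset_from_content_type(content_type, default)
    try:
        return body.decode(charset)
    except LookupError:
        # The header named a codec Python does not know; fall back (assumption)
        return body.decode(default)
```

In the handler from the question, this would presumably be called as decode_body(result.content, result.headers["Content-Type"]), with the decoded text then written out in whatever encoding the response declares.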

score 1 · Accepted Answer

How do I get the Google.com page that I see in my browser?

The page probably uses relative URLs for images, JavaScript, CSS, etc., and you are not rewriting them into absolute URLs that point at Google's site. To confirm this, check your logs: they should show 404 errors ("page not found") as the browser to which you are serving "just the HTML" tries to locate the relatively addressed resources that you are not supplying.
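One way to rewrite those relative URLs is sketched below, under stated assumptions: the helper name absolutize and the example path /images/logo.png are hypothetical, and the regular expression only handles quoted src and href attributes. It is not a full HTML parser, so unquoted attributes, CSS url(...) references and URLs built by scripts are left untouched.

```python
import re
from urllib.parse import urljoin

# Matches src="..." or href='...' with a quoted value (a deliberate simplification)
_URL_ATTR = re.compile(r'''(\b(?:src|href)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize(html, base_url):
    """Rewrite quoted src/href values so they resolve against base_url."""
    def repl(match):
        prefix, quote, value = match.group(1), match.group(2), match.group(3)
        # urljoin leaves values that are already absolute URLs unchanged
        return prefix + quote + urljoin(base_url, value) + quote
    return _URL_ATTR.sub(repl, html)
```

For example, absolutize('<img src="/images/logo.png">', "http://www.google.com/") yields an img tag whose src points at http://www.google.com/images/logo.png, so the browser requests the resource from Google rather than from your application.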
