python - Python socket client that should print a website's source code: why doesn't it work?

Question

The following code prints nothing. Why?

#!/usr/bin/python           
import socket             

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)                 

s.connect(("www.python.org" , 80))
print s.recv(4096)
s.close()

What do I have to change so that it prints the source code of the Python website, as you would see it with "view source" in a browser?

score 12 · Accepted Answer

HTTP is a request/response protocol. You are not sending a request, so you are not getting a response.

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

s.connect(("www.python.org", 80))
s.sendall(b"GET / HTTP/1.0\r\n\r\n")  # you're missing this line
print(s.recv(4096))
s.close()

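Of course, this performs the most primitive HTTP/1.0 request possible, with no handling of HTTP errors, HTTP redirects, and so on. I would not recommend it for real use, only as an exercise to get familiar with socket programming and HTTP.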
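As a slightly fuller version of the same exercise, here is a minimal Python 3 sketch. It assumes the server closes the connection after responding (the usual HTTP/1.0 behaviour), so it keeps calling `recv` until it gets an empty result instead of reading only the first 4096 bytes. The helper names `build_request`, `split_response`, and `fetch` are made up for illustration and were not part of the original answer.

```python
import socket


def build_request(host, path="/"):
    """Return a minimal HTTP/1.0 GET request as bytes."""
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),  # optional in HTTP/1.0, but many servers expect it
        "",
        "",  # the empty line terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw response into (header_text, body_bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body


def fetch(host, path="/", port=80, timeout=10.0):
    """Send one GET request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = s.recv(4096)
            if not chunk:  # empty bytes means the peer closed the connection
                break
            chunks.append(chunk)
    return split_response(b"".join(chunks))


# Example call (needs network access):
# headers, body = fetch("www.python.org")
```

Many sites now answer plain HTTP on port 80 with a redirect to HTTPS, so the header text is worth printing before expecting any HTML in the body.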

For HTTP, Python provides several built-in modules: httplib (low-level), and urllib and urllib2 (high-level).
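The module names above are the Python 2 ones; in Python 3, httplib is available as `http.client`. A rough sketch of the low-level route on Python 3 might look like the following. The function name `fetch_with_http_client` and its `use_tls` parameter are assumptions chosen for this example.

```python
import http.client


def fetch_with_http_client(host, path="/", port=None, use_tls=True, timeout=10.0):
    """Return (status, reason, body_bytes) for a single GET request."""
    # Pick HTTPS or plain HTTP; port=None lets the class use its default port.
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.read()
    finally:
        conn.close()


# Example call (needs network access):
# status, reason, body = fetch_with_http_client("www.python.org")
```

Unlike the raw socket version, the status line and headers are parsed for you, but redirects are still not followed automatically.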

score 3 · Accepted Answer

Unless you use the full URL in the request, you will get a redirect (302).

Try this:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.python.org", 80))
# HTTP lines end with CRLF; the empty line ends the request headers
s.sendall(b"GET http://www.python.org HTTP/1.0\r\n\r\n")
print(s.recv(4096))
s.close()
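To check whether a raw response is a redirect such as the 302 mentioned above, the status line and the Location header can be pulled out of the received bytes. The following is only a sketch: the function names are invented for this example, and the sample bytes in the comment are made-up illustration data, not captured output from www.python.org.

```python
def parse_status_and_headers(raw):
    """Return (status_code, headers_dict) from the start of a raw HTTP response."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # The status line looks like "HTTP/1.1 302 Found".
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def redirect_target(raw):
    """Return the Location header if the response is a 3xx redirect, else None."""
    status_code, headers = parse_status_and_headers(raw)
    if 300 <= status_code < 400:
        return headers.get("location")
    return None


# Made-up sample response, for illustration only:
# redirect_target(b"HTTP/1.1 302 Found\r\nLocation: http://www.python.org/\r\n\r\n")
```

A client that wants the final page would then repeat the request against the returned location, which is part of what the higher-level modules do for you.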

Of course, if you only want the contents of the URL, this is much simpler. :)

import urllib2  # Python 2 only; in Python 3 use urllib.request instead
print(urllib2.urlopen('http://www.python.org').read())
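On Python 3, where urllib2 no longer exists, the equivalent call lives in `urllib.request`. A small sketch, with `read_url` as a hypothetical helper name and UTF-8 assumed whenever the server does not declare a charset:

```python
import urllib.request


def read_url(url, timeout=10.0):
    """Return the decoded body of url; HTTP redirects are followed automatically."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the response does not name a charset (an assumption).
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Example call (needs network access):
# print(read_url("http://www.python.org"))
```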

score 0 · Accepted Answer

This is how I got the HTML:

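    import requests  # third-party package: pip install requests

    def steal_html():
        url = 'https://some_website.org'
        # Fetch first, so a failed request does not leave an empty file behind
        html = requests.get(url).text
        with open('index.html', 'w', encoding='utf-8') as f:
            f.write(html)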
