
The following works when I paste it into the browser:

http://www.somesite.com/details.pl?urn=2344

But when I try to read the URL with Python, nothing happens:

link = 'http://www.somesite.com/details.pl?urn=2344'
f = urllib.urlopen(link)
myfile = f.readline()
print myfile

Do I need to encode the URL, or is there something I'm not seeing?


10 Answers


To answer your question:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print(myfile)

You need read(), not readline(). readline() returns only the first line of the response, which may well be empty.
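To see why readline() can look like "nothing happens": it returns only up to the first newline, so if the page starts with a blank line you just print an empty line. A minimal Python 3 sketch, using io.BytesIO as a stand-in for the response object (the body below is made up for illustration):

```python
import io

# Made-up page body; a real urlopen() response is a similar binary file-like object.
body = b"\n<html>\n<body>urn 2344</body>\n</html>\n"

def first_line(fp):
    # readline() stops at the first b"\n", so for this body it returns just b"\n"
    return fp.readline()

def whole_body(fp):
    # read() returns everything left in the stream
    return fp.read()

print(repr(first_line(io.BytesIO(body))))  # b'\n'
print(repr(whole_body(io.BytesIO(body))))
```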

Edit (2018-06-25): Since Python 3, the legacy urllib.urlopen() has been replaced by urllib.request.urlopen() (see the notes at https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).
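A minimal Python 3 sketch of the same fetch with urllib.request.urlopen(). The helper name fetch_text and the UTF-8 default are assumptions; a data: URL is used only so the sketch runs without network access:

```python
from urllib.request import urlopen

def fetch_text(url, encoding="utf-8"):
    """Read the whole body of `url` and decode it (Python 3)."""
    # urlopen() returns a file-like response; `with` closes it afterwards
    with urlopen(url, timeout=10) as f:
        return f.read().decode(encoding)

# Offline stand-in; pass "http://www.somesite.com/details.pl?urn=2344" the same way.
print(fetch_text("data:text/plain;charset=utf-8,urn%3D2344"))
```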

If you are using Python 3, see the answers by Martin Thoma or innm in this question: https://stackoverflow.com/a/28040508/158111 (Python 2/3 compatible) and https://stackoverflow.com/a/45886824/158111 (Python 3).

Alternatively, just get this library: http://docs.python-requests.org/en/latest/ and seriously use it :)

import requests

link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)
answered 2013-02-28T14:59:55.040

For Python 3 users, to save time, use the following code:

from urllib.request import urlopen

link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"

f = urlopen(link)
myfile = f.read()
print(myfile)

I know there are separate threads about the error NameError: name 'urlopen' is not defined, but thought this might save time.

answered 2017-08-25T17:38:01.280

None of these answers is a great fit for Python 3 (tested on the latest version at the time of posting).

Here's how you do it...

import urllib.error
import urllib.request

try:
    with urllib.request.urlopen('http://www.python.org/') as f:
        print(f.read().decode('utf-8'))
except urllib.error.URLError as e:
    print(e.reason)

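The above assumes the response is UTF-8. If you remove .decode('utf-8') you get the raw bytes; Python does not guess the encoding for you. The charset the server declares, if any, is available from f.headers.get_content_charset().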
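A sketch that decodes with the charset declared in the response's Content-Type header and falls back to UTF-8 otherwise (the fallback and the helper name read_text are assumptions). data: URLs stand in for real pages so it runs offline:

```python
from urllib.request import urlopen

def read_text(url, fallback="utf-8"):
    """Decode the body with the declared charset, else with `fallback`."""
    with urlopen(url, timeout=10) as f:
        raw = f.read()  # bytes, nothing decoded yet
        # "Content-Type: text/html; charset=..." -> "..." (None if no charset is given)
        charset = f.headers.get_content_charset() or fallback
    # errors="replace" keeps a mislabelled page from raising UnicodeDecodeError
    return raw.decode(charset, errors="replace")

print(read_text("data:text/plain;charset=latin-1,caf%E9"))  # declared charset
print(read_text("data:text/plain,caf%C3%A9"))               # no charset, UTF-8 fallback
```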

Documentation: https://docs.python.org/3/library/urllib.request.html#module-urllib.request

answered 2019-05-24T14:50:18.127

A solution that works with both Python 2.X and Python 3.X, using the Python 2 and 3 compatibility library six:

from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)
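If you'd rather not add the six dependency, a plain try/except ImportError around the import gives the same effect. This is a different technique from the one above, sketched under the assumption that only urlopen is needed (fetch is a hypothetical helper name):

```python
try:
    # Python 3
    from urllib.request import urlopen
except ImportError:
    # Python 2
    from urllib import urlopen

from contextlib import closing

def fetch(url):
    # closing() guarantees close() on both versions, even where the
    # response object is not itself a context manager (Python 2)
    with closing(urlopen(url)) as response:
        return response.read()

print(fetch("data:text/plain,hi"))  # offline stand-in for a real URL
```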
answered 2015-01-20T08:17:55.973

We can read a website's HTML content as follows:

from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)
answered 2018-03-08T09:21:12.743
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Works on Python 3 and Python 2, when the server accepts
# requests that don't identify themselves as a browser.

from __future__ import print_function
import sys
from contextlib import closing

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen

# closing() is used because Python 2's urlopen() result
# does not support the with statement directly.
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()

print(data)

# When the server rejects requests that don't look like they come
# from a browser, send a User-Agent header. Works on Python 3.

import urllib.request

user_agent = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) '
              'Gecko/2009021910 Firefox/3.0.7')

url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}

request = urllib.request.Request(url, None, headers)
with urllib.request.urlopen(request) as response:
    data = response.read()
print(data)
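You can check what the Request object will send before calling urlopen(). A small Python 3 sketch; the User-Agent value and the build_request helper are hypothetical:

```python
import urllib.request

# Hypothetical browser-like User-Agent; any string is handled the same way.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

def build_request(url, user_agent=USER_AGENT):
    """Build (but do not send) a GET request carrying a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("http://www.somesite.com/details.pl?urn=2344")
# Request stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))
print(req.get_method())  # GET, because no data was passed
```

Pass req to urllib.request.urlopen(req) to actually send it.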
answered 2019-08-24T07:14:58.337

The URL should be a string:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.readline()
print myfile
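On the "do I need to encode the URL?" part: a literal like the one in the question needs no extra encoding. If the query string is built from variables, here is a Python 3 sketch with urllib.parse.urlencode (build_url and BASE are hypothetical names):

```python
from urllib.parse import urlencode

BASE = "http://www.somesite.com/details.pl"

def build_url(**params):
    # urlencode() percent-escapes keys and values and joins the pairs with "&"
    return BASE + "?" + urlencode(params)

print(build_url(urn=2344))       # http://www.somesite.com/details.pl?urn=2344
print(build_url(urn="23 44/x"))  # space becomes "+", "/" becomes "%2F"
```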
answered 2013-02-28T14:58:18.823

I used the following code:

import urllib

def read_text():
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file

read_text()
answered 2017-08-22T11:00:39.113
# Retrieving data from a URL
# Python 3 only

import urllib.request

def main():
    url = "http://docs.python.org"

    # retrieve data from the URL
    webUrl = urllib.request.urlopen(url)
    print("Result code: " + str(webUrl.getcode()))

    # print the data from the URL
    print("Returned data: -----------------")
    data = webUrl.read().decode("utf-8")
    print(data)

if __name__ == "__main__":
    main()
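Note that the result code only ever shows a success here: for error statuses (4xx/5xx), urlopen raises urllib.error.HTTPError instead of returning. A self-contained Python 3 sketch that checks this against a throwaway local http.server; the handler, paths, body and fetch_status helper are all hypothetical:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Hypothetical test server: "/" returns 200, anything else 404."""
    def do_GET(self):
        if self.path == "/":
            body = "héllo".encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the console quiet

def fetch_status(url):
    """Return (status code, decoded body); HTTP errors come back as (code, "")."""
    # An empty ProxyHandler stops *_proxy environment variables from rerouting localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            # resp.status holds the same value as resp.getcode()
            return resp.status, resp.read().decode(charset)
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses are raised, not returned
        return e.code, ""

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]
try:
    results = [fetch_status(base + "/"), fetch_status(base + "/missing")]
finally:
    server.shutdown()
    server.server_close()

print(results)  # [(200, 'héllo'), (404, '')]
```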
answered 2019-11-27T07:37:44.617
from urllib.request import urlopen

# read() returns bytes; decode('utf-8') turns them into a str (needed to print Chinese text readably)
html = urlopen("https://blog.csdn.net/qq_39591494/article/details/83934260").read().decode('utf-8')
print(html)
answered 2020-05-16T07:59:27.640