12

I'm trying to write a script that checks whether a list of URLs exists:

import httplib

with open('urls.txt') as urls:
    for url in urls:
        connection = httplib.HTTPConnection(url)
        connection.request("GET")
        response = connection.getresponse()
        if response.status == 200:
            print '[{}]: '.format(url), "Up!"

But I get this error:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    connection = httplib.HTTPConnection(url)
  File "/usr/lib/python2.7/httplib.py", line 693, in __init__
    self._set_hostport(host, port)
  File "/usr/lib/python2.7/httplib.py", line 721, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: '//globo.com/galeria/amazonas/a.html

What's wrong?


3 Answers

27

There may be a simple fix for this. Here:

connection = httplib.HTTPConnection(url)

you are using HTTPConnection, so you should not pass a full URL such as http://OSMQuote.com; pass only the host, OSMQuote.com.

In short, remove http:// and https:// from your URLs, because httplib treats whatever follows the : as a port number, and a port number must be numeric.
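Rather than trimming the scheme by hand, one option is to let the standard library split the URL. The sketch below assumes Python 3, where httplib is named http.client. The helper name make_connection and the 10-second timeout are hypothetical and not part of this answer. It only builds the connection object; nothing is sent until request() is called.

```python
import http.client
from urllib.parse import urlsplit


def make_connection(url, timeout=10):
    """Return (connection, path) for an absolute http:// or https:// URL.

    Hypothetical helper: it only constructs the connection object, so no
    network traffic happens until request() is called on it.
    """
    # strip() drops the trailing newline that comes with each line of a file.
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("expected an absolute http(s) URL, got %r" % url)

    if parts.scheme == "https":
        conn_class, default_port = http.client.HTTPSConnection, 443
    else:
        conn_class, default_port = http.client.HTTPConnection, 80

    # Pass host and port separately so the constructor never has to guess
    # which ':' introduces the port.
    conn = conn_class(parts.hostname, parts.port or default_port, timeout=timeout)

    # The request target is the path plus the query string, if any.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return conn, path
```

With this in place, the loop body could become `conn, path = make_connection(url)` followed by `conn.request("GET", path)`, which also supplies the request path missing from the original call.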

Hope this helps!

Answered 2013-08-30T05:51:20.187
9

httplib.HTTPConnection takes the host and port of the remote URL in its constructor, not the whole URL.
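This can be reproduced offline, because the constructor only parses its arguments and opens no socket. The following is a minimal sketch assuming Python 3, where the module is called http.client. The function name try_host is hypothetical.

```python
import http.client


def try_host(host, port=None):
    """Build an HTTPConnection and report how the arguments were parsed."""
    try:
        conn = http.client.HTTPConnection(host, port)
    except http.client.InvalidURL as exc:
        return "rejected: %s" % exc
    return "ok: host=%s port=%s" % (conn.host, conn.port)


# A full URL: the text after the last ':' is taken as the port and rejected.
print(try_host("http://globo.com/galeria/amazonas/a.html"))
# Host only: the default HTTP port is filled in.
print(try_host("globo.com"))
# Host and port, either as "host:port" or as separate arguments.
print(try_host("globo.com:8080"))
print(try_host("globo.com", 8080))
```

The first call is the one that fails with the "nonnumeric port" message from the question; the other three are accepted.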

For your use case, it is easier to use urllib2.urlopen:

import urllib2

with open('urls.txt') as urls:
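    for line in urls:
        url = line.strip()  # drop the trailing newline read from the file
        if not url:
            continue
        try:
            r = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            r = e  # HTTPError carries the status in .code
        except urllib2.URLError as e:
            # no HTTP response at all (DNS failure, refused connection, ...)
            print '[{}]: '.format(url), "Unreachable:", e.reason
            continue
        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!"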
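
urllib2 exists only in Python 2. On Python 3 the same check might look like the sketch below, which assumes urllib.request and urllib.error as the replacements. The names check_url and report and the 10-second timeout are illustrative choices, not part of this answer.

```python
import http.client
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url.strip(), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404.
        return exc.code
    except (OSError, ValueError, http.client.HTTPException):
        # URLError is an OSError: DNS failure, refused connection, timeout.
        # ValueError covers strings that are not URLs at all.
        return None


def report(path="urls.txt"):
    """Print one line per non-empty URL listed in the file at path."""
    with open(path) as urls:
        for line in urls:
            url = line.strip()
            if not url:
                continue
            status = check_url(url)
            if status in (200, 401):
                print("[{}]: Up!".format(url))
            elif status == 404:
                print("[{}]: Not Found!".format(url))
            else:
                print("[{}]: No usable response ({})".format(url, status))
```

The HTTPError clause has to come before the broader one, since HTTPError is a subclass of URLError and would otherwise be swallowed as "no response".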
Answered 2013-01-24T00:27:18.233
1

Nonnumeric port:

Solution:

http.client.HTTPSConnection("api.cognitive.microsofttranslator.com")

Remove "https://" from the service URL or endpoint and it will work.

https://appdotpy.wordpress.com/2020/07/04/errorsolved-nonnumeric-port/

Answered 2020-07-04T15:08:36.410