77

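Using Python, how can I check whether a website is up? From what I've read, I need to send an "HTTP HEAD" request and check for the status code "200 OK", but how do I do that?

Cheers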

16 Answers

118

You can try doing this with getcode() from urllib:

import urllib.request

print(urllib.request.urlopen("https://www.stackoverflow.com").getcode())
200

For Python 2, use:

print urllib.urlopen("http://www.stackoverflow.com").getcode()
200
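
The snippets above issue a GET request, and urlopen raises an exception when the site is unreachable or answers with an error status. A minimal sketch of the HEAD-based check the question asks about, using only the Python 3 standard library, might look like the following. The function name head_status and the 5-second default timeout are assumptions made for illustration, not part of the original answer.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=5.0):
    """Return the HTTP status code for a HEAD request, or None if unreachable."""
    # method="HEAD" asks the server for headers only, without the body.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None
```

Calling head_status("https://www.stackoverflow.com") returns the integer status code, or None when no response arrives at all. Note that urlopen follows redirects, so the code returned is that of the final response.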
answered 2009-12-22T21:38:11.877
31

I think the easiest way to do it is with the Requests module.

import requests

def url_ok(url):
    r = requests.head(url)
    return r.status_code == 200
answered 2013-04-01T12:36:55.227
11

You can use httplib:

import httplib
conn = httplib.HTTPConnection("www.python.org")
conn.request("HEAD", "/")
r1 = conn.getresponse()
print r1.status, r1.reason

prints

200 OK

Of course, only if www.python.org is up.
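
On Python 3 the httplib module was renamed http.client. A rough Python 3 sketch of the same HEAD request, with a timeout and basic error handling added, is shown below. The function name head_request, its port and use_tls parameters, and the timeout value are assumptions for illustration.

```python
import http.client


def head_request(host, path="/", port=None, use_tls=False, timeout=5.0):
    """Send a HEAD request and return (status, reason), or None on failure."""
    connection_class = (
        http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    )
    conn = connection_class(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.reason
    except (http.client.HTTPException, OSError):
        # Covers refused connections, DNS failures, timeouts and malformed replies.
        return None
    finally:
        conn.close()
```

For example, head_request("www.python.org", use_tls=True) returns a (status, reason) tuple when the server answers and None when it cannot be reached. Unlike urlopen, http.client does not follow redirects, so a 301 or 302 is reported as is.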

answered 2009-12-22T21:44:21.073
8
import httplib
import socket
import re

def is_website_online(host):
    """ This function checks to see if a host name has a DNS entry by checking
        for socket info. If the website gets something in return, 
        we know it's available to DNS.
    """
    try:
        socket.gethostbyname(host)
    except socket.gaierror:
        return False
    else:
        return True


def is_page_available(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        False.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        # Treat any 2xx or 3xx status as "available".
        return bool(re.match(r"^[23]\d\d$", str(conn.getresponse().status)))
    except StandardError:
        return False
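
socket.gethostbyname only resolves IPv4 addresses. On Python 3, a variation of the DNS check that also accepts hosts with IPv6-only records could use socket.getaddrinfo, as sketched below; the function name has_dns_entry is hypothetical. As with the original, a successful lookup only shows that the name resolves, not that a web server is answering.

```python
import socket


def has_dns_entry(host):
    """Return True if the host name resolves to at least one address."""
    try:
        # Passing None as the port requests address resolution only.
        return len(socket.getaddrinfo(host, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False
```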
answered 2009-12-22T22:06:52.300
6
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
req = Request("http://stackoverflow.com")
try:
    response = urlopen(req)
except HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    print ('Website is working fine')

Works for Python 3

answered 2016-07-01T12:36:53.017
4

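The HTTPConnection object from the httplib module in the standard library will probably do the trick for you. By the way, if you start doing anything advanced with HTTP in Python, be sure to check out httplib2; it's a great library.

answered 2009-12-22T21:34:44.947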
3

If the server is down, urllib on Python 2.7 x86 Windows has no timeout and the program hangs indefinitely. So use urllib2:

import urllib2
import socket

def check_url(url, timeout=5):
    try:
        return urllib2.urlopen(url, timeout=timeout).getcode() == 200
    except urllib2.URLError as e:
        return False
    except socket.timeout as e:
        return False


print check_url("http://google.fr")  #True 
print check_url("http://notexist.kc") #False     
answered 2017-10-06T09:41:03.600
3

You can use the requests library to find out whether the website is up, i.e. whether the status code is 200:

import requests
url = "https://www.google.com"
page = requests.get(url)
print (page.status_code) 

>> 200
answered 2018-08-12T03:16:01.880
2

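In my opinion, caisah's answer misses an important part of your question, namely dealing with the server being offline.

Still, using requests is my favorite option, albeit like this: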

import requests

url = "https://www.stackoverflow.com"  # example URL; replace with the site to check

try:
    requests.get(url)
except requests.exceptions.ConnectionError:
    print(f"URL {url} not reachable")
answered 2019-09-18T18:55:00.383
1

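If by "up" you simply mean "the server is serving", then you could use cURL, and if you get a response then it's up.

I can't give you more specific advice because I'm not a Python programmer, but here is a link to pycurl: http://pycurl.sourceforge.net/.

answered 2009-12-22T21:34:12.480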
1

Hi, this code can run an up test and a speed test on your web page:

from urllib.request import urlopen
from socket import socket
import time


def tcp_test(server_info):
    # server_info is expected in "host:port" form.
    cpos = server_info.find(':')
    try:
        sock = socket()
        sock.connect((server_info[:cpos], int(server_info[cpos+1:])))
        sock.close()
        return True
    except Exception as e:
        return False


def http_test(server_info):
    try:
        # TODO : we can use this data after to find sub urls up or down results
        startTime = time.time()
        data = urlopen(server_info).read()
        endTime = time.time()
        speed = endTime - startTime
        return {'status' : 'up', 'speed' : str(speed)}
    except Exception as e:
        return {'status' : 'down', 'speed' : str(-1)}


def server_test(test_type, server_info):
    if test_type.lower() == 'tcp':
        return tcp_test(server_info)
    elif test_type.lower() == 'http':
        return http_test(server_info)
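
The tcp_test function above connects without a timeout, so it can block for a long time on an unresponsive host. A small sketch using socket.create_connection with an explicit timeout follows. The name tcp_port_open and the 3-second default are assumptions, and the host and port are passed separately rather than as one "host:port" string.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # The with-block closes the socket as soon as the connection is made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connection, unreachable host, DNS failure or timeout.
        return False
```

An open port only shows that something is listening; whether it speaks HTTP still needs a request such as the one in http_test.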
answered 2017-08-21T06:59:16.363
1

Requests and httplib2 are great options:

# Using requests.
import requests

# Wrapped in a function so that the return statements are valid.
def url_ok_requests(value):
    request = requests.get(value)
    return request.status_code == 200

# Using httplib2.
import httplib2

def url_ok_httplib2(value):
    try:
        http = httplib2.Http()
        # http.request returns a (response, content) tuple.
        response = http.request(value, 'HEAD')
        return int(response[0]['status']) == 200
    except Exception:
        return False

If using Ansible, you can use the fetch_url function:

from ansible.module_utils.basic import AnsibleModule
from ansible.module_utils.urls import fetch_url

module = AnsibleModule(
    dict(),
    supports_check_mode=True)

# Wrapped in a function so that the return statements are valid.
def url_ok(url):
    try:
        response, info = fetch_url(module, url)
        return info['status'] == 200
    except Exception:
        return False
answered 2019-07-25T21:58:17.027
1

My 2 cents

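import urllib.request

def getResponseCode(url):
    conn = urllib.request.urlopen(url)
    return conn.getcode()

url = "https://www.stackoverflow.com"  # example URL; replace with the site to check

if getResponseCode(url) != 200:
    print('Wrong URL')
else:
    print('Good URL')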
answered 2019-11-19T21:18:47.743
1

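I use requests for this, which keeps it easy and clean. Instead of the print function you can define and call a new function (to notify by email, etc.). The try-except block is essential, because an unreachable host can raise many different exceptions, so you need to catch them all.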

import requests

URL = "https://api.github.com"

try:
    response = requests.head(URL)
except Exception as e:
    print(f"NOT OK: {str(e)}")
else:
    if response.status_code == 200:
        print("OK")
    else:
        print(f"NOT OK: HTTP response code {response.status_code}")
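
The idea of swapping print for a notification function can be sketched by passing the notifier in as a callback. The version below uses urllib from the standard library rather than requests, so that it has no third-party dependency. check_and_notify and its notify parameter are hypothetical names, and sending the actual email is left to whatever function is passed in.

```python
import urllib.error
import urllib.request


def check_and_notify(url, notify=print, timeout=5.0):
    """Send a HEAD request to url; call notify(message) unless it answers 200.

    Returns True when the site answered 200, otherwise False.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server responded, but with an error status.
        status = err.code
    except (urllib.error.URLError, OSError) as err:
        notify(f"NOT OK: {url} unreachable ({err})")
        return False
    if status == 200:
        return True
    notify(f"NOT OK: {url} returned HTTP {status}")
    return False
```

With the default notify=print this behaves much like the code above; passing a function that sends an email or posts to a chat channel changes only where the message goes.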
answered 2021-01-08T12:22:16.223
0

Here's my solution using PycURL and validators:

import pycurl, validators


def url_exists(url):
    """
    Check if the given URL really exists
    :param url: str
    :return: bool
    """
    if validators.url(url):
        c = pycurl.Curl()
        c.setopt(pycurl.NOBODY, True)
        c.setopt(pycurl.FOLLOWLOCATION, False)
        c.setopt(pycurl.CONNECTTIMEOUT, 10)
        c.setopt(pycurl.TIMEOUT, 10)
        c.setopt(pycurl.COOKIEFILE, '')
        c.setopt(pycurl.URL, url)
        try:
            c.perform()
            response_code = c.getinfo(pycurl.RESPONSE_CODE)
            c.close()
            return response_code < 400
        except pycurl.error as err:
            # pycurl.error carries (errno, errstr) in its args.
            errno, errstr = err.args
            raise OSError('An error occurred: {}'.format(errstr))
    else:
        raise ValueError('"{}" is not a valid url'.format(url))
answered 2016-12-06T12:33:50.677
0

You can also check the website status this way:

import requests
def monitor():
    r = requests.get("https://www.google.com/", timeout=5)
    print(r.status_code)
answered 2021-09-27T07:56:08.890