python - Python HTTP status codes

Question

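I am writing my own directory buster in Python and testing it against my own web server in a safe, controlled environment. The script tries to retrieve common directories from a given website and, by looking at the HTTP status code of each response, determines whether the page is accessible.
To start, the script reads a file containing all the directories of interest to look for, and then makes the requests as follows: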

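import fileinput
import httplib

# url holds the target host name and is defined elsewhere in the script
for dir in fileinput.input('utils/Directories_Common.wordlist'):

    try:
        conn = httplib.HTTPConnection(url)
        conn.request("GET", "/"+str(dir))
        toturl = 'http://'+url+'/'+str(dir)[:-1]
        print '    Trying to get: '+toturl
        r1 = conn.getresponse()
        response = r1.read()
        print '   ',r1.status, r1.reason
        conn.close()
    # the except clause is not shown in the post; a generic handler is assumed
    except Exception as e:
        print '    Request failed:', e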

The response is then parsed, and if the returned status code equals "200", the page is considered accessible. I implemented all of this as follows:

if(r1.status == 200):
    print '\n[!] Got it! The subdirectory '+str(dir)+' could be interesting..\n\n\n'

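Everything seems fine to me, except that the script flags pages as accessible when they actually are not. The algorithm only collects pages that return "200 OK", but when I browse to those pages manually, I find that they have been moved permanently or have restricted access. Something is wrong, but I can't tell exactly where the code needs fixing. Any help is appreciated.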
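
One detail in the loop above may be worth ruling out: the printed URL drops the trailing newline with `str(dir)[:-1]`, but the path passed to `conn.request` is built from the unstripped line. Below is a minimal sketch, written for Python 3, of reading the wordlist with each entry stripped before it is used. `load_wordlist` and `build_path` are hypothetical helper names, not part of the original script, and whether the newline is the actual cause of the false positives is not confirmed by the post.

```python
import os
import tempfile


def load_wordlist(path):
    """Return the non-empty entries of a wordlist file, stripped of whitespace."""
    with open(path, encoding="utf-8") as handle:
        # strip() removes the trailing newline (and any stray '\r')
        return [line.strip() for line in handle if line.strip()]


def build_path(entry):
    """Build the request path for one wordlist entry (hypothetical helper)."""
    return "/" + entry.lstrip("/")


if __name__ == "__main__":
    # Write a tiny throwaway wordlist so the sketch runs on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".wordlist", delete=False) as tmp:
        tmp.write("admin\n\nimages\r\n/backup\n")
    try:
        for entry in load_wordlist(tmp.name):
            print(repr(build_path(entry)))
    finally:
        os.remove(tmp.name)
```

With the entries cleaned once at load time, the same string can be used both for the request and for the printed URL, so the two cannot drift apart.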

score 2 · Accepted Answer

I don't see any problem with your code, except that it is barely readable. I have rewritten it into this working snippet:

import httplib

host = 'www.google.com'
directories = ['aosicdjqwe0cd9qwe0d9q2we', 'reader', 'news']

for directory in directories:
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', '/' + directory)

    url = 'http://{0}/{1}'.format(host, directory)
    print '    Trying: {0}'.format(url)

    response = conn.getresponse()
    print '    Got: ', response.status, response.reason

    conn.close()

    if response.status == 200:
        print ("[!] The subdirectory '{0}' "
               "could be interesting.").format(directory)

Output:

$ python snippet.py
    Trying: http://www.google.com/aosicdjqwe0cd9qwe0d9q2we
    Got:  404 Not Found
    Trying: http://www.google.com/reader
    Got:  302 Moved Temporarily
    Trying: http://www.google.com/news
    Got:  200 OK
[!] The subdirectory 'news' could be interesting.

Also, I used a HEAD HTTP request instead of GET, because it is more efficient when you don't need the content and are only interested in the status code.
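
For readers on Python 3, where `httplib` is named `http.client`, the same HEAD-based check could be sketched roughly as follows. The grouping of status codes in `classify_status` and the helper name `probe` are assumptions made for illustration, not something the snippet above defines. `http.client` does not follow redirects on its own, so a moved page shows up as a 3xx status rather than as 200, which matches the `302` reported for `/reader` in the output above.

```python
import http.client


def classify_status(status):
    """Map an HTTP status code to a coarse label (hypothetical grouping)."""
    if 200 <= status < 300:
        return "accessible"
    if status in (301, 302, 303, 307, 308):
        return "redirected"
    if status in (401, 403):
        return "restricted"
    if status == 404:
        return "missing"
    return "other"


def probe(host, path, timeout=5):
    """Send one HEAD request; return (status, reason, Location header or None)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        # A HEAD response carries no body, so the headers are all we need.
        return response.status, response.reason, response.getheader("Location")
    finally:
        conn.close()
```

A scanner could call `probe(host, '/' + directory)` for each entry and report only those where `classify_status` returns `"accessible"`, while logging the `Location` value for redirected ones.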

score 1 · Accepted Answer

I would suggest using http://docs.python-requests.org/en/latest/# for HTTP.
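
A rough sketch of what the check might look like with that library, assuming `requests` is installed. `build_url` and `check_directory` are hypothetical names chosen for this example. `requests.get` follows redirects by default, which would make a moved page report the status of its final destination, so the sketch sends a HEAD request and passes `allow_redirects=False` explicitly to keep 301/302 responses visible.

```python
import requests


def build_url(base_url, directory):
    """Join a base URL and a directory name with exactly one slash."""
    return "{0}/{1}".format(base_url.rstrip("/"), directory.lstrip("/"))


def check_directory(base_url, directory, timeout=5):
    """Return (status_code, redirect target or None) for one candidate directory."""
    url = build_url(base_url, directory)
    # allow_redirects=False keeps a 301/302 visible instead of following it
    response = requests.head(url, allow_redirects=False, timeout=timeout)
    return response.status_code, response.headers.get("Location")
```

For example, `check_directory('http://www.example.com', 'news')` would return the raw status code for that path together with the `Location` header if the server sent one.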

answered 2013-04-12T10:53:39.480
