
I have been writing a robots.txt downloader for a list of websites using Python and urllib2. Here is the code:

    import MySQLdb
    import urllib
    import urllib2
    clone=0
    db = MySQLdb.connect("127.0.0.1","root","","research" )
    cursor = db.cursor()
    sql = "SELECT * FROM sites"
    try:
        cursor.execute(sql)
        # Fetch all the rows in a list of lists.
        results = cursor.fetchall()
        for row in results:
            id = row[0]
            website = row[1]
            website=website+"robots.txt"
            print website
            try:
                check = urllib2.urlopen(website,timeout=10).code
                if not check:
                    print "No WEBSERVER FOUND"
                    clone=1
            except IOError:
                clone=1
                print "No Webserver Found"
            if(check==200 or clone==0):
                sql2 = "UPDATE sites SET robots_txt_available=1 WHERE ID=%s" % \
                    (id)
                cursor.execute(sql)
                print website," Has Robots.txt.";
            else:print website," does not Have robots.txt."
    except:
        print "Error: unable to fecth data"

    # disconnect from server
    db.close()

The output of the code is:

 http://rashtrapatisachivalaya.gov.in/robots.txt
 No Webserver Found
 Error: unable to fecth data

So it does not run to completion. Can anyone tell me what the problem with this code is?


1 Answer


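What are you trying to say? The given URL does not exist, so the code in the except clause is executed. The `code` attribute access only runs when no exception is raised, so in the failing case `check` is never assigned...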
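To illustrate the point, here is a small sketch with no network access. `failing_open` is a hypothetical stand-in for `urllib2.urlopen` failing on an unreachable host, and `check_is_bound_after_failure` is a name made up for this example. It shows that when the call raises, the assignment to `check` never completes, so a later read of `check` raises `NameError`. In the question that error is swallowed by the outer bare `except:`, which would explain the "unable to fecth data" line.

```python
def failing_open(url):
    # Hypothetical stand-in for urllib2.urlopen when no server answers.
    raise IOError("no server at " + url)


def check_is_bound_after_failure():
    try:
        # The exception is raised before the assignment completes.
        check = failing_open("http://example.invalid/robots.txt").code
    except IOError:
        pass  # the question's code prints a message and sets clone=1 here
    try:
        check  # reading a name that was never bound
    except NameError:
        # Inside a function this is UnboundLocalError, a NameError subclass.
        return False
    return True


print(check_is_bound_after_failure())
```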

The correct solution is:

    import urllib2
    try:
        urllib2.urlopen("some url")
    except urllib2.HTTPError as err:
        if err.code == 404:
            pass  # <whatever>: handle the missing robots.txt here
        else:
            raise
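One possible way to apply this to the loop in the question is to turn every outcome into a plain value before anything reads it. This is only a sketch under some assumptions: `robots_status` is a hypothetical helper name, the `opener` parameter is added here so the function can be tried without a network, and the try/except import is there so the same code runs under Python 2 (`urllib2`) and Python 3 (`urllib.request`).

```python
try:
    # Python 2, as used in the question
    from urllib2 import urlopen, HTTPError, URLError
except ImportError:
    # Python 3 equivalents
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError


def robots_status(url, opener=urlopen, timeout=10):
    """Return the HTTP status code for url, or None if no server answered."""
    try:
        return opener(url, timeout=timeout).getcode()
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (URLError, IOError):
        # DNS failure, refused connection, timeout and similar.
        return None
```

With something like this, the loop can test `robots_status(website) == 200` and never touches a name that might not have been assigned. `HTTPError` has to be caught before `URLError` because it is a subclass of it.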
answered 2012-12-16T10:59:59.990