I have been writing a robots.txt downloader for a list of websites using Python and urllib2. Here is the code:
import MySQLdb
import urllib
import urllib2
clone=0
db = MySQLdb.connect("127.0.0.1","root","","research" )
cursor = db.cursor()
sql = "SELECT * FROM sites"
try:
    cursor.execute(sql)
    # Fetch all the rows in a list of lists.
    results = cursor.fetchall()
    for row in results:
        id = row[0]
        website = row[1]
        website=website+"robots.txt"
        print website
        try:
            check = urllib2.urlopen(website,timeout=10).code
            if not check:
                print "No WEBSERVER FOUND"
                clone=1
        except IOError:
            clone=1
            print "No Webserver Found"
        if(check==200 or clone==0):
            sql2 = "UPDATE sites SET robots_txt_available=1 WHERE ID=%s" % \
                   (id)
            cursor.execute(sql)
            print website," Has Robots.txt.";
        else:print website," does not Have robots.txt."
except:
    print "Error: unable to fecth data"
# disconnect from server
db.close()
The output of the code is:
http://rashtrapatisachivalaya.gov.in/robots.txt
No Webserver Found
Error: unable to fecth data
So it does not run to completion. Can anyone tell me what the problem is in this code?
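
One likely cause, judging from the output alone: when urllib2.urlopen raises (urllib2.HTTPError and urllib2.URLError are both subclasses of IOError, so a 404 lands in the inner except as well), check is never assigned. The next line, if(check==200 or clone==0), then raises a NameError, and the bare outer except: swallows it and prints the generic message. A minimal sketch of a check that avoids this is shown here. The names has_robots_txt and opener are hypothetical and not part of the original code, and the import fallback is only there so the snippet also runs on Python 3.

```python
try:
    from urllib2 import urlopen          # Python 2, as used in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def has_robots_txt(url, opener=urlopen, timeout=10):
    """Return True only if fetching `url` answers with HTTP 200.

    `opener` is a hypothetical hook (defaults to urlopen) so the
    function can be exercised without touching the network.
    """
    status = None  # assigned up front so it exists even if the request fails
    try:
        status = opener(url, timeout=timeout).code
    except IOError:
        # HTTPError and URLError are IOError subclasses, so HTTP error
        # statuses and unreachable hosts both end up here.
        pass
    return status == 200
```

Inside the loop this could be used as if has_robots_txt(website): before running the UPDATE. Two other things may be worth checking: the UPDATE branch calls cursor.execute(sql) rather than cursor.execute(sql2), and clone is never reset to 0 between rows.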