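Edit: (Solved) When I read the values back from my file, a newline character (\n) was being added to the end of each one, and that was splitting my request string at that point. I think it comes down to how I originally saved the values to the file. Many thanks.
I have the following code: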
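For anyone who runs into the same thing, here is a minimal sketch of the fix, assuming the values are stored one per line. The names `BASE_URL` and `build_url` and the `io.StringIO` stand-in for the file are only for illustration and are not part of my script.

```python
import io

BASE_URL = 'http://www.myurl.com/'  # placeholder host used in this post


def build_url(term):
    # strip() removes surrounding whitespace, including the trailing '\n'
    # that reading a line from a file leaves on the value.
    return BASE_URL + term.strip()


# io.StringIO stands in for the real file so the sketch runs on its own.
fake_file = io.StringIO(u"first\nsecond\n")
raw = next(fake_file)

print(repr(raw))             # repr() makes the hidden newline visible
print(repr(build_url(raw)))  # the cleaned URL has no newline in it
```

Printing with repr() rather than str() is what would have shown me the stray newline straight away.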
results = 'http://www.myurl.com/'+str(mystring)
print str(results)
request = urllib2.Request(results)
request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
opener = urllib2.build_opener()
text = opener.open(request).read()
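This sits inside a loop. After the loop has run a few times, str(mystring) changes so that a different set of results is fetched. I can loop the script as many times as I like while the value of str(mystring) stays the same, but every time I change the value of str(mystring) I get an error saying that no host was given when the code tries to build the opener.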
opener = urllib2.build_opener()
Can anyone help?
TIA,
Paul.
Edit:
More code here...
import sys
import string
import httplib
import urllib2
import re
import random
import time

def StripTags(text):
    # Repeatedly cut out the first "<...>" span until none is left.
    finished = 0
    while not finished:
        finished = 1
        start = text.find("<")
        if start >= 0:
            stop = text[start:].find(">")
            if stop >= 0:
                text = text[:start] + text[start+stop+1:]
                finished = 0
    return text

mystring = "test"
d = {}
with open("myfile", "r") as f:
    while True:
        page_counter = 0
        print str(mystring)
        try:
            while page_counter < 20:
                results = 'http://www.myurl.com/' + str(mystring)
                print str(results)
                request = urllib2.Request(results)
                request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
                opener = urllib2.build_opener()
                text = opener.open(request).read()
                finds = re.findall(r'([\w\.\-]+' + mystring + ')', StripTags(text))
                for find in finds:
                    d[find] = 1
                uniq_emails = d.keys()
                page_counter = page_counter + 1
                print "found this " + str(finds)
                # Sleep for a random interval of up to 5 seconds between requests.
                random.seed()
                n = random.random()
                i = n * 5
                print "Pausing script for " + str(i) + " Seconds"
                time.sleep(i)
            # next(f) keeps the trailing '\n', which ends up inside the URL on the next pass.
            mystring = next(f)
        except IOError:
            print "No result found!"
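One way the file reading could be restructured, as a rough sketch only: iterate over the lines with a for loop and strip each one before it is used. The helper name `iter_terms` and the sample list are made up for illustration; the list stands in for the open file, which can be iterated line by line in the same way.

```python
def iter_terms(lines):
    """Yield each term with surrounding whitespace removed, skipping blank lines."""
    for line in lines:
        term = line.strip()  # drops '\n' and '\r\n' line endings
        if term:
            yield term


# Hypothetical sample data standing in for the contents of "myfile".
sample_lines = ["test\n", "\n", "example.com\r\n"]

for term in iter_terms(sample_lines):
    url = 'http://www.myurl.com/' + term
    print(url)
```

A for loop also stops cleanly when the file runs out, whereas calling next(f) inside `while True` raises StopIteration at the end of the file.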