我有 100 个 CSV 格式的网站列表。所有站点都具有相同的一般格式,包括一个有 7 列的大表。我编写了这个脚本来从每个网站的第 7 列中提取数据,然后将这些数据写入文件。然而,下面的脚本部分工作:打开输出文件(在运行脚本之后)显示正在跳过某些内容,因为它只显示 98 次写入(显然脚本还记录了许多异常)。非常感谢有关如何在这种情况下实现“捕获异常”的指导。谢谢!
import csv, urllib2, re
def replace(variab): return variab.replace(",", " ")
urls = csv.reader(open('input100.txt', 'rb')) #access list of 100 URLs
for url in urls:
html = urllib2.urlopen(url[0]).read() #get HTML starting with the first URL
col7 = re.findall('td7.*?td', html) #use regex to get data from column 7
string = str(col7) #stringify data
neat = re.findall('div3.*?div', string) #use regex to get target text
result = map(replace, neat) #apply function to remove','s from elements
string2 = ", ".join(result) #separate list elements with ', ' for export to csv
output = open('output.csv', 'ab') #open file for writing
output.write(string2 + '\n') #append output to file and create new line
output.close()
返回:
Traceback (most recent call last):
File "C:\Python26\supertest3.py", line 6, in <module>
html = urllib2.urlopen(url[0]).read()
File "C:\Python26\lib\urllib2.py", line 124, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python26\lib\urllib2.py", line 383, in open
response = self._open(req, data)
File "C:\Python26\lib\urllib2.py", line 401, in _open
'_open', req)
File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
result = func(*args)
File "C:\Python26\lib\urllib2.py", line 1130, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "C:\Python26\lib\urllib2.py", line 1103, in do_open
r = h.getresponse()
File "C:\Python26\lib\httplib.py", line 950, in getresponse
response.begin()
File "C:\Python26\lib\httplib.py", line 390, in begin
version, status, reason = self._read_status()
File "C:\Python26\lib\httplib.py", line 354, in _read_status
raise BadStatusLine(line)
BadStatusLine
>>>>