I'm new to Python, and I'm writing a web scraper that looks for <td> rows in an HTML table:
import csv

# D (the downloader) and xpath are set up elsewhere in the script (not shown)

# open CSV with URLs to scrape
csv_file = csv.reader(open('urls.csv', 'rb'), delimiter=',')
names = []
for data in csv_file:
    names.append(data[0])

for name in names:
    html = D.get(name)
    # replace line breaks inside cells with a visible separator
    html2 = html.replace("<br />", " | ")
    print name
    c = csv.writer(open("darkgrey.csv", "a"))
    for row in xpath.search(html2, '//table/tr[@class="bgdarkgrey"]'):
        cols = xpath.search(row, '/td')
        # raises IndexError when a row has fewer than 5 cells
        c.writerow([cols[0], cols[1], cols[2], cols[3], cols[4]])
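All it does is grab the <td> values from 4 tables.
The problem is that some of the tables don't have cols[2], cols[3] or cols[4].
Is there a way I can check whether these exist?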
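To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It assumes cols behaves like an ordinary Python list, so len() and slicing work on it. The helper name pad_row and the empty-string filler are hypothetical choices for illustration, not part of any library.

```python
def pad_row(cols, width=5, fill=''):
    """Return exactly `width` cells: drop extras, pad missing ones with `fill`."""
    cells = list(cols[:width])                    # slicing never raises IndexError
    cells.extend([fill] * (width - len(cells)))   # no-op when the row is already full
    return cells

# A row that only has two <td> cells, so cols[2] through cols[4] do not exist
cols = ['a', 'b']

# len() tells you whether a given index is safe to read
has_third = len(cols) > 2

# Always five cells, whatever the row originally contained
padded = pad_row(cols)
```

If this is a reasonable approach, the last line of the loop could presumably become c.writerow(pad_row(cols)) instead of indexing each cell by hand.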
Thanks