Here is my code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sqlite3
import lxml.cssselect
import lxml.html
import xml.etree.ElementTree as etree
import urllib
db = sqlite3.connect('abiturient.sqlite')
sql = db.cursor()
query = "DELETE FROM universities"
sql.execute(query)
regions = sql.execute('SELECT * FROM regions')
for region in regions:
    doc = lxml.html.document_fromstring(urllib.urlopen(region[2]).read())
    for topic in doc.xpath('//span[@id="branch2"]/a'):
        name = topic.text_content().replace("'", "''")
        link = 'http://vstup.info/2013' + topic.attrib['href'][1:-5] + 'b.html'
        region_id = str(region[0])
        sql.execute("INSERT INTO universities (id, name, link, region_id) VALUES (NULL, '" + name + "', '" + link + "', '" + region_id + "')")
    print region[1] + ': added.'
db.commit()
db.close()
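My table `regions` contains three entries (three links to parse). I select them from SQLite and parse each page with `lxml`. But there is a problem: the loop `for region in regions:` executes only once (only the first link is parsed, and then it stops without any error). I don't know why this happens. Maybe it is because there is a loop inside a loop?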