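I'm trying to crawl newsgroups to test some text-based grouping algorithms, fetching batches of newsgroup headers and sticking them into an SQLite database. The database is about as simple as it gets, with text columns for all the data, and the header data fetched by Python's nntplib always gives 8 values per header. All but one of them are strings, and I convert the one non-string into a string before inserting the data into the database. Despite this, Python falls over with the rather unhelpful "TypeError: not all arguments converted during string formatting" error, which is only a marginal step up from saying "error: good luck, you're on your own".
Does anyone who understands how string-to-string formatting can go wrong know better than I do what is wrong in the following code?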
import nntplib, sqlite3
# newsgroup settings (modify so that this works for you =)
server = 'news.yournewsgroup.com'
port = 119
username = 'your name here'
password = 'your password here'
# set up the newsgroup and sqlite connections
connection = nntplib.NNTP(server, port, username, password)
newsgroup = "comp.graphics.algorithms"
connection.group(newsgroup)
database = sqlite3.connect(newsgroup + ".db")
# create a table definition if it doesn't exist yet
try:
    # SQLite doesn't actually have data types. Everything is stored as plain text.
    # And so is newsgroup data. Bonus!
    database.execute("""CREATE TABLE headers (articleNumber text, subject text,
                                              poster text, date text, id text,
                                              references text, size text,
                                              lines text)""")
except:
    # table definition already exists. Not actually an error.
    pass
# Get the group meta-data, and set up iterator values for running
# through the header list.
resp, count, first, last, name = connection.group(newsgroup)
total = int(last) - int(first)
step = 10000
steps = total / step
articleRange = first + '-' + str(int(first)+step)
# grab a batch of headers
print "[FETCHING HEADERS]"
resp, list = connection.xover(first, str(int(first)+step))
print "done."
# process the fetched headers
print "[PROCESSING HEADERS]"
for entry in list:
    # Unpack immutable tuple, mutate (because the references list
    # should be a string), then repack.
    articleNumber, subject, poster, date, id, references, size, lines = entry
    argumentList = (articleNumber, subject, poster, date, id, (",".join(references)), size, lines)
    try:
        # try to chronicle the header information. THIS WILL GO WRONG AT SOME POINT.
        database.execute("""INSERT INTO headers (articleNumber, subject, poster,
                                                 date, id, reference, size, lines)
                            VALUES ('?', '?', '?',
                                    '?', '?','?', '?', '?')"""
                         % argumentList)
    except TypeError as err:
        # And here is an irking point with Python in general. Something went
        # wrong, yet all it tells us is "not all arguments converted during
        # string formatting". Despite that error being generated at a point
        # where the code knows WHICH argument was the problem.
        print err
        print type(argumentList[0]), argumentList[0]
        print type(argumentList[1]), argumentList[1]
        print type(argumentList[2]), argumentList[2]
        print type(argumentList[3]), argumentList[3]
        print type(argumentList[4]), argumentList[4]
        print type(argumentList[5]), argumentList[5]
        print type(argumentList[6]), argumentList[6]
        print type(argumentList[7]), argumentList[7]
        # A quick print set shows us that all arguments are already of type
        # "str", and none of them are empty... so it would take quite a bit
        # of work to make them fail at being legal strings... Wat?
        exit(1)
print "done."
# cleanup
database.close()
connection.quit()
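
For what it's worth, here is a minimal sketch of what seems to be going on, written so it runs without a news server. The `%` in the INSERT call is Python's own string-formatting operator, and the SQL text contains only `?` marks and no `%s` specifiers, so Python has nowhere to put the eight values and raises the TypeError before sqlite3 is ever involved. The usual sqlite3 approach is to leave the `?` placeholders unquoted and pass the tuple as the second argument to `execute()`. The sample `entry`, the helper names `reproduce_error` and `insert_header`, the in-memory database, and the column name `refs` (chosen to avoid any clash with the SQL keyword `references`) are all assumptions made for illustration, not part of the code above.

```python
import sqlite3

# Hypothetical sample row standing in for one xover() entry; not real data.
entry = ("1", "a subject", "poster@example.com", "01 Jan 2010",
         "<id@example.com>", ["<ref1@example.com>", "<ref2@example.com>"],
         "1234", "10")


def reproduce_error(values):
    """Apply % to a string without any % specifiers and return the error text."""
    try:
        "VALUES ('?', '?')" % values
    except TypeError as err:
        return str(err)
    return None


def insert_header(database, entry):
    """Insert one header row using sqlite3's own parameter binding."""
    articleNumber, subject, poster, date, id_, references, size, lines = entry
    row = (articleNumber, subject, poster, date, id_,
           ",".join(references), size, lines)
    # Unquoted ? placeholders; the values go in as a separate argument,
    # so no Python string formatting happens at all.
    database.execute("INSERT INTO headers VALUES (?, ?, ?, ?, ?, ?, ?, ?)", row)


# In-memory database (an assumption, so the sketch leaves no file behind).
database = sqlite3.connect(":memory:")
database.execute("""CREATE TABLE headers (articleNumber text, subject text,
                                          poster text, date text, id text,
                                          refs text, size text, lines text)""")

insert_header(database, entry)
stored = database.execute("SELECT subject, refs FROM headers").fetchall()

print(reproduce_error(("a", "b")))
print(stored)
```

Binding the values this way also means subjects containing quote characters do not need escaping by hand.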