
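I'm trying to scrape newsgroups to test some text-based grouping algorithms. I fetch batches of newsgroup headers and stick them into an SQLite database. The database is as simple as it gets, with text columns for all the data, and the header data that Python's nntp library fetches always has 8 values per header. All but one of them are strings, and I convert the only non-string to a string before inserting the data into the database. Despite that, Python falls over with the rather unhelpful "TypeError: not all arguments converted during string formatting" error, which is only a marginal step up from just saying "error: good luck, you're on your own".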

Does anyone who understands how string-to-string formatting can go wrong better than I do know what's going wrong in the following code?

import nntplib, sqlite3

# newsgroup settings (modify so that this works for you =)
server = 'news.yournewsgroup.com'
port = 119
username = 'your name here'
password = 'your password here'

# set up the newsgroup and sqlite connections
connection = nntplib.NNTP(server, port, username, password)
newsgroup = "comp.graphics.algorithms"
connection.group(newsgroup)
database = sqlite3.connect(newsgroup + ".db")

# create a table definition if it doesn't exist yet
try:
  # SQLite doesn't actually have data types. Everything is stored as plain text.
  # And so is newsgroup data. Bonus!
  database.execute("""CREATE TABLE headers (articleNumber text, subject text,
                                            poster text, date text, id text,
                                            references text, size text,
                                            lines text)""")
except:
  # table definition already exists. Not actually an error.
  pass

# Get the group meta-data, and set up iterator values for running
# through the header list.
resp, count, first, last, name = connection.group(newsgroup)
total = int(last) - int(first)
step = 10000
steps = total / step;
articleRange = first + '-' + str(int(first)+step)

# grab a batch of headers
print "[FETCHING HEADERS]"
resp, list = connection.xover(first, str(int(first)+step))
print "done."

# process the fetched headers
print "[PROCESSING HEADERS]"
for entry in list:
  # Unpack immutable tuple, mutate (because the references list
  # should be a string), then repack.
  articleNumber, subject, poster, date, id, references, size, lines = entry
  argumentList = (articleNumber, subject, poster, date, id, (",".join(references)), size, lines)

  try:
    # try to chronicle the header information. THIS WILL GO WRONG AT SOME POINT.
    database.execute("""INSERT INTO headers (articleNumber, subject, poster,
                                             date, id, reference, size, lines)
                                    VALUES ('?', '?', '?',
                                            '?', '?','?', '?', '?')"""
                                    % argumentList)

  except TypeError as err:
    # And here is an irking point with Python in general. Something went
    # wrong, yet all it tells us is "not all arguments converted during
    # string formatting". Despite that error being generated at a point
    # where the code knows WHICH argument was the problem.
    print err
    print type(argumentList[0]), argumentList[0]
    print type(argumentList[1]), argumentList[1]
    print type(argumentList[2]), argumentList[2]
    print type(argumentList[3]), argumentList[3]
    print type(argumentList[4]), argumentList[4]
    print type(argumentList[5]), argumentList[5]
    print type(argumentList[6]), argumentList[6]
    print type(argumentList[7]), argumentList[7]
    # A quick print set shows us that all arguments are already of type
    # "str", and none of them are empty... so it would take quite a bit
    # of work to make them fail at being legal strings... Wat?
    exit(1)
print "done."

# cleanup
database.close()
connection.quit()
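The code above computes `total`, `step` and `steps` but only fetches the first batch. Below is a hedged sketch of how those values could drive a loop, using a hypothetical helper `article_ranges`. It assumes XOVER ranges are inclusive at both ends, which is why each batch here ends at `start + step - 1` and the next one starts right after it, instead of reusing `first + step` as an end point.

```python
def article_ranges(first, last, step=10000):
    """Yield (start, end) article-number strings covering first..last.

    Assumes both ends of an XOVER range are inclusive, so each batch starts
    one past the previous batch's end and the final batch is clipped to last.
    """
    start, last = int(first), int(last)
    while start <= last:
        end = min(start + step - 1, last)
        yield str(start), str(end)
        start = end + 1


# Hypothetical group bounds, passed as strings like the question's first/last.
batches = list(article_ranges("100", "25000"))
print(batches)  # [('100', '10099'), ('10100', '20099'), ('20100', '25000')]

# With a live connection, each batch would be fetched in turn, e.g.:
# for start, end in article_ranges(first, last):
#     resp, headers = connection.xover(start, end)
```

Because `int()` accepts both strings and integers, the helper works whether the group bounds arrive as text or as numbers.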

2 Answers


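What the error is telling you is that you supplied n values for string formatting (%), but the format string expected fewer than n. Specifically, this string: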

"""INSERT INTO headers (articleNumber, subject, poster,
                        date, id, reference, size, lines)
          VALUES ('?', '?', '?',
                  '?', '?','?', '?', '?')"""

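doesn't expect any values for %-style string formatting. There's no %d in it, no %s, nothing. Instead, the ? placeholders are meant for the DB API's parameter substitution. You don't invoke that with the % operator (you don't need % here at all); instead, pass the sequence of values as the second argument to the execute call. You also need to remove the quotes around the placeholders, to indicate that they are placeholders and not string literals that happen to contain a single question-mark character. In summary: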

database.execute("""
    INSERT INTO headers (articleNumber, subject, poster,
                         date, id, reference, size, lines)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", # note: comma, not %
     argumentList)
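A self-contained sketch of both points, using an in-memory database and hypothetical header values instead of a live NNTP feed. It also touches the column name: the question's CREATE TABLE calls it `references` while the INSERT calls it `reference`, and since REFERENCES is an SQL keyword, the sketch uses a hypothetical `refs` column in both statements.

```python
import sqlite3

# 1. The question's failure in isolation: the SQL text has no %s or %d
#    specifiers, so the % operator has nowhere to put the eight values.
try:
    "VALUES ('?', '?', '?', '?', '?', '?', '?', '?')" % tuple("abcdefgh")
    percent_error = None
except TypeError as err:
    percent_error = str(err)
print(percent_error)  # not all arguments converted during string formatting

# 2. The fix. An in-memory database keeps the sketch self-contained; "refs"
#    is a hypothetical column name, since REFERENCES is an SQL keyword.
database = sqlite3.connect(":memory:")
database.execute("""CREATE TABLE IF NOT EXISTS headers (
                        articleNumber text, subject text, poster text,
                        date text, id text, refs text, size text, lines text)""")

# Hypothetical header, already converted to eight strings.
argumentList = ("1", "Re: Bresenham's line algorithm", "someone@example.com",
                "Sat, 24 Nov 2012 17:00:00 GMT", "<abc@example.com>",
                "<x@example.com>,<y@example.com>", "1234", "20")

insert = """INSERT INTO headers (articleNumber, subject, poster,
                                 date, id, refs, size, lines)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?)"""

# Bare ? placeholders, values as the second argument: sqlite3 binds each
# value itself, so there is no % formatting and no manual quoting.
database.execute(insert, argumentList)
database.commit()

# 3. Quoted placeholders: each '?' is a one-character string literal, so the
#    statement has zero parameters and sqlite3 rejects the eight values.
quoted_insert = insert.replace("?", "'?'")
try:
    database.execute(quoted_insert, argumentList)
    binding_error = None
except sqlite3.ProgrammingError as err:
    binding_error = str(err)
print(binding_error)

rows = database.execute("SELECT subject, refs FROM headers").fetchall()
print(rows)
```

`CREATE TABLE IF NOT EXISTS` also replaces the bare `try`/`except: pass`, which would otherwise hide any genuine error in the table definition.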
answered 2012-11-24 17:38:16

You don't want to do that; it's unsafe and error-prone.

You need to use the following pattern:

argumentList = [1, 2, 3, 4, 5, 6, 7, 8] # or whatever
insert_statement = """INSERT INTO headers (articleNumber, subject, poster,
                                           date, id, reference, size, lines)
                                  VALUES (?, ?, ?,
                                          ?, ?, ?, ?, ?)"""

cursor = database.cursor()  # cursor on the question's sqlite3 connection
cursor.execute(insert_statement, argumentList)
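To make "unsafe and error-prone" concrete, here is a minimal sketch with an in-memory database and a hypothetical two-column table and values. Pasting a subject that contains an apostrophe into the SQL with % produces invalid SQL, while the same values pass through ? placeholders untouched.

```python
import sqlite3

database = sqlite3.connect(":memory:")
cursor = database.cursor()
cursor.execute("CREATE TABLE headers (subject text, poster text)")

# Hypothetical header values; newsgroup subjects often contain apostrophes.
row = ("Bresenham's algorithm", "someone@example.com")

# Error-prone: pasting values into the SQL text with % lets the apostrophe
# close the string literal early, which leaves invalid SQL behind.
unsafe_sql = "INSERT INTO headers (subject, poster) VALUES ('%s', '%s')" % row
try:
    cursor.execute(unsafe_sql)
    unsafe_failed = False
except sqlite3.OperationalError as err:
    unsafe_failed = True
    print("string formatting broke the SQL:", err)

# Safe: the SQL text and the values travel separately; sqlite3 binds them.
cursor.execute("INSERT INTO headers (subject, poster) VALUES (?, ?)", row)
database.commit()

stored = cursor.execute("SELECT subject FROM headers").fetchall()
print(stored)  # [("Bresenham's algorithm",)]
```

The same mechanism that breaks on an innocent apostrophe is what lets a deliberately crafted value change the meaning of the statement, which is why binding parameters is the safe default.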
answered 2012-11-24 17:33:23