The error message I get is:

Traceback (most recent call last):
  File "./test.py", line 416, in <module>
    startup()
  File "./test.py", line 275, in startup
    writer.save(r,data) 
  File "/home/user/project/test/output.py", line 91, in save
    self.save_doc(r, data, pid)
  File "/home/user/project/test/output.py", line 130, in save_doc
    cursor.execute(dbquery)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/util.py", line 34, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/mysql/base.py", line 86, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-x86_64.egg/MySQLdb/cursors.py", line 175, in execute
  File "/usr/local/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-x86_64.egg/MySQLdb/cursors.py", line 89, in _warning_check
_mysql_exceptions.Warning: Data truncated for column 'url' at row 1

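The first thing I checked was whether the URL string is longer than the column length, but it is actually much shorter. The database definition: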

+----------+-------------------------------------------------------------------+
| Database | Create Database                                                   |
+----------+-------------------------------------------------------------------+
| myurlcol | CREATE DATABASE `myurlcol` /*!40100 DEFAULT CHARACTER SET utf8 */ | 
+----------+-------------------------------------------------------------------+

The table definition:

  CREATE TABLE `document` (
  `id` int(11) NOT NULL auto_increment,
  `url` varchar(255) collate utf8_bin NOT NULL,
  `md5` varchar(32) collate utf8_bin NOT NULL,
  `host` varchar(255) collate utf8_bin default NULL,
  `content_sha1` varchar(40) collate utf8_bin NOT NULL,
  `add_date` datetime NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `url` (`url`),
  UNIQUE KEY `md5` (`md5`),
  KEY `main_crawl_document_content_sha1` (`content_sha1`),
  KEY `main_crawl_document_discover_date` (`add_date`),
  KEY `main_crawl_document_host` (`host`),
  ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

I printed the length of each value I am trying to insert into the table (I am testing with a single URL):

len(url) =  89
len(md5) =  32
len(host) =  20
len(content_sha1) =  40
len(add_date) =  19

I am using a cursor created by django.db.connection. For more context, here is the full query string I pass to cursor.execute():

INSERT INTO main_document SET url='ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/0a/39/Arthritis_Res_2000_Jun_5_2%284%29_315-326.tar.gz',md5='b6ba3adde8de87e4dc255092b04d07ea',host='ftp.ncbi.nlm.nih.gov',content_sha1='9aeab4412cc9b1add84a6d2bca574664e193b56e',add_date='2012-05-15 00:00:00';

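Interestingly, the statement above works when I copy and paste it into the MySQL command line. There is no error message, and the data is inserted correctly.

What is going wrong?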

2 Answers

cursor.execute() takes care of MySQL escaping for you if you use it correctly. Here are some examples.

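The basic idea is to put %s in the SQL string wherever you currently embed a raw value, and then pass a second argument to cursor.execute(): a tuple (or list) of the values, in order. In your case that would look like: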

url = 'ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/0a/39/Arthritis_Res_2000_Jun_5_2%284%29_315-326.tar.gz'
md5 = 'b6ba3adde8de87e4dc255092b04d07ea'
host = 'ftp.ncbi.nlm.nih.gov'
content_sha1 = '9aeab4412cc9b1add84a6d2bca574664e193b56e'
add_date = '2012-05-15 00:00:00'
sql = "INSERT INTO main_document SET url = %s, md5 = %s, host = %s, content_sha1 = %s, add_date = %s"
cursor.execute(sql, (url, md5, host, content_sha1, add_date))
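
The same pattern can be tried without a MySQL server. The following is only a minimal sketch that swaps in Python's built-in sqlite3 module, which uses ? as its placeholder instead of the %s used by MySQLdb and Django; the in-memory database and the reduced two-column table are assumptions made for illustration, not the schema from the question. It shows that a value containing percent signs is stored unchanged when it is passed as a parameter rather than embedded in the SQL text.

```python
import sqlite3

# Values taken from the question; the URL contains the percent-encoded
# sequences %28 and %29.
url = 'ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/0a/39/Arthritis_Res_2000_Jun_5_2%284%29_315-326.tar.gz'
md5 = 'b6ba3adde8de87e4dc255092b04d07ea'

# Assumption: an in-memory SQLite database and a simplified table stand in
# for the real MySQL table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE main_document (url TEXT NOT NULL, md5 TEXT NOT NULL)')

# The SQL text contains only placeholders; the values travel separately,
# so the driver never interprets the percent signs inside the URL.
conn.execute('INSERT INTO main_document (url, md5) VALUES (?, ?)', (url, md5))

stored_url = conn.execute('SELECT url FROM main_document').fetchone()[0]
print(stored_url == url)
conn.close()
```

With MySQLdb or a Django cursor the call has the same shape; only the placeholder changes from ? to %s.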
answered 2014-11-25T07:40:00.267

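You should try unquoting the URL string with urllib.unquote(url) before inserting it into the database. The percent sign (%) used as the quoting marker in your string is a special character for MySQL and may break your transaction.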

Your insert should then be:

INSERT INTO main_document SET url='ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/0a/39/Arthritis_Res_2000_Jun_5_2(4)_315-326.tar.gz',md5='b6ba3adde8de87e4dc255092b04d07ea',host='ftp.ncbi.nlm.nih.gov',content_sha1='9aeab4412cc9b1add84a6d2bca574664e193b56e',add_date='2012-05-15 00:00:00';
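
The unquoting step that produces this URL can be sketched as follows. This assumes Python 3, where the function lives in urllib.parse; on the Python 2.6 shown in the traceback the equivalent call is urllib.unquote(url). The variable names are chosen for illustration.

```python
from urllib.parse import unquote

# The percent-encoded URL from the question.
quoted_url = 'ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/0a/39/Arthritis_Res_2000_Jun_5_2%284%29_315-326.tar.gz'

# unquote() decodes %28 to "(" and %29 to ")".
unquoted_url = unquote(quoted_url)
print(unquoted_url)
```

Keep in mind that the unquoted string is a different value from the original, so anything derived from the quoted URL (for example a hash used as a unique key) would no longer match it.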

Note that MySQL treats % as special only in pattern-matching contexts, so in the end the problem here may lie with the Django ORM.

answered 2012-07-24T20:54:05.247