I'm very new to Python and Scrapy. I'm trying to write scraped data to my MySQL database, but I get the following error:

exceptions.AttributeError: 'list' object has no attribute 'encode'

Here is my pipeline code:

import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class MySQLStorePipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(user='User', passwd='passwd', db='db', host='host', charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):    
        try:
            self.cursor.execute("""INSERT INTO Teams (Country, CountryFlagLink, TeamWikiURL, MethodOfQualification, DateOfQualification, FinalsAppearance, LastAppearance, PreviousBestPerformance, FifaRankingAsOfOct2013)  
                        VALUES (%s, %s)""", 
                       (item['Country'].encode('utf-8'),
                        item['CountryFlagLink'].encode('utf-8'),
                        item['TeamWikiURL'].encode('utf-8'),
                        item['MethodOfQualification'].encode('utf-8'),
                        item['DateOfQualification'].encode('utf-8'),
                        item['FinalsAppearance'].encode('utf-8'),
                        item['LastAppearance'].encode('utf-8'),
                        item['PreviousBestPerformance'].encode('utf-8'),
                        item['FifaRankingAsOfOct2013'].encode('utf-8')))

            self.conn.commit()


        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])

        return item

Here is the full stack trace I get after crawling the site and trying to import the data into my MySQL database:

ls\defer.py", line 65, in process_chain
            d.callback(input)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
80, in callback
            self._startRunCallbacks(result)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
88, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
75, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "wikitut\pipelines.py", line 16, in process_item
            (item['Country'].encode('utf-8'),
        exceptions.AttributeError: 'list' object has no attribute 'encode'

2013-11-12 19:36:33-0600 [wikitut] ERROR: Error processing {'Country': [u'Ecuador'],
         'CountryFlagLink': [u'//upload.wikimedia.org/wikipedia/commons/thumb/e
e8/Flag_of_Ecuador.svg/23px-Flag_of_Ecuador.svg.png'],
         'DateOfQualification': [u'15 October 2013'],
         'FifaRankingAsOfOct2013': [u'22'],
         'FinalsAppearance': [u'3rd'],
         'LastAppearance': [u'2006'],
         'MethodOfQualification': [u'CONMEBOL Round Robin 4th place'],
         'PreviousBestPerformance': [u'Round of 16 (2006)'],
         'TeamWikiURL': [u'/wiki/Ecuador_national_football_team']}
        Traceback (most recent call last):
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\middleware.py", line 62, in _process_chain
            return process_chain(self.methods[methodname], obj, *args)
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\utils\defer.py", line 65, in process_chain
            d.callback(input)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
80, in callback
            self._startRunCallbacks(result)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
88, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
75, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "wikitut\pipelines.py", line 16, in process_item
            (item['Country'].encode('utf-8'),
        exceptions.AttributeError: 'list' object has no attribute 'encode'

2013-11-12 19:36:33-0600 [wikitut] ERROR: Error processing {'Country': [u'Honduras'],
         'CountryFlagLink': [u'//upload.wikimedia.org/wikipedia/commons/thumb/8
82/Flag_of_Honduras.svg/23px-Flag_of_Honduras.svg.png'],
         'DateOfQualification': [u'15 October 2013'],
         'FifaRankingAsOfOct2013': [u'34'],
         'FinalsAppearance': [u'3rd'],
         'LastAppearance': [u'2010'],
         'MethodOfQualification': [u'CONCACAF Fourth Round 3rd place'],
         'PreviousBestPerformance': [u'Group stage (1982, 2010)'],
         'TeamWikiURL': [u'/wiki/Honduras_national_football_team']}
        Traceback (most recent call last):
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\middleware.py", line 62, in _process_chain
            return process_chain(self.methods[methodname], obj, *args)
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\utils\defer.py", line 65, in process_chain
            d.callback(input)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
80, in callback
            self._startRunCallbacks(result)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
88, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
75, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "wikitut\pipelines.py", line 16, in process_item
            (item['Country'].encode('utf-8'),
        exceptions.AttributeError: 'list' object has no attribute 'encode'

2013-11-12 19:36:33-0600 [wikitut] INFO: Closing spider (finished)
2013-11-12 19:36:33-0600 [wikitut] INFO: Dumping Scrapy stats:
        {'downloader/request_bytes': 246,
         'downloader/request_count': 1,
         'downloader/request_method_count/GET': 1,
         'downloader/response_bytes': 72797,
         'downloader/response_count': 1,
         'downloader/response_status_count/200': 1,
         'finish_reason': 'finished',
         'finish_time': datetime.datetime(2013, 11, 13, 1, 36, 33, 840000),
         'log_count/DEBUG': 7,
         'log_count/ERROR': 22,
         'log_count/INFO': 3,
         'response_received_count': 1,
         'scheduler/dequeued': 1,
         'scheduler/dequeued/memory': 1,

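My MySQL database has all the required fields (all varchar), and its collation is set to utf8_general_ci. I don't understand why I'm getting the error above. Can someone explain what I'm doing wrong?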

1 Answer

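Judging by your error message, item['Country'] is a list containing one element, not a string. See 'Country': [u'Honduras'] in the log output. A list has no encode method, which is why the call fails.

So you need to take the first element of each field before encoding it, like this: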

(item['Country'][0].encode('utf-8'),
item['CountryFlagLink'][0].encode('utf-8'),
item['TeamWikiURL'][0].encode('utf-8'),
item['MethodOfQualification'][0].encode('utf-8'),
item['DateOfQualification'][0].encode('utf-8'),
item['FinalsAppearance'][0].encode('utf-8'),
item['LastAppearance'][0].encode('utf-8'),
item['PreviousBestPerformance'][0].encode('utf-8'),
item['FifaRankingAsOfOct2013'][0].encode('utf-8')))

I'm not a Python user, so I may be wrong.
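A minimal sketch of the same idea, which avoids repeating [0] nine times. The names FIELDS, first_value and build_insert are hypothetical helpers, not part of Scrapy or MySQLdb. It also builds one %s placeholder per column, since the query in the question lists nine columns but only two placeholders, which would probably fail once the encode error is fixed. The sketch assumes every field is either a list of strings or a plain string.

```python
# Column names copied from the INSERT statement in the question.
FIELDS = [
    'Country', 'CountryFlagLink', 'TeamWikiURL', 'MethodOfQualification',
    'DateOfQualification', 'FinalsAppearance', 'LastAppearance',
    'PreviousBestPerformance', 'FifaRankingAsOfOct2013',
]


def first_value(value, default=u''):
    """Return the first element of a list or tuple, or the value unchanged.

    An empty list yields `default`; non-list values (including None)
    are passed through as they are.
    """
    if isinstance(value, (list, tuple)):
        return value[0] if value else default
    return value


def build_insert(item, table='Teams', fields=FIELDS):
    """Build a parameterized INSERT statement plus its parameter tuple.

    One %s placeholder is generated per column, so the two counts
    always match. A field missing from `item` becomes None.
    """
    placeholders = ', '.join(['%s'] * len(fields))
    sql = 'INSERT INTO %s (%s) VALUES (%s)' % (
        table, ', '.join(fields), placeholders)
    params = tuple(first_value(item.get(name)) for name in fields)
    return sql, params


# Example with two of the fields from the log output in the question.
item = {'Country': [u'Ecuador'], 'FifaRankingAsOfOct2013': [u'22']}
sql, params = build_insert(item, fields=['Country', 'FifaRankingAsOfOct2013'])
print(sql)
print(params)
```

Inside process_item, the call would then be self.cursor.execute(sql, params). Because the connection is opened with charset="utf8" and use_unicode=True, the explicit .encode('utf-8') calls are likely unnecessary, but that is worth checking against your MySQLdb version.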

Answered 2013-11-13T03:53:07.137