I am trying to load a second pipeline that writes items to a MySQL database. In the log I can see it being loaded, but after that nothing happens. Not even an error is logged. This is my pipeline:
# Mysql
import sys
import hashlib
from datetime import datetime  # needed for datetime.now() below

import MySQLdb
from scrapy.exceptions import DropItem
from scrapy.http import Request


class MySQLStorePipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="***", passwd="***",
                                    db="***", charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        CurrentDateTime = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        # MD5 of the article link, stored as a uniqueness key
        Md5Hash = hashlib.md5(item['link']).hexdigest()
        try:
            self.cursor.execute(
                """INSERT INTO apple (article_add_date, article_date, article_title,
                       article_link, article_link_md5, article_summary,
                       article_image_url, article_source)
                   VALUES (%s, %s, %s, %s, %s, %s, %s, %s)""",
                (CurrentDateTime, item['date'], item['title'], item['link'],
                 Md5Hash, item['summary'], item['image'], item['sourcesite']))
            self.conn.commit()
        except MySQLdb.Error as e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item
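To check whether process_item is reached at all, I could imagine dropping in a throwaway pipeline that only logs, with no database involved. A minimal sketch, assuming the scrapy.log module that ships with Scrapy 0.16 (the DebugPipeline name is mine, not part of my project):

from scrapy import log

class DebugPipeline(object):
    """Hypothetical pipeline that only logs, to confirm items reach it."""
    def process_item(self, item, spider):
        # one INFO line per item, attributed to the spider in the Scrapy log
        log.msg("pipeline got item: %s" % item['link'],
                level=log.INFO, spider=spider)
        return item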
This is my log:
scrapy crawl macnn_com
2013-06-20 08:15:53+0200 [scrapy] INFO: Scrapy 0.16.4 started (bot: HungryFeed)
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled item pipelines: MySQLStorePipeline, CleanDateField
2013-06-20 08:15:54+0200 [macnn_com] INFO: Spider opened
2013-06-20 08:15:54+0200 [macnn_com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Crawled (200) <GET http://www.macnn.com> (referer: None)
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Crawled (200) <GET http://www.macnn.com/articles/13/06/19/compatibility.described.as.experimental/> (referer: http://www.macnn.com)
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Scraped from <200 http://www.macnn.com/articles/13/06/19/compatibility.described.as.experimental/>
*** lot of scraping data ***
*** lot of scraping data ***
*** lot of scraping data ***
2013-06-20 08:15:56+0200 [macnn_com] INFO: Closing spider (finished)
2013-06-20 08:15:56+0200 [macnn_com] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 5711,
'downloader/request_count': 17,
'downloader/request_method_count/GET': 17,
'downloader/response_bytes': 281140,
'downloader/response_count': 17,
'downloader/response_status_count/200': 17,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2013, 6, 20, 6, 15, 56, 685286),
'item_scraped_count': 16,
'log_count/DEBUG': 39,
'log_count/INFO': 4,
'request_depth_max': 1,
'response_received_count': 17,
'scheduler/dequeued': 17,
'scheduler/dequeued/memory': 17,
'scheduler/enqueued': 17,
'scheduler/enqueued/memory': 17,
'start_time': datetime.datetime(2013, 6, 20, 6, 15, 54, 755766)}
2013-06-20 08:15:56+0200 [macnn_com] INFO: Spider closed (finished)
Needless to say, I have enabled the pipelines in settings.py:
ITEM_PIPELINES = [
    'HungryFeed.pipelines.CleanDateField',
    'HungryFeed.pipelines.MySQLStorePipeline',
]
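(Side note for anyone on a newer Scrapy release than my 0.16.4: as far as I know, ITEM_PIPELINES later became a dict mapping each pipeline path to an order number, lower running first. The equivalent there would presumably be:

ITEM_PIPELINES = {
    'HungryFeed.pipelines.CleanDateField': 100,
    'HungryFeed.pipelines.MySQLStorePipeline': 200,
}

The list form above is what the 0.16 docs describe, so that should not be the problem here.)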
Am I missing something here?
This is my first pipeline:
from dateutil import parser

class CleanDateField(object):
    def process_item(self, item, spider):
        rawdate = item['date']
        # per-spider text replacement so the parser can recognize the datetime
        if spider.name == "macnn_com":
            rawdate = rawdate.replace("updated", "").strip()
        dt = parser.parse(rawdate)
        item['date'] = dt.strftime("%Y-%m-%d %H:%M:%S")
        return item
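The date cleaning itself behaves as expected when I exercise it standalone; a quick check along these lines works (the raw string below is only a made-up example of the "updated ..." shape, not an actual value scraped from macnn_com):

from dateutil import parser

# hypothetical raw value with the same "updated ..." prefix the spider sees
rawdate = "updated 06/19/2013 2:30 PM"
cleaned = rawdate.replace("updated", "").strip()
print parser.parse(cleaned).strftime("%Y-%m-%d %H:%M:%S")  # -> 2013-06-19 14:30:00

So the first pipeline seems fine on its own; it is only the MySQL one that never appears to do anything.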