I have a Scrapy project that uses custom middleware and a custom pipeline to check and store entries in a Postgres DB. The middleware looks a bit like this:
```python
class ExistingLinkCheckMiddleware(object):
    def __init__(self):
        ...  # open connection to database

    def process_request(self, request, spider):
        ...  # before each request, check in the DB that the page hasn't been scraped before
```
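To make the pattern concrete, here is a minimal self-contained sketch of that middleware. It uses sqlite3 in place of the Postgres connection so it runs anywhere, and the `seen` helper, the table name, and the stand-in `Request` type are illustrative, not part of my actual code:

```python
import sqlite3
from collections import namedtuple

# Stand-in for scrapy.Request so the sketch runs without Scrapy installed
Request = namedtuple("Request", ["url"])


class ExistingLinkCheckMiddleware(object):
    def __init__(self):
        # Connection opened once, when the middleware is instantiated
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE scraped (url TEXT PRIMARY KEY)")

    def seen(self, request):
        # True if this URL has already been scraped
        return self.conn.execute(
            "SELECT 1 FROM scraped WHERE url = ?", (request.url,)
        ).fetchone() is not None

    def process_request(self, request, spider):
        # In real Scrapy, returning None lets the request through, and
        # raising scrapy.exceptions.IgnoreRequest drops a duplicate.
        if self.seen(request):
            raise ValueError("already scraped: %s" % request.url)
        return None


mw = ExistingLinkCheckMiddleware()
req = Request("http://example.com/page")
print(mw.process_request(req, spider=None))  # → None (not seen yet)
mw.conn.execute("INSERT INTO scraped VALUES (?)", (req.url,))
print(mw.seen(req))  # → True
```

The connection lives on the instance, which is exactly why there is no obvious place to close it.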
The pipeline looks similar:
```python
class MachinelearningPipeline(object):
    def __init__(self):
        ...  # open connection to database

    def process_item(self, item, spider):
        ...  # save the item to the database
```
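Again as a runnable sketch of the same open-on-init pattern, with sqlite3 standing in for Postgres; the `items` table and the dict-shaped item are illustrative:

```python
import sqlite3


class MachinelearningPipeline(object):
    def __init__(self):
        # Connection opened once, when the pipeline is instantiated
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE items (url TEXT)")

    def process_item(self, item, spider):
        # Save the item, then return it so any later pipelines also see it
        self.conn.execute("INSERT INTO items (url) VALUES (?)", (item["url"],))
        self.conn.commit()
        return item


pipeline = MachinelearningPipeline()
pipeline.process_item({"url": "http://example.com"}, spider=None)
count = pipeline.conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # → 1
```

Here too, the connection is created in `__init__` and never explicitly closed.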
It works fine, but I can't find a clean place to close these database connections when the spider finishes, which irks me.
Does anyone know how to do that?