Our scraper currently downloads not only text but also images. The scraper itself works fine; however, we have big problems with the quality of the downloaded images. After checking the standard ImagesPipeline, we implemented a custom one that tells Pillow to use the highest quality. It looks like this (and is configured in settings.py):

    from scrapy.contrib.pipeline.images import ImagesPipeline
    from scrapy.exceptions import DropItem
    from scrapy.http import Request
    from cStringIO import StringIO


    class CustomImagesPipeline(ImagesPipeline):

        def convert_image(self, image, size=None):
            buf = StringIO()
            image.save(buf, 'JPEG', quality=100)
            return image, buf

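For context, the settings.py wiring looks roughly like the sketch below. The module path `myproject.pipelines` and the storage directory are placeholders for illustration, not our real values, and the priority number is arbitrary. Older Scrapy releases also accept `ITEM_PIPELINES` as a plain list of class paths.

```python
# settings.py (sketch; names below are placeholders, not the real project values)

# Register the custom pipeline in place of the stock ImagesPipeline.
# The dict form maps the class path to an ordering priority; older
# Scrapy versions also accept a plain list of class paths.
ITEM_PIPELINES = {
    'myproject.pipelines.CustomImagesPipeline': 1,  # hypothetical module path
}

# Directory where the images pipeline writes the downloaded files.
IMAGES_STORE = '/path/to/images'  # hypothetical location
```

With this in place, only the custom class handles images, so any quality change in `convert_image` should show up in the files under `IMAGES_STORE`.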
We also tried several other presets taken from this file: https://github.com/python-imaging/Pillow/blob/master/PIL/JpegPresets.py
However, we did not see any improvement. Has anyone here tackled this problem before, or does anyone have an idea what's wrong with the code?
Thanks :)