I'm trying to convert this Python command-line utility so that I can use its code as a module in my existing program and feed it a PDF that is stored in MongoDB using MongoEngine.
Currently it takes a filename as a string and gets the file using the following code:
fp = file(fname, 'rb')
Since I want to pass in a document from MongoDB instead, I changed the function's argument to main(fp) and ran the following in the interactive Python interpreter:
>>> from app.documents import UserDocument
>>> from app.pdfutils2 import main
>>> doc = UserDocument.objects.first()
>>> main(doc._file.read())
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "app/pdfutils2.py", line 107, in main
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching, check_extractable=True):
  File "/Library/Python/2.7/site-packages/pdfminer/pdfpage.py", line 118, in get_pages
    parser = PDFParser(fp)
  File "/Library/Python/2.7/site-packages/pdfminer/pdfparser.py", line 43, in __init__
    PSStackParser.__init__(self, fp)
  File "/Library/Python/2.7/site-packages/pdfminer/psparser.py", line 495, in __init__
    PSBaseParser.__init__(self, fp)
  File "/Library/Python/2.7/site-packages/pdfminer/psparser.py", line 166, in __init__
    self.seek(0)
  File "/Library/Python/2.7/site-packages/pdfminer/psparser.py", line 507, in seek
    PSBaseParser.seek(self, pos)
  File "/Library/Python/2.7/site-packages/pdfminer/psparser.py", line 196, in seek
    self.fp.seek(pos)
AttributeError: 'str' object has no attribute 'seek'
Since fp is initially created using the 'rb' flag, I suppose I need to create fp in binary mode from MongoEngine, but I don't know how to convert the GridFS results from my FileField into binary mode.
Does anybody have a tip on how I could convert the GridFS results into binary so that I get the same thing as when I retrieve the file using file(fname, 'rb')? All tips are welcome!
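Reading the traceback again, the complaint seems to be less about binary mode and more that doc._file.read() returns a plain str, which has no seek() method, whereas file(fname, 'rb') returns a file object. One thing I could try is wrapping the bytes in io.BytesIO. This is only a minimal sketch, assuming the parser needs nothing beyond read() and seek(); as_file_like and pdf_bytes are hypothetical names I made up for illustration, and the sample bytes stand in for the real GridFS content.

```python
import io


def as_file_like(data):
    """Wrap raw bytes in an in-memory, seekable binary stream.

    Hypothetical helper: `data` stands in for whatever
    doc._file.read() returns from the FileField.
    """
    return io.BytesIO(data)


# Placeholder bytes, not a real PDF; in the real program this
# would be the result of doc._file.read().
pdf_bytes = b"%PDF-1.4\n% sample bytes only\n"

fp = as_file_like(pdf_bytes)
fp.seek(0)           # the call that fails on a plain str
header = fp.read(8)  # read the first eight bytes
fp.seek(0)           # rewind before handing fp to a parser
```

With that in place the call would become main(as_file_like(doc._file.read())). I have not checked whether the FileField proxy itself could be passed directly instead of the bytes.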