I'm trying to convert this Python command-line utility so that I can use the code as a module in my existing program, and so that I can feed it a PDF that is stored in MongoDB using MongoEngine.

Currently it takes a filename as a string and gets the file using the following code:

fp = file(fname, 'rb')
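For comparison, here is a minimal sketch of what that line hands to the parser. `file()` exists only in Python 2, so the sketch assumes Python 3 and uses `open(fname, 'rb')` instead, with a throwaway temporary file standing in for a real PDF. The result is a file object that supports `seek()`, `read()` and `tell()`. Calling `.read()` on it with no argument returns the raw content as `bytes` (a `str` in Python 2), and that type has no `seek()`.

```python
import os
import tempfile

# Write a few stand-in bytes to a temporary file; any binary file would do.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(b"%PDF-1.4\n%fake content for illustration\n")
    fname = tmp.name

# Python 3 equivalent of Python 2's file(fname, 'rb').
fp = open(fname, "rb")

# This object can be repositioned and read in chunks, which is what the parser relies on.
print(hasattr(fp, "seek"), hasattr(fp, "read"), hasattr(fp, "tell"))  # True True True
fp.seek(0)
header = fp.read(8)  # first 8 bytes: b'%PDF-1.4'
fp.seek(0)

# Reading everything instead gives plain bytes, which cannot seek().
data = fp.read()
print(type(data).__name__, hasattr(data, "seek"))  # bytes False

fp.close()
os.remove(fname)
```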

Since I want to pass in a document from MongoDB, I changed the function's signature to main(fp) and ran the following in the interactive Python interpreter:

>>> from app.documents import UserDocument
>>> from app.pdfutils2 import main
>>> doc = UserDocument.objects.first()
>>> main(doc._file.read())
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "app/pdfutils2.py", line 107, in main
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching, check_extractable=True):
  File "/Library/Python/2.7/site-packages/pdfminer/pdfpage.py", line 118, in get_pages
    parser = PDFParser(fp)
  File "/Library/Python/2.7/site-packages/pdfminer/pdfparser.py", line 43, in __init__
    PSStackParser.__init__(self, fp)
  File "/Library/Python/2.7/site-packages/pdfminer/psparser.py", line 495, in __init__
    PSBaseParser.__init__(self, fp)
  File "/Library/Python/2.7/site-packages/pdfminer/psparser.py", line 166, in __init__
    self.seek(0)
  File "/Library/Python/2.7/site-packages/pdfminer/psparser.py", line 507, in seek
    PSBaseParser.seek(self, pos)
  File "/Library/Python/2.7/site-packages/pdfminer/psparser.py", line 196, in seek
    self.fp.seek(pos)
AttributeError: 'str' object has no attribute 'seek'

Since fp is originally created with the 'rb' flag, I suppose I need to create fp in binary mode from MongoEngine, but I don't know how to convert the GridFS result from my FileField into binary mode.

Does anybody have a tip on how I could convert GridFS results into binary, so that they are the same as what I get from file(fname, 'rb')? All tips are welcome!

1 Answer

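I found the answer, so for future readers: the FileField proxy already behaves like a file object (it exposes read(), seek() and tell()), whereas .read() returns the file's contents as a plain str, which has no seek() method. So I should have done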

main(doc._file)

instead of

main(doc._file.read())
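If you already hold the raw bytes (for example from doc._file.read()), another option is to wrap them in an in-memory binary stream, i.e. something like main(io.BytesIO(doc._file.read())), assuming main passes fp straight to PDFPage.get_pages as the traceback suggests. The sketch below does not touch MongoDB or pdfminer: `fake_parser` is a hypothetical stand-in that, like PDFParser, only needs an object it can `seek()` and `read()`.

```python
import io


def fake_parser(fp):
    """Hypothetical stand-in for pdfminer's PDFParser: it only needs
    a file-like object that it can seek() and read() from."""
    fp.seek(0)
    return fp.read(8)


# Pretend these bytes came from doc._file.read().
pdf_bytes = b"%PDF-1.4\n%fake content for illustration\n"

# Passing the raw bytes reproduces the AttributeError from the question
# (Python 3 names 'bytes' in the message; Python 2 names 'str').
error = None
try:
    fake_parser(pdf_bytes)
except AttributeError as exc:
    error = str(exc)

# Wrapping the bytes in io.BytesIO restores the file interface.
header = fake_parser(io.BytesIO(pdf_bytes))
print(header)  # b'%PDF-1.4'
```

The trade-off is that the BytesIO route copies the whole PDF into memory first, while passing doc._file lets the parser read from GridFS as it goes.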

Have a nice day!

answered 2014-10-22T11:36:09.763