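I am trying to write some code to scrape a web page. I am using spynner to fetch the HTML and passing it to pisa.
I don't see any errors when I run the Python code, but the generated PDFs are all wrong.
Here is the code I am using: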
import logging

import ho.pisa as pisa
import spynner


class PisaNullHandler(logging.Handler):
    """No-op handler attached to the ho.pisa logger."""
    def emit(self, record):
        pass


url = 'http://www.google.com'
pathToWrite = './google.html.pdf'

# Load the page in spynner and grab the rendered HTML.
br = spynner.Browser()
br.load(url)
htmlCode = br.html

logging.getLogger("ho.pisa").addHandler(PisaNullHandler())

pdfFile = open(pathToWrite, "wb")
try:
    pdfStatus = pisa.CreatePDF(htmlCode.encode("utf-8"), pdfFile, encoding="utf8")
    if pdfStatus.err:
        # pdfStatus.err is an error count, so convert it before concatenating.
        print 'Failed with error ' + str(pdfStatus.err)
except Exception as e:
    print 'pdf creation failed with error ' + str(e)
finally:
    # Closing the file also flushes it.
    pdfFile.close()
I also tried saving the HTML from spynner and running it through xhtml2pdf. I get the following error:
ERROR [ho.pisa] Document error
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/pisa3/pisa_document.py", line 128, in pisaDocument
    c = pisaStory(src, path, link_callback, debug, default_css, xhtml, encoding, c=c, xml_output=xml_output)
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/pisa3/pisa_document.py", line 73, in pisaStory
    pisaParser(src, c, default_css, xhtml, encoding, xml_output)
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/pisa3/pisa_parser.py", line 626, in pisaParser
    c.parseCSS()
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/pisa3/pisa_context.py", line 545, in parseCSS
    self.css = self.cssParser.parse(self.cssText)
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/w3c/cssParser.py", line 358, in parse
    src, stylesheet = self._parseStylesheet(src)
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/w3c/cssParser.py", line 458, in _parseStylesheet
    src, ruleset = self._parseRuleset(src)
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/w3c/cssParser.py", line 737, in _parseRuleset
    src, properties = self._parseDeclarationGroup(src.lstrip())
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/w3c/cssParser.py", line 905, in _parseDeclarationGroup
    src, property = self._parseDeclaration(src)
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/w3c/cssParser.py", line 945, in _parseDeclaration
    src, property = self._parseDeclarationProperty(src, propertyName)
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/w3c/cssParser.py", line 953, in _parseDeclarationProperty
    src, expr = self._parseExpression(src)
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/w3c/cssParser.py", line 968, in _parseExpression
    src, term = self._parseExpressionTerm(src)
  File "/usr/local/lib/python2.7/dist-packages/pisa-3.0.33-py2.7.egg/sx/w3c/cssParser.py", line 1020, in _parseExpressionTerm
    raise self.ParseError('Terminal function expression expected closing \')\'', src, ctxsrc)
CSSParseError: Terminal function expression expected closing ')':: (u'alpha(opacity', u'=100);position:absol')
*** ERRORS OCCURED
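From the last line of the traceback, the CSS parser appears to stop at `alpha(opacity=100)`, which looks like the legacy Internet Explorer `filter` syntax. One thing that might be worth trying is to strip those declarations from the HTML before handing it to pisa. The following is only a rough sketch under that assumption: `strip_ie_alpha_filters` is a hypothetical helper of my own, not part of pisa or spynner, and I have not confirmed that this is the only CSS on the page the parser rejects.

```python
import re

# Matches IE-only declarations such as "filter:alpha(opacity=100);"
# (optionally written with the "-ms-" prefix). Assumption: the value is a
# plain alpha(...) call, as in the traceback above.
_IE_ALPHA_FILTER_RE = re.compile(
    r"(?:-ms-)?filter\s*:\s*alpha\([^)]*\)\s*;?",
    re.IGNORECASE,
)


def strip_ie_alpha_filters(html):
    """Return html with IE alpha(opacity=...) filter declarations removed."""
    return _IE_ALPHA_FILTER_RE.sub("", html)


# Small self-contained check on a fragment shaped like the one in the error.
sample = u'<div style="filter:alpha(opacity=100);position:absolute">x</div>'
cleaned = strip_ie_alpha_filters(sample)
```

In the script above, this would be applied to `htmlCode` before the call to `pisa.CreatePDF`. Other CSS that the parser cannot handle would still need separate treatment.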