python - Scraping a PDF with ScraperWiki and getting a "name is not defined" error

Question

I'm trying to scrape this PDF with ScraperWiki. My current code fails with the error name 'data' is not defined, which is raised on this line:

elif int(el.attrib['left']) < 647: data['Neighborhood'] = el.text

If I comment that line out, I get the same error on my else statement.

Here is my code:

import scraperwiki
import urllib2, lxml.etree
#Pull Mondays
url = 'http://www.city.pittsburgh.pa.us/police/blotter/blotter_monday.pdf'
pdfdata = urllib2.urlopen(url).read()
xmldata = scraperwiki.pdftoxml(pdfdata)
root = lxml.etree.fromstring(xmldata)
# how many pages in PDF
pages = list(root)
print "There are",len(pages),"pages"
# Test Scrape of only Page 1 of 29
for page in pages[0:1]:
    for el in page:
        if el.tag == "text":
            if int(el.attrib['left']) < 11: data = { 'Report Name': el.text }
            elif int(el.attrib['left']) < 317: data['Location of Occurrence'] = el.text
            elif int(el.attrib['left']) < 169: data['Incident Time'] = el.text
            elif int(el.attrib['left']) < 647: data['Neighborhood'] = el.text
            elif int(el.attrib['left']) < 338: data['Description'] = el.text
            else:
                data['Zone'] = el.text
                print data

What am I doing wrong?

Any suggestions for a better approach would also be appreciated.

score 1 · Accepted Answer

Unless you left some code out, your data dictionary is only created when the condition on this line matches:

if int(el.attrib['left']) < 11: data = { 'Report Name': el.text }

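Every other line that sets a value in data assumes the dictionary already exists, so if the first element processed does not satisfy that first condition, you get a NameError.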
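The failure can be reproduced without downloading the PDF. The sketch below is Python 3 (the question uses Python 2) and replaces the lxml elements with hypothetical (left, text) pairs; the single else branch stands in for all of the question's elif/else branches. Because the first pair is not in the "Report Name" column, the else branch runs before data has ever been assigned.

```python
# Hypothetical stand-in for the PDF's <text> elements: (left, text) pairs.
# The first element is NOT in the "Report Name" column (left < 11).
elements = [(200, "Main St"), (5, "Report 1"), (700, "Zone 2")]


def build_records(elements):
    records = []
    for left, text in elements:
        if left < 11:
            data = {"Report Name": text}  # the only place data is created
        else:
            # Stands in for every elif/else branch: it assumes data exists.
            data["Zone"] = text
            records.append(data)
    return records


try:
    build_records(elements)
except NameError as exc:
    # Inside a function Python 3 raises UnboundLocalError, a subclass of
    # NameError; at module level (as in the question) it is a plain NameError.
    print(type(exc).__name__, exc)
```

If the first element did start a row, the same function would run without error, which is why the bug depends on where the page's first text element sits.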

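A quick fix is to make sure data always exists by creating an empty dictionary up front. Create it before the inner loop rather than inside it; otherwise it is reset for every element and each printed record contains only the Zone. For example:

for page in pages[0:1]:
    data = {}  # created once per page, so fields accumulate across elements
    for el in page:
        if el.tag == "text":

and so on.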
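Two more things in the question's code may be worth checking. An if/elif chain stops at the first test that succeeds, so with the thresholds in the order 11, 317, 169, 647, 338 the "< 169" and "< 338" branches can never run (anything below 169 is already caught by "< 317", and anything below 338 by "< 647"); "Incident Time" and "Description" are therefore never filled in. Also, pdftoxml output can wrap text in <b> or <i> tags, in which case el.text is None. The following is a minimal sketch of one possible approach, not a tested scraper for this PDF. It assumes the asker's thresholds are exclusive upper bounds on the left attribute (sorted ascending and looked up with the standard bisect module), and that Zone is the last column of each row. The SAMPLE XML is hypothetical, and the sketch uses Python 3 with the standard-library xml.etree.ElementTree in place of lxml so it runs offline.

```python
import bisect
import xml.etree.ElementTree as ET

# ASSUMPTION: the asker's thresholds, sorted ascending and read as exclusive
# upper bounds on each <text> element's "left" attribute. Check them against
# the real "left" values in the pdftoxml output before relying on them.
BOUNDS = [11, 169, 317, 338, 647]
COLUMNS = ["Report Name", "Incident Time", "Location of Occurrence",
           "Description", "Neighborhood", "Zone"]


def column_for(left):
    """Return the column name for a "left" coordinate.

    bisect_right returns how many bounds are <= left, which is the index of
    the first bound that left is strictly below; anything at or past 647
    maps to the last column, "Zone".
    """
    return COLUMNS[bisect.bisect_right(BOUNDS, left)]


def parse_page(page):
    """Group a page's <text> elements into one dict per blotter row."""
    records = []
    data = {}  # always defined, even if the page starts mid-row
    for el in page:
        if el.tag != "text":
            continue  # skip <fontspec> and any other non-text children
        column = column_for(int(el.attrib["left"]))
        if column == "Report Name":
            data = {}  # a new row starts at the first column
        # itertext() also collects text nested inside <b>/<i> tags
        data[column] = "".join(el.itertext()).strip()
        if column == "Zone":  # ASSUMPTION: Zone is the last column of a row
            records.append(data)
            data = {}
    return records


# Hypothetical pdftoxml-style input containing a single row.
SAMPLE = """<pdf2xml>
<page number="1">
<fontspec id="0" size="8" family="Times" color="#000000"/>
<text top="100" left="5" width="40" height="9" font="0">Report 1</text>
<text top="100" left="150" width="15" height="9" font="0">10:30</text>
<text top="100" left="200" width="40" height="9" font="0"><b>Main St</b></text>
<text top="100" left="320" width="15" height="9" font="0">Theft</text>
<text top="100" left="400" width="40" height="9" font="0">Downtown</text>
<text top="100" left="700" width="5" height="9" font="0">2</text>
</page>
</pdf2xml>"""

root = ET.fromstring(SAMPLE)
for page in root:
    for record in parse_page(page):
        print(record)
```

Because lxml elements also provide .tag, .attrib and .itertext(), parse_page should accept the pages produced by the question's lxml.etree.fromstring(xmldata) as well. Keeping the boundaries in one sorted list means a wrong column order shows up as a wrong column name, rather than as a branch that silently never runs.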
