def dcrawl(link):
    """Fetch a BusinessWire story page and pull out its parts.

    Parameters:
        link -- URL of the story page to scrape.

    NOTE(review): the extracted values (date, title, freport, place) are
    only bound to locals and nothing is returned -- preserved as-is so
    callers relying on the current (None) return are unaffected.
    """
    # Function-scope imports kept from the original (script style).
    from bs4 import BeautifulSoup
    import urllib

    # Fetch the document.  FancyURLopener has no context-manager support,
    # so close the handle explicitly -- the original leaked it.
    op = urllib.FancyURLopener({})
    f = op.open(link)
    try:
        h_doc = f.read()
    finally:
        f.close()

    # Trim down to the base story container.  Naming the parser explicitly
    # ("html.parser") makes results reproducible across machines instead of
    # depending on whichever parser bs4 happens to auto-detect.
    idoc1 = BeautifulSoup(h_doc, "html.parser")
    idoc2 = str(idoc1.find(id="bwStory"))
    bdoc = BeautifulSoup(idoc2, "html.parser")

    # Extract the date: first 13 characters of the first nested div's text.
    dat = str(bdoc.div.div.string)[0:13]
    date = dst(dat)  # dst() is defined elsewhere in this module

    # Extract the title as a string (text of the first <b> tag).
    title = str(bdoc.b.string)

    # Extract the full report: stringified list of every <p> tag.
    freport = str(bdoc.find_all("p"))

    # Extract the place: text of the first <p> in the story body, up to the
    # first "-" separator (presumably "CITY--(BUSINESS WIRE)--" -- confirm).
    plc = bdoc.find(id="bwStoryBody")
    puni = plc.p.string
    # Encode to ASCII to eliminate unicode discrepancies.
    pasi = puni.encode('ascii', 'ignore')
    com = pasi.find("-")
    # Bug fix: find() returns -1 when no dash exists, and pasi[:-1] would
    # silently drop the last character.  Keep the whole string in that case.
    place = pasi[:com] if com != -1 else pasi
相同的转换“bdoc.b.string”在这里有效:
#extract the full report as a string
freport = str(bdoc.find_all("p"))
在行中:
plc = bdoc.find(id = "bwStoryBody")
plc
返回一些数据,而 plc.p
返回第一个 <p>...</p> 标签,
,但将其转换为字符串不起作用。
因为 puni
之前返回的是一个字符串对象,我在这里遇到了 Unicode 错误,因此不得不对其进行编码,得到 pasi
这个结果。