python - Cannot access a child's attribute in BeautifulSoup

Question

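I am trying to convert an HTML table into a 2D Python list (a list of lists). Three of the "columns" are just the text of the corresponding HTML table cells, and those work fine. One "column", however, should be the ID of the link inside the corresponding HTML cell, and I cannot access that attribute.

The problem appears when I try to get the link's ID. If I print the element's .contents, it only shows "Action". When I try to access the element's ['id'] index, it raises an error. What is wrong?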

    bs = BeautifulSoup(page)

    table = bs.find("table", id="ctl00_ContentPlaceHolder1_Name_Reports1_TabContainer1_TabPanel1_dgReports")

    def notHeader(css_class):
        return css_class is not "gridviewheader"

    rows = table.find_all("tr", class_=notHeader)

    result = []

    for x in range(0, len(rows)):
        allcols = rows[x].findAll('td')

        tempRow = []
        print(allcols[0].contents[0])  #only prints Action
        tempRow.append(allcols[0].contents[0]['id'])  #TypeError: string indices must be integers
        tempRow.append(allcols[2].string)
        tempRow.append(allcols[3].string)
        tempRow.append(allcols[5].string)
        amended = -1
        for existing in result:
            if tempRow[1] == existing[1] and tempRow[2] == existing[2]:
                amended = 1
        if amended == -1:
            result.append(tempRow)

    print (ids)

score 0 · Accepted Answer

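Figured it out: it had to do with using a function in find_all() to exclude the header row. I replaced the find_all line with

    rows = table.find_all("tr")[1:]

since the header is always the first row, and that worked.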
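
A likely reason the function filter failed is that `is not` compares object identity rather than string value, so a class string parsed from the page is never the same object as the literal and the header row is kept. Its first cell then holds the plain text "Action" (a NavigableString, which is a str subclass), and indexing that with ['id'] raises the TypeError. The sketch below compares the three ways of selecting rows. The sample markup, the table id "reports", the class "gridviewrow", the link ids, and the helper names are all made up for illustration, since the real page is not shown.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real table from the question is not shown.
HTML = """
<table id="reports">
  <tr class="gridviewheader"><td>Action</td><td>Name</td><td>Year</td></tr>
  <tr class="gridviewrow"><td><a id="report_101" href="#">View</a></td><td>Alpha</td><td>2012</td></tr>
  <tr class="gridviewrow"><td><a id="report_102" href="#">View</a></td><td>Beta</td><td>2013</td></tr>
</table>
"""

HEADER_CLASS = "gridviewheader"


def not_header_identity(css_class):
    # Identity test, as in the question: the string parsed from the page is
    # a different object from HEADER_CLASS, so this is True for every row.
    return css_class is not HEADER_CLASS


def not_header_equality(css_class):
    # Value comparison: False only when the class equals the header class.
    return css_class != HEADER_CLASS


def link_ids(rows):
    # Collect the id of the first link in each row's first cell,
    # or None when the cell contains no link (e.g. a header cell).
    ids = []
    for row in rows:
        cells = row.find_all("td")
        link = cells[0].find("a") if cells else None
        ids.append(link.get("id") if link is not None else None)
    return ids


soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table", id="reports")

rows_identity = table.find_all("tr", class_=not_header_identity)  # header kept
rows_equality = table.find_all("tr", class_=not_header_equality)  # header dropped
rows_sliced = table.find_all("tr")[1:]                            # header dropped

ids_identity = link_ids(rows_identity)
ids_equality = link_ids(rows_equality)
ids_sliced = link_ids(rows_sliced)
```

Slicing off the first row, as in the answer, is the simplest option when the header is always first. The equality filter does not depend on row order, but it would still keep a header row that carries additional classes. Looking the link up with find("a") and .get("id") avoids indexing into .contents, which can begin with a text node.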
