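I'm having trouble parsing this HTML table with BS4. Sometimes the page has no payment data and shows "There is no pending manifest payment". Other times, the page lists all the outstanding payments that are due. I'd like to output this data into an array.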

from bs4 import BeautifulSoup

def find_payment(html):
    soup = BeautifulSoup(html)
    table = soup.find('table', cellspacing="0", cellpadding="2", border="0")
    table_body = table.find('tbody')
    rows = table_body.find_all('tr')
    payment_data = []
    for row in rows:
        cols = row.find_all('td')
        # Strip whitespace from each cell, then drop the empty cells
        cols = [ele.text.strip() for ele in cols]
        payment_data.append([ele for ele in cols if ele])
    return payment_data

Solved, for the most part. I did something like this:

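# payment_data, ID and i are not defined in this snippet; they are
# assumed to come from the enclosing scope (e.g. a loop over account IDs).
def find_payment(html):
    soup = BeautifulSoup(html)
    if soup.find(text="There is no pending manifest payment") is not None:
        # No payments pending: record an amount of 0 for this ID
        payment_data.append([0, ID[i]])
    else:
        amount = soup.find('td', {'class': 'bodytext'}, width="35%")
        payment_data.append([amount.text, ID[i]])
    return payment_data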

2 Answers

Why not just look for the "td" with the success or body10 class?

def find_payments(html):
    soup = BeautifulSoup(html)
    if soup.find("td", {"class": "success"}):
        payments = "There is no pending manifest payment"
    else:
        payments = [pmnt.text for pmnt in soup.find_all("td", {"class": "body10"})]
    return payments
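
A variant of the same idea that always returns a list may be easier for callers to handle, since they do not have to tell a string apart from a list. This is only a sketch: the `success` and `body10` class names are taken from the snippet above, while the sample HTML and the choice of `html.parser` are assumptions, because the real page markup is not shown.

```python
from bs4 import BeautifulSoup


def find_payments(html):
    """Return the text of every payment cell, or [] when none are pending."""
    soup = BeautifulSoup(html, "html.parser")
    # A td with class "success" is assumed to carry the "no payment" message.
    if soup.find("td", class_="success") is not None:
        return []
    # Otherwise collect the stripped text of each td with class "body10".
    return [td.get_text(strip=True) for td in soup.find_all("td", class_="body10")]


# Hypothetical markup, for illustration only.
sample = """
<table>
  <tr><td class="body10"> 125.00 </td></tr>
  <tr><td class="body10">80.50</td></tr>
</table>
"""
payments = find_payments(sample)
```

An empty list is falsy, so `if not payments:` still covers the "nothing pending" case.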
Answered 2016-03-18T21:29:37.390

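One option is to be a bit defensive (a sort of LBYL, "look before you leap", style) and search for the "There is no pending manifest payment" element up front: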

if soup.find(text="There is no pending manifest payment") is not None:
    print("No payment data")
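
Folded into a full function, the check-first approach might look like the sketch below. It rests on a few assumptions, since the page markup is not shown: the payments sit in the first `<table>`, the message is matched as a substring rather than exactly (so trailing punctuation or whitespace does not break the check), and the sample HTML is made up.

```python
from bs4 import BeautifulSoup

NO_PAYMENT_TEXT = "There is no pending manifest payment"


def find_payment(html):
    """Return one list of non-empty cell strings per table row, or []."""
    soup = BeautifulSoup(html, "html.parser")
    # Look before you leap: bail out early if the "no payment" message is present.
    message = soup.find(string=lambda s: s is not None and NO_PAYMENT_TEXT in s)
    if message is not None:
        return []
    table = soup.find("table")  # assumption: the first table holds the payments
    if table is None:
        return []
    payment_data = []
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        cells = [c for c in cells if c]  # drop empty cells
        if cells:  # skip rows with no data cells, such as header rows
            payment_data.append(cells)
    return payment_data


# Hypothetical markup, for illustration only.
pending = "<table><tr><td>2016-03-18</td><td> 125.00 </td></tr></table>"
nothing = "<table><tr><td>There is no pending manifest payment.</td></tr></table>"
rows = find_payment(pending)
no_rows = find_payment(nothing)
```

Searching `table.find_all("tr")` directly, rather than going through `tbody`, avoids a `None` lookup when the source HTML has no explicit `<tbody>` element.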
Answered 2016-03-18T21:13:40.603