
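I'm new to Python. Could someone help me write code that extracts the data from the tags below?

Here is the table markup. I also need the data from the tables nested inside the first table: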

<table width="100%" border="0" cellspacing="0" cellpadding="0">
                  <tr> 
                    <td width="7" rowspan="2">&nbsp;</td>
                    <td width='40%'> <div align="left">

                      </div>
                    <td width="7" rowspan="2">&nbsp;</td>
                  </tr>
                  <tr> 
                    <td colspan="2"> 
    <b><font face='Arial, Helvetica, sans-serif' size='2'>Account #: 8428995632 </font></b><BR><TABLE BORDER='1' width='100%' align='center' cellspacing='0'><TR><td align='left' colspan='2'><font face='Arial, Helvetica, sans-serif' size='2'><b>Billing Date:   </b><BR>07-22-2013</font></TD><td align='left' ><font face='Arial, Helvetica, sans-serif' size='2'><b>Past Due Date:    </b><BR>08-12-2013</font></TD></TR><TR><td align='left'><font face='Arial, Helvetica, sans-serif' size='2'><b>Service From: </b><BR>06-11-2013</font></TD><td align='left'><font face='Arial, Helvetica, sans-serif' size='2'><b>Service To:    </b><BR>07-11-2013</font></TD><td align='left'><font face='Arial, Helvetica, sans-serif' size='2'><b>Days of Service: </b><BR>30</font></TD></TR><TR><td align='left' colspan='2'><font face='Arial, Helvetica, sans-serif' size='2'><b>Current Charges:    </b>$30,488.60</font></TD><td align='left' ><font face='Arial, Helvetica, sans-serif' size='2'><b>Amount Due:   </b>$30,488.60</font></TD></TR></TR></TABLE><p><p><p><p><CENTER><font face='Arial, Helvetica, sans-serif' size='3'><b> Meter readings for this bill:</b></font></CENTER><TABLE BORDER='1' width='100%' align='center' cellspacing='0'><TR bgcolor='#FFF2D7'><td align='center' width='18%'><font face='Arial,Helvetica,  sans-serif' size='2'><b>Meter</b></font></TD><td align='center' width='17%'><font face='Arial, Helvetica, sans-serif' size='2'><b>Service<br>From</b></font></TD><td align='center' width='17%'><font face='Arial, Helvetica, sans-serif' size='2'><b>Service<br>To</b></font></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'><b># Days</b></font></TD><td align='center' width='10%'><font face='Arial, Helvetica, sans-serif' size='2'><b>Prior<br>Read</b></font></TD><td align='center' width='10%'><font face='Arial, Helvetica, sans-serif' size='2'><b>Current<br>Read</b></font></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' 
size='2'><b>Consumption</b></font></TD><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>S10406906</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>134</FONT></TD><td align='center' width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>144</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>10</FONT></TD></TR></FONT><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>08400002</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>30748</FONT></TD><td align='center' width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>32634</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>1886</FONT></TD></TR></FONT><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>S10406911</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>2717</FONT></TD><td align='center' 
width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>3046</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>329</FONT></TD></TR></FONT><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>08405704</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>23755</FONT></TD><td align='center' width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>25100</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>1345</FONT></TD></TR></FONT><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>S10406895</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>97</FONT></TD><td align='center' width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>101</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>4</FONT></TD></TR></FONT><TR><td align='center' width='8%'><font face='Arial,Helvetica,sans-serif' size='2'>S10406893</FONT></TD><td align='center' width='18%'><font face='Arial, Helvetica, sans-serif' size='2'>06-11-2013</FONT></TD><td align='center' width='12%'><font face='Arial, Helvetica, sans-serif' size='2'>07-11-2013</FONT></TD><td align='center' width='8%'><font face='Arial, Helvetica, sans-serif' 
size='2'>30</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>7915</FONT></TD><td align='center' width='22%'><font face='Arial, Helvetica, sans-serif' size='2'>8406</FONT></TD><td align='center' width='16%'><font face='Arial, Helvetica, sans-serif' size='2'>491</FONT></TD></TR></FONT></TABLE><input type='hidden' name='BillId' value='842892230704'></form>

                    </td>
                  </tr>
                </table>

The code I'm using is:

print(BS('table')[0].text)

But this only gets the contents of the first table.

Thanks for your help.

1 Answer

I'm not entirely sure what the OP is asking for here, but this is the sentence I'm working from:

Here is the table markup. I also need the data from the tables nested inside the first table:

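From this sentence, I suspect you want the line of text in the outer <table> as well as the text in the first inner <table>. There are many ways to do this with BeautifulSoup, but this approach makes the most sense to me.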

from bs4 import BeautifulSoup

# "html" is assumed to hold your sample HTML as a string.
soup = BeautifulSoup(html, "html.parser")
font_tags = soup.find_all("font")

# Print each piece of data wrapped in a <font> tag.
for font_tag in font_tags:
    # This heading begins the second inner table, and we don't want that.
    if font_tag.get_text().strip() == "Meter readings for this bill:":
        break
    print(font_tag.get_text())

This prints the following:

Account #: 8428995632 
Billing Date:   07-22-2013
Past Due Date:    08-12-2013
Service From: 06-11-2013
Service To:    07-11-2013
Days of Service: 30
Current Charges:    $30,488.60
Amount Due:   $30,488.60
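
If the meter readings in the second inner table are wanted as well, one possible approach is to find the "Meter readings" heading and then take the next <table> after it in document order. The following is only a minimal sketch under some assumptions: it targets Python 3 with the bs4 package installed, SAMPLE_HTML is a trimmed-down stand-in for the markup in the question, and meter_rows is a helper name made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the markup in the question (illustrative only).
SAMPLE_HTML = """
<table>
  <tr><td>
    <b><font>Account #: 8428995632 </font></b>
    <TABLE BORDER='1'>
      <TR><td><font><b>Billing Date: </b><BR>07-22-2013</font></TD></TR>
    </TABLE>
    <CENTER><font><b> Meter readings for this bill:</b></font></CENTER>
    <TABLE BORDER='1'>
      <TR>
        <td><font><b>Meter</b></font></TD>
        <td><font><b>Prior<br>Read</b></font></TD>
        <td><font><b>Current<br>Read</b></font></TD>
      </TR>
      <TR>
        <td><font>S10406906</FONT></TD>
        <td><font>134</FONT></TD>
        <td><font>144</FONT></TD>
      </TR>
      <TR>
        <td><font>08400002</FONT></TD>
        <td><font>30748</FONT></TD>
        <td><font>32634</FONT></TD>
      </TR>
    </TABLE>
  </td></tr>
</table>
"""


def meter_rows(markup):
    """Return the meter-readings table as a list of rows (lists of cell text)."""
    soup = BeautifulSoup(markup, "html.parser")

    # Find the text node of the heading that precedes the readings table.
    heading = soup.find(string=lambda s: s and "Meter readings" in s)
    if heading is None:
        return []

    # The readings table is the first <table> after that heading.
    table = heading.find_next("table")
    if table is None:
        return []

    rows = []
    for tr in table.find_all("tr"):
        # Join text split by <br> with a space and trim surrounding whitespace.
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows


for row in meter_rows(SAMPLE_HTML):
    print(row)
```

The first row returned holds the column headers and the remaining rows hold one meter each, so something like dict(zip(rows[0], row)) could turn each reading into a dictionary. Real pages with messier markup than this sample may need a more forgiving parser or extra checks.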

Answered 2013-09-12T12:59:29.517