
I have a page with several tables. I'm trying to get the table named "TabBox", but it seems to be grabbing the one named "TabBox2" instead. Any ideas?

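There is a "TabBox2" that contains two tables. It looks like the search returns the first instance of "TabBox", regardless of whether it is named "TabBox2" or just "TabBox".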
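The suspicion above can be checked directly. In BeautifulSoup 4, a plain string passed for `class` is compared against each CSS class of a tag as a whole token, so `"TabBox"` should not match `class="TabBox2"`; a compiled regular expression, by contrast, is searched inside each class value and does match it. The sketch below assumes bs4 with the built-in `"html.parser"` and uses a small hypothetical page, not the real one.

```python
from bs4 import BeautifulSoup
import re

# Hypothetical minimal page: a "TabBox2" table appears before the "TabBox" table.
html = """
<table class="TabBox2"><tr><td>from TabBox2</td></tr></table>
<table class="TabBox"><tr><td>from TabBox</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Plain string: compared against each CSS class as a whole token,
# so class="TabBox2" is skipped and the second table is returned.
exact = soup.find("table", {"class": "TabBox"})
print(exact.td.get_text())  # from TabBox

# Regular expression: searched inside each class value, so "TabBox2"
# matches too, and find() returns the first match in document order.
fuzzy = soup.find("table", {"class": re.compile("TabBox")})
print(fuzzy.td.get_text())  # from TabBox2
```

If a plain string really does return the "TabBox2" table, the cause is more likely elsewhere in the code (for example, reading rows from the wrong variable) than in the class match itself.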

table = soup.find("table", { "class" : "GroupBox3" })
rows = table.find_all("tr")

table2 = soup.find("table", { "class" : "TabBox" })
rows2 = table.find_all("tr")

Edit: `rows2` should be taken from `table2`, not from `table`.
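The effect of that bug can be reproduced on a trimmed version of the page. This is a sketch under stated assumptions: the HTML below is a hypothetical reduction of the question's markup (outer `GroupBox1` wrapping `GroupBox3` and `TabBox`), parsed with `"html.parser"`; `wrong_rows2` is a name introduced only for comparison.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed version of the question's HTML.
html = """
<table class="GroupBox1">
  <tr><td><table class="GroupBox3">
    <tr><th>Well Status Code</th></tr>
    <tr><td>W - Final Completion</td></tr>
  </table></td></tr>
  <tr><td><table class="TabBox">
    <tr><th>Field Name</th></tr>
    <tr><td>WOLFBONE (TREND AREA)</td></tr>
  </table></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", {"class": "GroupBox3"})
rows = table.find_all("tr")

table2 = soup.find("table", {"class": "TabBox"})
wrong_rows2 = table.find_all("tr")  # bug: searches the GroupBox3 table again
rows2 = table2.find_all("tr")       # fix: searches the TabBox table

print(wrong_rows2[1].td.get_text())  # W - Final Completion
print(rows2[1].td.get_text())        # WOLFBONE (TREND AREA)
```

The `find()` call for "TabBox" was already returning the right table; only the variable used for `find_all("tr")` was wrong.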

Thanks, Game Braniac!

       <br />
       <table cellspacing="0" cellpadding="4" border="1" class="GroupBox1">
          <tbody><tr>
            <th><h3>Completion Information</h3></th>
          </tr>
          <tr>
            <td><table width="578" cellspacing="0" cellpadding="4" border="1" class="GroupBox3">
              <tbody><tr>
                <th width="31%">Well Status Code</th>
                <th width="17%" nowrap="nowrap"><div align="center"><strong>Spud Date</strong></div></th>
                <th width="28%" nowrap="nowrap"><div align="center">Drilling Completed</div></th>
                <th width="24%" nowrap="nowrap"><div align="center">Surface Casing Date</div></th>
              </tr>
              <tr>
                <td nowrap="nowrap">W - Final Completion</td>
                <td><div align="center">12/08/2011</div></td>
                <td><div align="center">02/14/2012</div></td>
                <td><div align="center">12/09/2011</div></td>
              </tr>
            </tbody></table></td>
          </tr>

          <tr>
            <td><table cellspacing="0" cellpadding="4" border="1" class="TabBox">
              <tbody><tr>
                <th width="155" nowrap="nowrap">Field Name</th>
                <th width="142" nowrap="nowrap">Completed Well Type</th>
                <th width="108" nowrap="nowrap"><div align="center">Completed Date</div></th>
                <th width="133" nowrap="nowrap"><div align="center">Validated Date</div></th>
              </tr>

               <tr>
                <td nowrap="nowrap">
                   WOLFBONE (TREND AREA)
                </td>
                <td nowrap="nowrap"><div align="center">Oil</div>
                </td>
                <td nowrap="nowrap"><div align="center">02/14/2012</div>
                </td>
                <td nowrap="nowrap"><div align="center">06/04/2013</div>
                </td>
               </tr>

            </tbody></table>
           </td>
          </tr>

        </tbody></table>
       <br />

1 Answer


Try the following:

from bs4 import BeautifulSoup

html = r"""
      <br />
       <table cellspacing="0" cellpadding="4" border="1" class="GroupBox1">
          <tbody><tr>
            <th><h3>Completion Information</h3></th>
          </tr>
          <tr>
            <td><table width="578" cellspacing="0" cellpadding="4" border="1" class="GroupBox3">
              <tbody><tr>
                <th width="31%">Well Status Code</th>
                <th width="17%" nowrap="nowrap"><div align="center"><strong>Spud Date</strong></div></th>
                <th width="28%" nowrap="nowrap"><div align="center">Drilling Completed</div></th>
                <th width="24%" nowrap="nowrap"><div align="center">Surface Casing Date</div></th>
              </tr>
              <tr>
                <td nowrap="nowrap">W - Final Completion</td>
                <td><div align="center">12/08/2011</div></td>
                <td><div align="center">02/14/2012</div></td>
                <td><div align="center">12/09/2011</div></td>
              </tr>
            </tbody></table></td>
          </tr>

          <tr>
            <td><table cellspacing="0" cellpadding="4" border="1" class="TabBox">
              <tbody><tr>
                <th width="155" nowrap="nowrap">Field Name</th>
                <th width="142" nowrap="nowrap">Completed Well Type</th>
                <th width="108" nowrap="nowrap"><div align="center">Completed Date</div></th>
                <th width="133" nowrap="nowrap"><div align="center">Validated Date</div></th>
              </tr>

               <tr>
                <td nowrap="nowrap">
                   WOLFBONE (TREND AREA)
                </td>
                <td nowrap="nowrap"><div align="center">Oil</div>
                </td>
                <td nowrap="nowrap"><div align="center">02/14/2012</div>
                </td>
                <td nowrap="nowrap"><div align="center">06/04/2013</div>
                </td>
               </tr>

            </tbody></table>
           </td>
          </tr>

        </tbody></table>
       <br />
"""
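# Name the parser explicitly; bs4 warns when none is given.
soup = BeautifulSoup(html, "html.parser")
# find_all() returns every <table> whose class list contains the token "TabBox";
# a table with class="TabBox2" is not matched.
tab_box = soup.find_all('table', {'class': 'TabBox'})

for var in tab_box:
    print(var)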
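An alternative, assuming a bs4 version with CSS selector support (`select_one`, available in recent 4.x releases), is to select the table with `table.TabBox` and pull each row's cell text. The CSS class selector also matches whole class tokens, so "TabBox2" is skipped. The HTML here is a hypothetical, shortened stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened page: a "TabBox2" table precedes the "TabBox" table.
html = """
<table class="TabBox2"><tr><td>other</td></tr></table>
<table class="TabBox">
  <tr><th>Field Name</th><th>Completed Well Type</th></tr>
  <tr><td>
     WOLFBONE (TREND AREA)
  </td><td><div align="center">Oil</div></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# "table.TabBox" matches tables whose class list contains the token "TabBox".
tab_box = soup.select_one("table.TabBox")

# Collect the stripped text of every header and data cell, row by row.
data = []
for tr in tab_box.find_all("tr"):
    cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
    data.append(cells)

print(data)
# [['Field Name', 'Completed Well Type'], ['WOLFBONE (TREND AREA)', 'Oil']]
```

`get_text(strip=True)` removes the surrounding whitespace and newlines that the page's indented `<td>` contents would otherwise carry.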
answered 2013-10-29T22:07:17.130