I'm trying to parse a simple HTML table with BeautifulSoup, but I'm running into some problems.
Here is my input:
<table id="people" class="tt" width="99%" border="0" cellpadding="0" cellspacing="1">
<tr>
<td colspan="3" bgcolor="#d3d3d3">
<p align="center" style="border: 1px solid #c0c0c0; padding: 0.02in">
<a name="faculty">
</a>
<b>
Faculty
</b>
</p>
</td>
</tr>
<tr>
<td>
<p align="center">
<font color="#000080">
<a href="http://www.website.com/%7Empop">
<font color="#000080">
<img src="images/mpop.jpg" name="graphics1" align="bottom" width="70" height="85" border="1" />
</font>
</a>
</font>
</p>
</td>
<td>
<p>
<b>
John Doe, Ph.D.
</b>
<br />
Associate Professor, Computer
Science
<br />
</p>
</td>
<td>
<p>
Office: Sciences Bldg.
<br />
Phone:
xxx-xxx-xxxx
<br />
jd [at] website.com
<br />
</p>
</td>
</tr>
<tr>
<td>
<p align="center">
<font color="#000080">
<a href="http://www.website.com/%7Ercolwell">
<font color="#000080">
<img src="images/rcolwell.jpg" name="graphics2" align="bottom" width="70" height="97" border="1" />
</font>
</a>
</font>
</p>
</td>
<td>
<p>
<b>
Jane Doe, Ph.D.
</b>
<br />
Professor
<br />
School of Public Health
<br />
</p>
</td>
<td>
<p>
Sciences Bldg
<br />
jd [at]
website.com
<br />
</a>
</p>
</td>
</tr>
</table>
Here is my code:
t = soup.findAll("table", id="people")
for table in t:
    rows = table.findAll("tr")
    for tr in rows:
        cols = tr.findAll("td")
        for td in cols:
            print(str(td.find(text=True)))  # also tried print(td.find(text=True))
            print(",")
        print("\n")
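This produces output that contains only commas and no actual text. When I print(td) instead, I do get the information I need, but as HTML with all the tags included. Can anyone point out the right way to do this? I only want to extract the cell contents.
Cheers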
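
A possible explanation, offered as a sketch rather than a definitive answer: td.find(text=True) returns only the first text node inside the cell, and in the markup above that node is the newline between <td> and <p>, so nothing visible is printed. One way around this is to collect all the text in the cell with get_text() and collapse the whitespace. The example below assumes the bs4 package (BeautifulSoup 4) and Python's built-in "html.parser"; the HTML constant is a trimmed copy of one row from the input, and extract_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one row from the table in the question (assumption:
# the remaining rows have the same shape).
HTML = """
<table id="people">
<tr>
<td>
<p>
<b>
John Doe, Ph.D.
</b>
<br />
Associate Professor, Computer
Science
<br />
</p>
</td>
<td>
<p>
Office: Sciences Bldg.
<br />
Phone:
xxx-xxx-xxxx
<br />
</p>
</td>
</tr>
</table>
"""


def extract_rows(html):
    """Return one list of cleaned cell strings per <tr> in table#people."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="people")
    rows = []
    for tr in table.find_all("tr"):
        cells = []
        for td in tr.find_all("td"):
            # get_text(" ") joins every text node in the cell with a space;
            # split() + join() then collapses newlines and repeated spaces.
            cells.append(" ".join(td.get_text(" ").split()))
        rows.append(cells)
    return rows


for row in extract_rows(HTML):
    print(", ".join(row))
```

Cells that hold only an image, such as the first column in the full table, would come back as empty strings with this approach, so they may need to be filtered out or replaced with the img src attribute.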