我正在尝试匹配以下 HTML (file.txt) 中的 TH 标签:
<TABLE WIDTH="71%" BORDER=0 CELLSPACING=0 CELLPADDING=0>
<TR VALIGN="BOTTOM">
<TH WIDTH="34%" ALIGN="LEFT"><FONT SIZE=1><B>Name<BR> </B></FONT><HR NOSHADE></TH>
<TH WIDTH="3%"><FONT SIZE=1> </FONT></TH>
<TH WIDTH="5%" ALIGN="CENTER"><FONT SIZE=1><B>Age</B></FONT><HR NOSHADE></TH>
<TH WIDTH="3%"><FONT SIZE=1> </FONT></TH>
<TH WIDTH="55%" ALIGN="CENTER"><FONT SIZE=1><B>Positions</B></FONT><HR NOSHADE></TH>
</TR>
<TR BGCOLOR="#CCEEFF" VALIGN="TOP">
<TD WIDTH="34%"><FONT SIZE=2>Stephen A. Wynn</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="5%" ALIGN="CENTER"><FONT SIZE=2>60</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="55%"><FONT SIZE=2>Chairman of the Board and Chief Executive Officer</FONT></TD>
</TR>
<TR BGCOLOR="White" VALIGN="TOP">
<TD WIDTH="34%"><FONT SIZE=2>Kazuo Okada</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="5%" ALIGN="CENTER"><FONT SIZE=2>60</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="55%"><FONT SIZE=2>Vice Chairman of the Board</FONT></TD>
</TR>
</TABLE>
我尝试了以下方法,但似乎不起作用:
from bs4 import BeautifulSoup
infile = open("file.txt")
soup = BeautifulSoup(infile.read())
#this works
soup.findAll('th')
#this works but isn't particularly useful...
soup.findAll(text="Age")
#this is what I really want, but it returns an empty list
soup.findAll('th', text="Age")
谢谢您的帮助!