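I'm trying to parse tables with BeautifulSoup. The first table on my page is simple and parses fine, but I can't parse a similar table on the same page, and I don't understand why.
Here is the code. Thanks in advance for your help.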
import urllib2
from bs4 import BeautifulSoup
url = urllib2.urlopen("https://dl.dropboxusercontent.com/u/956261/poftext.html")
contentHTML = url.read()
soup = BeautifulSoup(contentHTML)
tableUserDetails = soup.find("table", {"class" : "user-details"})
i = 0
tableUserDetailsList = []
for row in tableUserDetails.findAll('tr'):
    for col in row.findAll('td'):
        contentTd = col.contents[0].string.strip()
        if contentTd:
            print "TD Number %d : %s" % (i, contentTd)
            tableUserDetailsList.append(contentTd)
            i += 1
# This first table is OK
print tableUserDetailsList
# But now this one
tableUserDetails = soup.find("table", {"class" : "secondpart"})
i = 0
tableUserDetailsList = []
for row in tableUserDetails.findAll('tr'):
    for col in row.findAll('td'):
        contentTd = col.contents[0].string.strip()
        if contentTd:
            print "TD Number %d : %s" % (i, contentTd)
            tableUserDetailsList.append(contentTd)
            i += 1
print tableUserDetailsList
# The list is empty :(
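For reference, here is a minimal sketch of what the first child node of each cell holds. It assumes bs4 with the built-in "html.parser" backend, and it uses a two-cell snippet trimmed from the HTML shown below. In the first table the text sits directly inside the <td>, so the first child is the text itself. In the second table the text is wrapped in a <span>, so the first child is only the newline before the <span>, and it strips to an empty string. The helpers first_child_texts and cell_texts are hypothetical names used only for this illustration. get_text(strip=True) is shown as one way to collect text that is nested inside tags.

```python
from bs4 import BeautifulSoup

# Two cells trimmed from the page: the first puts its text directly in the
# <td>, the second wraps its text in a <span> (like the "secondpart" table).
SNIPPET = """
<table>
<tr>
<td class="headline txtBlue size15">
About
</td>
<td>
<span class="headline txtBlue size14">Do you drink?</span>
</td>
</tr>
</table>
"""


def first_child_texts(html):
    """Hypothetical helper: strip the first child node of each <td>,
    mirroring the col.contents[0] lookup in the loop above."""
    soup = BeautifulSoup(html, "html.parser")
    return [td.contents[0].strip() for td in soup.find_all("td")]


def cell_texts(html):
    """Hypothetical helper: all text inside each <td>, including text
    nested in child tags, with surrounding whitespace stripped."""
    soup = BeautifulSoup(html, "html.parser")
    return [td.get_text(strip=True) for td in soup.find_all("td")]


print(first_child_texts(SNIPPET))  # ['About', '']
print(cell_texts(SNIPPET))         # ['About', 'Do you drink?']
```

The same effect explains why "Profession" and "Visioconférence" are missing from the first table's list: those two cells also wrap their text in a <span>.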
Here is a simplified version of the HTML I'm trying to parse:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
French.Kiss
Sorties, Sport, Voyages, Nouvelles Expériences</title>
</head>
<body style='background-color: #fff;' leftMargin='0' topMargin='0' marginwidth='0' marginheight='0' link='#1E55D6' vlink='#1E55D6' TEXT='#6551b0'>
<table class="user-details">
<tr>
<td class="headline txtBlue size15" style="width:80px">
About
</td>
<td style="width:10px">
</td>
<td class="txtGrey size15">
Fume occasionnellement with Silhouette mince
</td>
<td width="25px;">
</td>
<td class="headline txtBlue size15">
City
</td>
<td class="txtGrey size15">
Paris Ile-de-France
</td>
</tr>
<tr>
<td class="headline txtBlue size15">
Details
</td>
<td style="width:10px">
</td>
<td class="txtGrey size15">
26 year old Un homme, 185cm, Sans religion
</td>
<td>
</td>
<td class="headline txtBlue size15">
Ethnicity
</td>
<td class="txtGrey size15">
Caucasienne Balance with Châtains
</td>
</tr>
<tr>
<td class="headline txtBlue size15">
Intent
</td>
<td style="width:10px">
</td>
<td class="txtGrey size15">
French.Kiss Cherche une relation amoureuse.
</td>
<td>
</td>
<td class="headline txtBlue size15" style="width:90px">
Education
</td>
<td class="txtGrey size15">
Diplôme universitaire/Licence
</td>
</tr>
<tr>
<td class="headline txtBlue size15">
Personnalité
</td>
<td style="width:10px">
</td>
<td class="txtGrey size15">
</td> <td>
</td>
<td>
<span class="headline txtBlue size15">Profession </span>
</td>
<td>
<span class="txtGrey size15">
Visioconférence</span>
</td>
</tr>
</table>
<table width="85%" class="secondpart">
<tr height="25px">
<td width="200px">
<span class="headline txtBlue size14">I am Seeking a</span>
</td>
<td width="300px">
<span class="txtGrey size14">
Une femme</span>
</td>
<td width="25px">
</td>
<td width="200px">
<span class="headline txtBlue size14">For</span>
</td>
<td width="200px">
<span class="txtGrey size14">
Sorties</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14"><a href='needs_test.aspx'>Needs Test</a></span>
</td>
<td>
<span class="txtGrey size14"><a href='needs_test.aspx'>
<a href="needs_view.aspx?id=38028200">View
his
relationship needs</a></a></span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14"><a href='poftest.aspx'>Chemistry</a></span>
</td>
<td>
<span class="txtGrey size14"><a href='poftest.aspx'>
<a href="personality.aspx?id=26&user_id=41724176" rel="nofollow">View
his
chemistry results</a></a></span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Do you drink?</span>
</td>
<td>
<span class="txtGrey size14">
Occasionnellement</span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14">Do you want children?</span>
</td>
<td>
<span class="txtGrey size14">
Non divulgué</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Marital Status</span>
</td>
<td>
<span class="txtGrey size14">
Célibataire</span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14">Do you do drugs?</span>
</td>
<td>
<span class="txtGrey size14">
Non</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Pets </span>
</td>
<td>
<span class="txtGrey size14">
Aucun</span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14">Eye Color</span>
</td>
<td>
<span class="txtGrey size14">
Bruns</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Do you have a car? </span>
</td>
<td>
<span class="txtGrey size14">
N/A</span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14">Do you have children?</span>
</td>
<td>
<span class="txtGrey size14">
Non</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Longest Relationship</span>
</td>
<td>
<span class="txtGrey size14">
Plus de 2 ans</span>
</td>
<td>
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
</body>
</html>
Here are tableUserDetails.content, tableUserDetails and tableUserDetailsList for both tables:
*First table*
print tableUserDetails.content = None
打印 tableUserDetails =
<table class="user-details">
<tr>
<td class="headline txtBlue size15" style="width:80px">
About
</td>
<td style="width:10px">
</td>
<td class="txtGrey size15">
Fume occasionnellement with Silhouette mince
</td>
<td width="25px;">
</td>
<td class="headline txtBlue size15">
City
</td>
<td class="txtGrey size15">
Paris Ile-de-France
</td>
</tr>
<tr>
<td class="headline txtBlue size15">
Details
</td>
<td style="width:10px">
</td>
<td class="txtGrey size15">
26 year old Un homme, 185cm, Sans religion
</td>
<td>
</td>
<td class="headline txtBlue size15">
Ethnicity
</td>
<td class="txtGrey size15">
Caucasienne Balance with Châtains
</td>
</tr>
<tr>
<td class="headline txtBlue size15">
Intent
</td>
<td style="width:10px">
</td>
<td class="txtGrey size15">
French.Kiss Cherche une relation amoureuse.
</td>
<td>
</td>
<td class="headline txtBlue size15" style="width:90px">
Education
</td>
<td class="txtGrey size15">
Diplôme universitaire/Licence
</td>
</tr>
<tr>
<td class="headline txtBlue size15">
Personnalité
</td>
<td style="width:10px">
</td>
<td class="txtGrey size15">
</td> <td>
</td>
<td>
<span class="headline txtBlue size15">Profession </span>
</td>
<td>
<span class="txtGrey size15">
Visioconférence</span>
</td>
</tr>
</table>
print tableUserDetailsList = [u'About', u'Fume occasionnellement with Silhouette mince', u'City', u'Paris Ile-de-France', u'Details', u'26 year old Un homme, 185cm, Sans religion', u'Ethnicity', u'Caucasienne Balance with Ch\xe2tains', u'Intent', u'French.Kiss Cherche une relation amoureuse.', u'Education', u'Dipl\xf4me universitaire/Licence', u'Personnalit\xe9']
*Second table*
print tableUserDetails.content = None
print tableUserDetails =
<table width="85%" class="secondpart">
<tr height="25px">
<td width="200px">
<span class="headline txtBlue size14">I am Seeking a</span>
</td>
<td width="300px">
<span class="txtGrey size14">
Une femme</span>
</td>
<td width="25px">
</td>
<td width="200px">
<span class="headline txtBlue size14">For</span>
</td>
<td width="200px">
<span class="txtGrey size14">
Sorties</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14"><a href='needs_test.aspx'>Needs Test</a></span>
</td>
<td>
<span class="txtGrey size14"><a href='needs_test.aspx'>
<a href="needs_view.aspx?id=38028200">View
his
relationship needs</a></a></span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14"><a href='poftest.aspx'>Chemistry</a></span>
</td>
<td>
<span class="txtGrey size14"><a href='poftest.aspx'>
<a href="personality.aspx?id=26&user_id=41724176" rel="nofollow">View
his
chemistry results</a></a></span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Do you drink?</span>
</td>
<td>
<span class="txtGrey size14">
Occasionnellement</span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14">Do you want children?</span>
</td>
<td>
<span class="txtGrey size14">
Non divulgué</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Marital Status</span>
</td>
<td>
<span class="txtGrey size14">
Célibataire</span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14">Do you do drugs?</span>
</td>
<td>
<span class="txtGrey size14">
Non</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Pets </span>
</td>
<td>
<span class="txtGrey size14">
Aucun</span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14">Eye Color</span>
</td>
<td>
<span class="txtGrey size14">
Bruns</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Do you have a car? </span>
</td>
<td>
<span class="txtGrey size14">
N/A</span>
</td>
<td>
</td>
<td>
<span class="headline txtBlue size14">Do you have children?</span>
</td>
<td>
<span class="txtGrey size14">
Non</span>
</td>
</tr>
<tr height="25px">
<td>
<span class="headline txtBlue size14">Longest Relationship</span>
</td>
<td>
<span class="txtGrey size14">
Plus de 2 ans</span>
</td>
<td>
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
print tableUserDetailsList = []
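On the "print tableUserDetails.content = None" lines above: in bs4, reading an unknown attribute on a Tag is a shortcut for find(). So .content searches for a child tag named <content>, finds none, and returns None. The list of child nodes is .contents (plural). Below is a minimal sketch, assuming bs4 with the built-in "html.parser" backend and a one-row table made up for illustration.

```python
from bs4 import BeautifulSoup

# A made-up one-row table with the same class as the second table.
HTML = '<table class="secondpart"><tr><td><span>Non</span></td></tr></table>'

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table", {"class": "secondpart"})

# table.content behaves like table.find("content"): there is no <content>
# tag inside the table, so both return None.
print(table.content)
print(table.find("content"))

# .contents is the list of direct children: here, the single <tr>.
print(table.contents)
```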