I need your help with a problem I can't find a solution to...
I have an HTML table made of tr and td elements.
For example:
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
</td>
</tr>
<tr>
<td colspan="2">
<br />
<h2>
Macros
</h2>
</td>
</tr>
<tr>
<td>
#define
</td>
<td>
<a class="el" href="#g3e3da223d2db3b49a9b6e3ee6f49f745">
SND_LSTINDIC
</a>
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
liste sons indication
<br />
</td>
</tr>
<tr>
<td colspan="2">
<br />
<h2>
Définition de type
</h2>
</td>
</tr>
<tr>
<td class="memItemLeft" nowrap="nowrap" align="right" valign="top">
typedef void(*
</td>
<td class="memItemRight" valign="bottom">
<a class="el" href="#g73cba8bd62d629eb05495a5c1a7b2844">
f_sndChangeFunc
</a>
)(
<a class="el" href="#g4ab7db37a42f244764583a63997489a8">
e_sndSound
</a>
i_eSound,
aBOOL
i_bStart,
aBYTE
i_byDisableModule)
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
Fonction rappel sur départ/arrêt bip.
<a href="#g73cba8bd62d629eb05495a5c1a7b2844">
</a>
<br />
</td>
</tr>
<tr>
<td colspan="2">
<br />
<h2>
Énumérations
</h2>
</td>
</tr>
<tr>
<td class="memItemLeft" nowrap="nowrap" align="right" valign="top">
enum
</td>
<td class="memItemRight" valign="bottom">
<a class="el" href="#g4ab7db37a42f244764583a63997489a8">
e_sndSound
</a>
{
}
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
identificateurs sons
<a href="group__Sound.html#g4ab7db37a42f244764583a63997489a8">
Plus de détails...
</a>
<br />
</td>
</tr>
</table>
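I am trying to split this table into several tables. I want to pull the headings (h2) out of the table
and build a separate table from the rows that follow each heading. For example, the expected result here would be: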
<h2>
Macros
</h2>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
</td>
</tr>
<tr>
<td colspan="2">
<br />
</td>
</tr>
<tr>
<td>
#define
</td>
<td>
<a class="el" href="#g3e3da223d2db3b49a9b6e3ee6f49f745">
SND_LSTINDIC
</a>
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
liste sons indication
<br />
</td>
</tr>
</table>
<h2>
Définition de type
</h2>
<table>
<tr>
<td class="memItemLeft" nowrap="nowrap" align="right" valign="top">
typedef void(*
</td>
<td class="memItemRight" valign="bottom">
<a class="el" href="#g73cba8bd62d629eb05495a5c1a7b2844">
f_sndChangeFunc
</a>
)(
<a class="el" href="#g4ab7db37a42f244764583a63997489a8">
e_sndSound
</a>
i_eSound,
aBOOL
i_bStart,
aBYTE
i_byDisableModule)
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
Fonction rappel sur départ/arrêt bip.
<a href="#g73cba8bd62d629eb05495a5c1a7b2844">
</a>
<br />
</td>
</tr>
</table>
<h2>
Énumérations
</h2>
<table>
<tr>
<td class="memItemLeft" nowrap="nowrap" align="right" valign="top">
enum
</td>
<td class="memItemRight" valign="bottom">
<a class="el" href="#g4ab7db37a42f244764583a63997489a8">
e_sndSound
</a>
{
}
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
identificateurs sons
<a href="group__Sound.html#g4ab7db37a42f244764583a63997489a8">
Plus de détails...
</a>
<br />
</td>
</tr>
</table>
I am using Python and BeautifulSoup to parse my HTML. I first tried this:
from BeautifulSoup import BeautifulSoup, NavigableString
import sys
import os

# allHtml holds the HTML source shown above
soup = BeautifulSoup(allHtml)
for table in soup.findAll("table"):
    h2s = table.findAll("h2")
    if h2s:
        FirstH2 = True
        LastH2 = False
        for i, h2 in enumerate(h2s):
            LastH2 = (i == len(h2s) - 1)
            h2.parent.replaceWithChildren()  # <td> deleted
            h2.parent.replaceWithChildren()  # <tr> deleted
            print h2.parent
            if FirstH2:
                h2.replaceWith(h2.prettify() + '<table>')
                #h2_tag_idx = h2.parent.contents.index(h2)  # other method to add Tags
                #h2.parent.insert(h2_tag_idx + 1, '<b>OK</b>')
            else:
                h2.replaceWith('</table>' + h2.prettify() + '<table>')
            FirstH2 = False
print soup.prettify()
But no luck: the tags I insert come out escaped as HTML entities instead of as real tags...
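To make the symptom concrete, here is a minimal sketch of the escaping. It assumes the newer `bs4` package (BeautifulSoup 4 on Python 3) rather than the BeautifulSoup 3 import used above, and the tiny `<div>` document is made up for illustration. A plain string passed to `replace_with()` is stored as a text node, so its angle brackets are escaped on output; a tag created with `new_tag()` is inserted as real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical one-heading document, only for demonstration.
demo_soup = BeautifulSoup("<div><h2>Macros</h2></div>", "html.parser")

# A string handed to replace_with() becomes text, not markup.
demo_soup.h2.replace_with("<table>")
escaped_output = str(demo_soup)

# Inserting a Tag object instead keeps it as a real element.
tag_soup = BeautifulSoup("<div><h2>Macros</h2></div>", "html.parser")
tag_soup.h2.insert_after(tag_soup.new_tag("table"))
tag_output = str(tag_soup)

print(escaped_output)
print(tag_output)
```

This suggests that unbalanced fragments such as `'</table>'` cannot be spliced in as strings; the tree has to be rearranged with tag objects.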
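I also tried extracting everything inside the table, rebuilding several tables from it, and putting them back into the soup, but that failed...
I also tried getting the table as a string, splitting that string on a separator, and putting all the sub-tables back into the soup, but that failed too...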
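For reference, the transformation I am after could be sketched roughly as follows. This is only a sketch under assumptions: it uses `bs4` (BeautifulSoup 4) instead of BeautifulSoup 3, the helper name `split_table_at_headings` and the shortened sample markup are hypothetical, it assumes no nested tables, and it drops the heading rows and any rows before the first heading rather than keeping them as in the expected output above.

```python
from bs4 import BeautifulSoup


def split_table_at_headings(soup, table):
    """Replace `table` with one <h2> + <table> pair per heading row."""
    new_nodes = []
    current = None
    # find_all() returns a snapshot, so moving rows while looping is safe.
    for row in table.find_all("tr"):
        heading = row.find("h2")
        if heading is not None:
            # A heading row starts a new section: keep the <h2>, open a table.
            new_nodes.append(heading.extract())
            current = soup.new_tag("table")
            new_nodes.append(current)
        elif current is not None:
            # Ordinary row: move it into the table of the current section.
            current.append(row.extract())
    for node in new_nodes:
        table.insert_before(node)
    # Remove the original table along with whatever was left in it.
    table.decompose()
    return new_nodes


# Shortened, made-up sample in the same shape as the table above.
sample_html = """<table><tr><td></td></tr>
<tr><td colspan="2"><br/><h2>Macros</h2></td></tr>
<tr><td>#define</td><td>SND_LSTINDIC</td></tr>
<tr><td colspan="2"><br/><h2>Enumerations</h2></td></tr>
<tr><td>enum</td><td>e_sndSound</td></tr>
<tr><td></td><td>sound identifiers</td></tr></table>"""

split_soup = BeautifulSoup(sample_html, "html.parser")
split_table_at_headings(split_soup, split_soup.find("table"))
print(split_soup.prettify())
```

Copying the original table's attributes (border, cellpadding, cellspacing) onto each new table is left out to keep the sketch short.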
If anyone has an idea, that would be great!
Thanks in advance!