How can I generate the paths to all text strings in an HTML document, preferably with BeautifulSoup? I have this HTML:
<DIV class="art-info"><SPAN class="time"><SPAN class="time-date" content="2012-02-28T14:46CET" itemprop="datePublished">
28. february 2012
</SPAN>
14:46
</SPAN></DIV><DIV>
Something,<P>something else</P>continuing.
</DIV>
I would like to split the HTML into the paths leading to each text string, for example:
str1 >>> <DIV class="art-info"><SPAN class="time"><SPAN class="time-date" content="2012-02-28T14:46CET" itemprop="datePublished">28. february 2012</SPAN></SPAN></DIV>
str2 >>> <DIV class="art-info"><SPAN class="time">14:46</SPAN></DIV>
str3 >>> <DIV>Something,continuing.</DIV>
str4 >>> <DIV><P>something else</P></DIV>
or
str1 >>> <DIV><SPAN><SPAN>28. february 2012</SPAN></SPAN></DIV>
str2 >>> <DIV><SPAN>14:46</SPAN></DIV>
str3 >>> <DIV>Something,continuing.</DIV>
str4 >>> <DIV><P>something else</P></DIV>
or
str1 >>> //div/span/span/28. february
str2 >>> //div/span/14:46
str3 >>> //div/Something,continuing.
str4 >>> //div/p/something else
I have read through the BeautifulSoup documentation, but I can't figure out how to do this. Any ideas?
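To make the goal concrete, here is a minimal sketch of how the third output format might be produced. It assumes the third-party `bs4` package (BeautifulSoup 4) with the built-in `html.parser` backend, which lowercases tag names. The function name `text_paths` and the `SAMPLE` variable are hypothetical and exist only for illustration. Text pieces that share the same parent element are joined, so "Something," and "continuing." end up on one path.

```python
from bs4 import BeautifulSoup

SAMPLE = """<DIV class="art-info"><SPAN class="time"><SPAN class="time-date" content="2012-02-28T14:46CET" itemprop="datePublished">
28. february 2012
</SPAN>
14:46
</SPAN></DIV><DIV>
Something,<P>something else</P>continuing.
</DIV>
"""


def text_paths(html):
    """Return one '//tag/tag/text' path per element that directly contains text."""
    soup = BeautifulSoup(html, "html.parser")

    # Group text pieces by their direct parent element, in document order.
    # Keyed by id() so that two different <div> elements are kept apart.
    groups = {}
    for string in soup.strings:
        text = string.strip()
        if not text:
            continue  # skip whitespace-only strings between tags
        parent = string.parent
        groups.setdefault(id(parent), (parent, []))[1].append(text)

    paths = []
    for parent, pieces in groups.values():
        # Walk from the element up to the root, then reverse to get root-first order.
        names = [parent.name] + [ancestor.name for ancestor in parent.parents]
        # The BeautifulSoup root object is named "[document]"; leave it out.
        names = [name for name in reversed(names) if name != "[document]"]
        paths.append("//" + "/".join(names) + "/" + "".join(pieces))
    return paths


for path in text_paths(SAMPLE):
    print(path)
# //div/span/span/28. february 2012
# //div/span/14:46
# //div/Something,continuing.
# //div/p/something else
```

The first two formats could probably be built from the same `parent` and `parent.parents` chain by emitting opening and closing tags instead of joining names with `/`, though that is not shown here.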