This is only part of the code. It does not replace the div and href values. The items are BeautifulSoup Tag objects.

soup = BeautifulSoup(ourUrl)
dem = soup.findAll('p')
for i in range(0,len(dem)-1):
    dk = dem[i]
    if ('<div') in dk:
        print "it here"
        dk = dk.replace('<div','<!--')
        dk = dk.replace('</div>','--->')
        dem[i] = dk
for i in range(0,len(dem)-1):
    dk = dem[i]
    if ('<a href') in dk:
        print "it here"
        dk = dk.replace('<a href','<!--')
        dk = dk.replace('</a>','--->')
        dem[i] = dk

The value of dem looks something like this:

dem =[    <p class="left-text padding-left-10">
<a href="/people" class="red-text">See all people</a>
</p>
<p class="left-text padding-left-10">
<a href="/tv" class="red-text" style="display:inline;">See all bio TV</a>
<span class="divider">&nbsp;|&nbsp;&nbsp;</span>
<a href="/tv/daily-schedule" class="red-text" style="display:inline;">See schedule </a>
</p>
<p class="left-text bottom-flyout-video-padding">
<a href="/videos" class="red-text ">See all videos</a>
</p>
<p class="left-text padding-left-10">
<a href="http://shop.history.com/?v=biography" class="red-text">Shop now</a>
</p>
<p>TV14 </p>
<p>He rose from the slums of Brooklyn to take on the biggest Mafia dons of the 1950s and 1960s. Joey Gallo began his criminal career as a small-time loan shark and jukebox racketeer. He became a top enforcer in the Profaci crime family, but felt he never got the respect he deserved. So Gallo formed his own gang and revolted against mafia Don Joe Profaci in a long, bloody war on the streets of New York. But there was another side to Joey Gallo--the ruthless mob leader was also an artist and an avid reader. Living in Greenwich Village with his wife Jeffie, Gallo was inspired by his beatnik neighbors and their counterculture ideas. He also began hobnobbing with New York's social elite, befriending everyone from Neil Simon to Jerry Orbach. In the end though, nothing could save Joey Gallo from a dramatic end.</p>
<p>TV14 </p>
<p>
<p> Charles Darwin, <a href="/people/charles-darwin-9266433">http://www.biography.com/people/charles-darwin-9266433</a> (last visited Aug 27, 2013).</p>
<p> Charles Darwin. The Biography Channel website. 2013. Available at: <a href="/people/charles-darwin-9266433">http://www.biography.com/people/charles-darwin-9266433</a>. Accessed Aug 27, 2013. </p>
<p>Naturalist Charles Darwin was born in Shrewsbury, England, on February 12, 1809. In 1831, he embarked on a five-year survey voyage around the world on the HMS <i>Beagle</i>. His studies of specimens around the globe led him to formulate his theory of evolution and his views on the process of natural selection. In 1859, he published <i>On the Origin of Species</i>. He died on April 19, 1882, in London.</p>
<p><span class="body">A man who dares to waste one hour of time has not discovered the value of life.</span></p>


                            571 people in this group<br />
</p>]

The full value of dem is too large to post, so I have given you an excerpt. Even with...

1 Answer

If you want to replace the elements with comments that contain the replaced tags, replace each object with a new bs4.Comment() object:

from bs4 import Comment

for para in soup.find_all('p'):
    for div in para.find_all('div'):
        div.replace_with(Comment(unicode(div)))
    for link in para.find_all('a', href=True):
        link.replace_with(Comment(unicode(link)))
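
To see the effect end to end, here is a minimal self-contained sketch of the same approach. It assumes Python 3, so it uses str() where the snippet above uses the Python 2 unicode(). The sample markup and the function name comment_out_divs_and_links are made up for illustration, and "html.parser" is just one parser choice:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup, loosely modelled on the excerpt in the question.
html = (
    '<p><a href="/people">See all people</a></p>'
    '<p>TV14 <div>571 people in this group</div></p>'
)


def comment_out_divs_and_links(markup):
    """Replace every <div> and <a href=...> inside a <p> with an HTML comment."""
    soup = BeautifulSoup(markup, "html.parser")
    for para in soup.find_all("p"):
        for div in para.find_all("div"):
            # Serialise the tag to text, then wrap that text in a Comment node.
            div.replace_with(Comment(str(div)))
        for link in para.find_all("a", href=True):
            link.replace_with(Comment(str(link)))
    return soup


result = comment_out_divs_and_links(html)
print(result)
```

The tags are no longer part of the parsed tree afterwards; only their serialised text survives inside the comment nodes.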

In Python, don't use a for loop with range(); loop directly over the sequence instead. In the code above, I loop directly over the .find_all() results.
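
As a small illustration of that point, using a plain list as a stand-in for the find_all() results (the list contents are made up): the range(0, len(dem)-1) pattern from the question also stops one index early, so the last item is never visited.

```python
# Stand-in data; in the question this would be the list of <p> tags.
paragraphs = ["first", "second", "third"]

# The question's pattern: range(0, len(seq) - 1) stops one index early.
by_index = [paragraphs[i] for i in range(0, len(paragraphs) - 1)]

# Looping directly over the sequence visits every item.
direct = [para for para in paragraphs]

# enumerate() supplies the index in the cases where one is really needed.
indexed = [(i, para) for i, para in enumerate(paragraphs)]

print(by_index)
print(direct)
print(indexed)
```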

BeautifulSoup elements may print as if they were just HTML text, but they are not strings; they are Tag() objects. Don't try to treat them as strings.
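
A short sketch of what that means in practice, assuming Python 3 and bs4 with the "html.parser" backend (the sample markup is made up). It shows why a test like '<a href' in dk from the question never matches: on a Tag, the in operator looks at the tag's child nodes, not at its markup text.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical one-paragraph sample, similar to the excerpt in the question.
html = '<p><a href="/people">See all people</a></p>'
soup = BeautifulSoup(html, "html.parser")
para = soup.find("p")

is_tag = isinstance(para, Tag)      # the element is a Tag object...
is_string = isinstance(para, str)   # ...and not a string

# `in` on a Tag compares against its child nodes, so a markup
# fragment such as "<a href" is never found this way.
in_tag = "<a href" in para

# Only the serialised markup is a real string.
in_markup = "<a href" in str(para)

# Searching the tree is the more reliable way to test for a link.
has_link = para.find("a", href=True) is not None

print(is_tag, is_string, in_tag, in_markup, has_link)
```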

Answered 2013-08-27T11:52:09.923