
I want to scrape the following page, http://209.105.250.69:8382/, with Python to get the current number of listeners:

<td>Current Listeners:</td>
<td class="streamdata">28</td>

Here is the HTML source of the page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Icecast Streaming Media Server</title>
<link rel="stylesheet" type="text/css" href="style.css">
</head>
<body topmargin="0" leftmargin="0" rightmargin="0" bottommargin="0">
<h2>Icecast2 Status</h2>
<br><div class="roundcont">
<div class="roundtop"><img src="/corner_topleft.jpg" class="corner" style="display: none"></div>
<table border="0" width="100%" id="table1" cellspacing="0" cellpadding="4"><tr><td bgcolor="#656565">
<a class="nav" href="admin/">Administration</a><a class="nav" href="status.xsl">Server Status</a><a class="nav" href="server_version.xsl">Version</a>
</td></tr></table>
<div class="roundbottom"><img src="/corner_bottomleft.jpg" class="corner" style="display: none"></div>
</div>
<br><br><div class="roundcont">
<div class="roundtop"><img src="/corner_topleft.jpg" class="corner" style="display: none"></div>
<div class="newscontent">
<div class="streamheader"><table cellspacing="0" cellpadding="0">
<colgroup align="left"></colgroup>
<colgroup align="right" width="300"></colgroup>
<tr>
<td><h3>Mount Point /listen.mp3</h3></td>
<td align="right">
<a href="/listen.mp3.m3u">M3U</a><a href="/listen.mp3.xspf">XSPF</a>
</td>
</tr>
</table></div>
<table border="0" cellpadding="4">
<tr>
<td>Stream Title:</td>
<td class="streamdata">Quran Kareem Radio</td>
</tr>
<tr>
<td>Stream Description:</td>
<td class="streamdata">Quran Kareem Radio</td>
</tr>
<tr>
<td>Content Type:</td>
<td class="streamdata">audio/mpeg</td>
</tr>
<tr>
<td>Mount started:</td>
<td class="streamdata">Wed, 17 Jul 2013 05:40:46 -0400</td>
</tr>
<tr>
<td>Bitrate:</td>
<td class="streamdata">60</td>
</tr>
<tr>
<td>Current Listeners:</td>
<td class="streamdata">28</td>
</tr>
<tr>
<td>Peak Listeners:</td>
<td class="streamdata">202</td>
</tr>
<tr>
<td>Stream Genre:</td>
<td class="streamdata">Islam</td>
</tr>
<tr>
<td>Stream URL:</td>
<td class="streamdata"><a target="_blank" href="http://qkradio.com.au">http://qkradio.com.au</a></td>
</tr>
<tr>
<td>Current Song:</td>
<td class="streamdata"></td>
</tr>
</table>
</div>
<div class="roundbottom"><img src="/corner_bottomleft.jpg" class="corner" style="display: none"></div>
</div>
<br><br>&nbsp;


<div class="poster">Support icecast development at <a class="nav" target="_blank" href="http://www.icecast.org">www.icecast.org</a>
</div>
</body>
</html>

2 Answers

>>> from bs4 import BeautifulSoup
>>> # s holds the page's HTML source as a string
>>> soup = BeautifulSoup(s, 'html.parser')
>>> td1 = soup.find('td', string='Current Listeners:')
>>> td2 = td1.find_next_sibling('td')
>>> td2.text
'28'
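
The same lookup can be wrapped in a small function. This is only a minimal sketch: the names `current_listeners` and `SAMPLE` are made up for illustration, the sample markup is just the two cells quoted in the question wrapped in a row, and it assumes the label cell's text is exactly `Current Listeners:`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the two cells quoted in the question, wrapped in a row.
SAMPLE = """
<table><tr>
<td>Current Listeners:</td>
<td class="streamdata">28</td>
</tr></table>
"""


def current_listeners(html):
    """Return the listener count as an int, or None if the label is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # Find the label cell by its exact text.
    label = soup.find("td", string="Current Listeners:")
    if label is None:
        return None
    # The value sits in the next <td> of the same row.
    value = label.find_next_sibling("td")
    if value is None:
        return None
    return int(value.get_text(strip=True))


print(current_listeners(SAMPLE))
```

To run it against the live page, the HTML would have to be fetched first, for example with `urllib.request.urlopen(url).read()` from the standard library. Returning `None` instead of raising is a design choice for the sketch, so a changed page layout does not crash the caller.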
answered 2013-07-22T11:12:04.853

You need to use an HTML parser such as BeautifulSoup. I won't post a complete solution (since it looks like you haven't tried anything yet), but here is a demonstration:

from bs4 import BeautifulSoup as BS
html = the_above  # placeholder: the HTML source shown in the question
soup = BS(html, 'html.parser')  # name the parser explicitly
print(soup.find_all('tr'))

This prints every <tr> tag in the document (as a list).
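
Starting from that list of rows, one possible way to finish the job is to turn each two-cell row into a label/value pair. This is a sketch under assumptions: `stream_stats` and `SAMPLE` are hypothetical names, and the sample is a trimmed copy of the status table from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: three rows trimmed from the table in the question.
SAMPLE = """
<table border="0" cellpadding="4">
<tr><td>Bitrate:</td><td class="streamdata">60</td></tr>
<tr><td>Current Listeners:</td><td class="streamdata">28</td></tr>
<tr><td>Peak Listeners:</td><td class="streamdata">202</td></tr>
</table>
"""


def stream_stats(html):
    """Map each row label (without the trailing colon) to its value text."""
    soup = BeautifulSoup(html, "html.parser")
    stats = {}
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        # Keep only label/value rows whose second cell has class "streamdata".
        if len(cells) == 2 and "streamdata" in cells[1].get("class", []):
            label = cells[0].get_text(strip=True).rstrip(":")
            stats[label] = cells[1].get_text(strip=True)
    return stats


stats = stream_stats(SAMPLE)
print(int(stats["Current Listeners"]))
```

Filtering on the `streamdata` class is meant to skip layout rows such as the navigation bar and the mount point header, which have no such cell.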

answered 2013-07-22T11:01:47.263