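I know there are modules that would simplify this completely, but suppose I'm running from a basic install of Python (standard modules only). How would I extract the following?
I have a list. This list is the line-by-line contents of a web page. Here is a mock-up of the list (unformatted) for illustration: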
<script>
link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";
<script>
"<a href="/apps/audio/?feedId=11065"><span class="px13">Eastern Metro Area Fire</span>"
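From the strings above, I need to extract the following: the feedId (11065), which incidentally is a.id in the code above, as well as "/scripts/playlists/1/" and "/0-5417069212.asx". Keeping in mind that each of these lines is just the contents of one object in the list, how would I go about extracting this data?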
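One way to approach this with only the standard library is the `re` module: walk the list and try a pattern per kind of line. This is a minimal sketch, assuming the real lines look like the mock-up above; the names `lines`, `FEED_RE`, `LINK_RE` and `extract` are made up for illustration, and the patterns may need adjusting for the live page.

```python
import re

# Hypothetical sample data, copied from the mock-up list above.
lines = [
    '<script>',
    'link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";',
    '<script>',
    '"<a href="/apps/audio/?feedId=11065"><span class="px13">Eastern Metro Area Fire</span>"',
]

# Captures the feedId digits and the title inside the px13 span.
FEED_RE = re.compile(r'feedId=(\d+)"><span class="px13">(.*?)</span>')
# Captures the two string literals on either side of `a.id`.
LINK_RE = re.compile(r'link = "([^"]*)" \+ a\.id \+ "([^"]*)";')


def extract(lines):
    """Return (feeds, link_parts) found in an iterable of lines."""
    feeds = []          # list of (feed_id, title) tuples
    link_parts = None   # (prefix, suffix) from the last `link = ...` line seen
    for line in lines:
        match = FEED_RE.search(line)
        if match:
            feeds.append((match.group(1), match.group(2)))
        match = LINK_RE.search(line)
        if match:
            link_parts = (match.group(1), match.group(2))
    return feeds, link_parts


feeds, link_parts = extract(lines)
print(feeds)
print(link_parts)
```

Regular expressions are fragile against changes in the page markup, so this is only reasonable when the structure of the lines is as predictable as in the mock-up.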
Here is the full list:
contents = urllib2.urlopen("http://www.radioreference.com/apps/audio/?ctid=5586")
Pseudocode:
from urllib2 import urlopen as getpage
page_contents = getpage("http://www.radioreference.com/apps/audio/?ctid=5586")
feedID = % in (page_contents.search() for "/apps/audio/?feedId=%")
titleID = % in (page_contents.search() for "<span class="px13">%</span>")
playlistID = % in (page_contents.search() for "link = "%" + a.id + "*.asx";")
asxID = * in (page_contents.search() for "link = "*" + a.id + "%.asx";")
streamURL = "http://www.radioreference.com/" + playlistID + feedID + asxID + ".asx"
I plan to format it so that streamURL ends up equal to:
http://www.radioreference.com/scripts/playlists/1/11065/0-5417067072.asx
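The target format amounts to the site base, the playlist prefix, the feed id, and the file suffix joined in that order. A small sketch, assuming the three pieces are already available as strings; `BASE_URL` and `build_stream_url` are hypothetical names, and the sample values are taken from the mock-up rather than from the live page.

```python
BASE_URL = "http://www.radioreference.com"  # no trailing slash: the prefix starts with "/"


def build_stream_url(prefix, feed_id, suffix):
    """Join the pieces; `suffix` is assumed to already end in ".asx"."""
    return BASE_URL + prefix + feed_id + suffix


url = build_stream_url("/scripts/playlists/1/", "11065", "/0-5417069212.asx")
print(url)  # http://www.radioreference.com/scripts/playlists/1/11065/0-5417069212.asx
```

Two details are worth watching when joining: a trailing slash on the base combined with the leading slash on the prefix would produce a double slash, and appending ".asx" again would duplicate the extension if the captured suffix already contains it.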