I have a page with the following HTML:
<HTML>
<HEAD>
<TITLE>smth</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
</HEAD>
<BODY>
<div id="doc" class="searchN">
<div id="hd" style="border-bottom:0;">
<a id="logo" class="logoN" href="/" alt="logo" title="open project"></a>
</div>
<div id="bd-cross">
<ol class="site" start=1>
<li class="">
<a href="url/">Smth</a>
<div class="ref">
<a href="News_and_Media/">Regional: Europe:</a>
</div>
</li>
<li class="">
<a href="url2">Descr3</a>
<div class="ref">
<a href="url3">Descr3</a>
</div>
</li>
....
</BODY>
</HTML>
I need to check whether the <li class=""> tag exists on the page. I am using Python with regular expressions:
import re
import urllib2

url = 'url'
# Parse it
MainPage = urllib2.urlopen(url).read()
Li = re.findall('<div id="bd-cross">*<li class="">*</li>', MainPage)
try:
    if Li:
        print "Li tag on " + url + ": Yes"
    else:
        print "Li tag on " + url + ": No"
except:
    print "Error"
The output is No, but it should be Yes, because the page contains this tag. If I print Li, it outputs '[]'.
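A minimal sketch of why the pattern returns '[]', run against an inline excerpt of the page instead of the live URL. It assumes Python 3 (the code above is Python 2; `html.parser` is the Python 3 name of the `HTMLParser` module). In a regular expression, `*` repeats only the single character before it, so `>*` means "zero or more `>`", not "any text". The pattern therefore only matches when `<li class="">` follows the div tag directly, with nothing but extra `>` characters in between. One possible regex fix uses `.*?` together with `re.DOTALL` (so `.` also matches newlines); an HTML parser is shown as an alternative that does not depend on whitespace or line breaks. The names SAMPLE, ORIGINAL, FIXED, LiFinder and has_empty_class_li are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical excerpt of the page markup from the question.
SAMPLE = '''<div id="bd-cross">
<ol class="site" start=1>
<li class="">
<a href="url/">Smth</a>
</li>
</ol>
</div>'''

# Pattern from the question: each '*' repeats only the preceding '>'.
ORIGINAL = '<div id="bd-cross">*<li class="">*</li>'

# '.*?' lazily matches any characters; it needs re.DOTALL to cross newlines.
FIXED = r'<div id="bd-cross">.*?<li class="">.*?</li>'


class LiFinder(HTMLParser):
    """Sets found=True when an <li> tag with an empty class attribute appears."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; class="" yields the value ''.
        if tag == 'li' and dict(attrs).get('class') == '':
            self.found = True


def has_empty_class_li(html):
    """Return True if the markup contains <li class="">."""
    parser = LiFinder()
    parser.feed(html)
    parser.close()
    return parser.found


print(re.findall(ORIGINAL, SAMPLE))               # []
print(bool(re.search(FIXED, SAMPLE, re.DOTALL)))  # True
print(has_empty_class_li(SAMPLE))                 # True
```

Without `re.DOTALL`, FIXED would still fail on this sample, because a newline separates the div tag from the `<li>`.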