Hi, I have a regular expression:
<a href="(.+?)" class="nextpostslink">
This regex works on the following HTML:
'>
<span class='pages'>Page 1 of 12</span><span class='current'>1</span><a href='http://cinemassacre.com/category/avgn/page/2/' class='page larger'>2</a><a href='http://cinemassacre.com/category/avgn/page/3/' class='page larger'>3</a><a href='http://cinemassacre.com/category/avgn/page/4/' class='page larger'>4</a><a href='http://cinemassacre.com/category/avgn/page/5/' class='page larger'>5</a><a href="http://cinemassacre.com/category/avgn/page/2/" class="nextpostslink">»</a><span class='extend'>...</span><a href='http://cinemassacre.com/category/avgn/page/12/' class='last'>Last »</a>
</div> </div>
The part I am trying to extract is the URL of the next page:
<a href="http://cinemassacre.com/category/avgn/page/2/" class="nextpostslink">
But when I run the same regex on this block of HTML:
'>
<span class='pages'>Page 2 of 12</span><a href="http://cinemassacre.com/category/avgn/" class="previouspostslink">«</a><a href='http://cinemassacre.com/category/avgn/' class='page smaller'>1</a><span class='current'>2</span><a href='http://cinemassacre.com/category/avgn/page/3/' class='page larger'>3</a><a href='http://cinemassacre.com/category/avgn/page/4/' class='page larger'>4</a><a href='http://cinemassacre.com/category/avgn/page/5/' class='page larger'>5</a><a href="http://cinemassacre.com/category/avgn/page/3/" class="nextpostslink">»</a><span class='extend'>...</span><a href='http://cinemassacre.com/category/avgn/page/12/' class='last'>Last »</a>
</div>
</div>
it extracts everything from the first <a href="
through to " class="nextpostslink">
Why does this happen? I thought (.+?) was non-greedy, so it should match as little as possible.
That should be <a href="http://cinemassacre.com/category/avgn/page/3/" class="nextpostslink">
The full Python code I am using is:
match=re.compile('<a href="(.+?)" class="nextpostslink">', re.DOTALL).findall(pagenav)
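
For reference, here is a minimal sketch that reproduces the behaviour on a shortened stand-in for the page-2 HTML (the three-link string is an assumption, trimmed from the markup above). It suggests that the lazy quantifier only limits how far the group extends once a match has started. The engine still begins at the leftmost <a href=" it finds, and (.+?) keeps expanding across the tags in between until the nextpostslink suffix fits. The [^"]+ variant is one possible workaround, not necessarily the only or best fix.

```python
import re

# Shortened stand-in for the page-2 HTML (assumption: only three links kept).
pagenav = (
    '<a href="http://cinemassacre.com/category/avgn/" class="previouspostslink">«</a>'
    "<a href='http://cinemassacre.com/category/avgn/page/3/' class='page larger'>3</a>"
    '<a href="http://cinemassacre.com/category/avgn/page/3/" class="nextpostslink">»</a>'
)

# Original pattern: matching starts at the first <a href=" (the "previous" link),
# and the lazy group grows until '" class="nextpostslink">' finally matches.
lazy = re.compile('<a href="(.+?)" class="nextpostslink">', re.DOTALL).findall(pagenav)

# Variant: [^"]+ cannot cross a double quote, so the group is confined
# to a single href value and the earlier links are rejected.
strict = re.compile('<a href="([^"]+)" class="nextpostslink">').findall(pagenav)

print(lazy)
print(strict)
```

With the original pattern the captured group starts inside the previouspostslink tag, while the [^"]+ variant captures only the next-page URL.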