python - Match a regex string with quotes and href

Question

I am trying to use a regular expression to match

  <a href = "something" >

in the string below, but no match is printed.

import re

E = '<a> test <a href> <a href = "something" ><a href="anything">'
H = re.match('^[<a href = ]\".\" >$' , E)
print (H)

score 1 · Accepted Answer

Don't parse HTML with regular expressions.

Here is an example using BeautifulSoup:

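from bs4 import BeautifulSoup, SoupStrainer

html_string = '<a> test <a href> <a href = "something" ><a href="anything">'
# Parse only <a> tags; the tags are unclosed, so they end up nested,
# and find_all() walks the nested ones as well.
soup = BeautifulSoup(html_string, 'html.parser', parse_only=SoupStrainer('a'))
for link in soup.find_all('a'):
    print(link.get('href'))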
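
If installing BeautifulSoup is not an option, the same idea can probably be sketched with the standard library's html.parser module. The class name HrefCollector below is made up for illustration; it simply records the href attribute of every <a> start tag it sees (None when the attribute is absent or has no value).

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href of each <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == 'a':
            self.hrefs.append(dict(attrs).get('href'))


html_string = '<a> test <a href> <a href = "something" ><a href="anything">'
collector = HrefCollector()
collector.feed(html_string)
collector.close()

# Keep only the tags that actually carry an href value
print([href for href in collector.hrefs if href])
```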

score 0 · Accepted Answer

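I would advise you not to use regular expressions to parse HTML (that is what BeautifulSoup is for).
Since you said you are not doing that, here is something: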

>>> import re
>>> regex = re.compile(r'(<\s*a\s+href\s*=\s*"something"\s*>)+')
# Run findall on E, the string from the question
>>> regex.findall(E)
['<a href = "something" >'] # your tag
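
For what it's worth, the pattern in the question fails for two reasons: re.match only matches at the very start of the string, and [<a href = ] is a character class that matches a single character, not the literal text. A minimal sketch of a more general pattern follows, assuming the goal is to pull out every double-quoted href value; the name href_pattern is just illustrative, and this remains fragile compared with a real HTML parser.

```python
import re

E = '<a> test <a href> <a href = "something" ><a href="anything">'

# The original pattern: anchored at the start, and [...] is a character class,
# so it matches '<' and then fails because the next character is not a quote.
print(re.match('^[<a href = ]\".\" >$', E))

# Allow optional whitespace around '=', and capture whatever sits between the quotes.
href_pattern = re.compile(r'<\s*a\s+href\s*=\s*"([^"]*)"\s*>')

# findall scans the whole string and returns the captured group of each match.
print(href_pattern.findall(E))
```

Use re.search or findall rather than re.match whenever the tag can appear anywhere in the string.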
