python - How do I retrieve a value using a regular expression in Python?

Question

I wrote this code:

print(re.findall(r'(<td width="[0-9]+[%]?" align="(.+)">|<td align="(.+)"> width="[0-9]+[%]?")([ \n\t\r]*)([0-9,]+\.[0-9]+)([ \n\t\r]*)([&]?[a-zA-Z]+[;]?)([ \n\t\r]*)<span class="(.+)">', r.text, re.MULTILINE))

to match this line:

<td width="47%" align="left">556.348&nbsp;<span class="uccResCde">

I want the value 556.348. How can I get it with a regular expression?

score 3 · Accepted Answer

A straight cut and paste from the HTMLParser documentation gets the data out of the tag, though without using a regular expression.

from html.parser import HTMLParser  # Python 3; in Python 2 the module was named HTMLParser

# Create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data  :", data)

# Instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<td width="47%" align="left">556.348&nbsp;<span class="uccResCde">')
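The handlers above only print each parsing event. To pull out the number itself, one option is to collect the text that appears directly inside td elements. This is a minimal sketch for Python 3; the class name CellTextParser and its cells attribute are illustrative names, not part of the library.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text found directly inside <td> elements (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs is True by default, so &nbsp; arrives as U+00A0
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
        elif self.in_td:
            # A nested tag (here <span>) ends the text we care about
            self.in_td = False

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            text = data.strip()  # strip() also removes the non-breaking space
            if text:
                self.cells.append(text)


parser = CellTextParser()
parser.feed('<td width="47%" align="left">556.348&nbsp;<span class="uccResCde">')
parser.close()
value = parser.cells[0]
print(value)
```

The sketch assumes the value is the first piece of text in the cell; text that follows a nested tag is ignored.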

score 0 · Accepted Answer

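Here is a solution that shows how to retrieve a matched group. The documentation for the re module covers match groups in more detail.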

import re

text_to_parse = '<td width="47%" align="left">556.348&nbsp;<span class="uccResCde">'
pattern = r'(<td width="[0-9]+[%]?" align="(.+)">|<td align="(.+)"> width="[0-9]+[%]?")([ \n\t\r]*)([0-9,]+\.[0-9]+)([ \n\t\r]*)([&]?[a-zA-Z]+[;]?)([ \n\t\r]*)<span class="(.+)">'
m = re.search(pattern, text_to_parse)
# Group 5 is the ([0-9,]+\.[0-9]+) part of the pattern, i.e. the number
print(m.group(5))
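Since only the number is needed, the pattern could be reduced to a single capturing group. This is a minimal sketch, assuming the value always comes right after the closing > of a td tag and is followed by an entity such as &nbsp; or by another tag; the names NUMBER_AFTER_TD and extract_value are made up for illustration.

```python
import re

# Match a <td ...> tag and optional whitespace, then capture a number
# such as 556.348 or 1,234.50. The lookahead requires an HTML entity
# (e.g. &nbsp;) or the start of another tag next, optionally after whitespace.
NUMBER_AFTER_TD = re.compile(
    r'<td\b[^>]*>\s*(?P<value>[0-9,]+\.[0-9]+)(?=\s*(?:&[a-zA-Z]+;|<))'
)


def extract_value(html):
    """Return the first number found directly after a <td> tag, or None."""
    match = NUMBER_AFTER_TD.search(html)
    return match.group("value") if match else None


line = '<td width="47%" align="left">556.348&nbsp;<span class="uccResCde">'
print(extract_value(line))
```

A named group avoids counting parentheses to find the right index, and returning None for a non-match avoids an AttributeError on m.group when the line does not have the expected shape.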

That said, you do not need regular expressions to parse HTML. Use an HTML parser instead, such as Beautiful Soup:

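from bs4 import BeautifulSoup

# Name the parser explicitly so Beautiful Soup does not have to guess one
soup = BeautifulSoup(text_to_parse, "html.parser")
# &nbsp; is decoded to a non-breaking space, which strip() removes
print(soup.text.strip())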
