python - How do I parse a number out of HTML with a regular expression?

Question

I want to write a simple regular expression in Python that extracts a number from HTML. A sample of the HTML:

Your number is <b>123</b>

How do I extract "123", that is, the contents of the first bold text after the string "Your number is"?

score 63 · Accepted Answer

import re
# Raw string so that \d reaches the regex engine unchanged
m = re.search(r"Your number is <b>(\d+)</b>",
              "xxx Your number is <b>123</b>  fdjsk")
if m:
    print(m.groups()[0])

score 24 · Accepted Answer

Given s = "Your number is <b>123</b>", then:

 import re
 m = re.search(r"\d+", s)

will work and give you

 m.group()
'123'

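The regular expression looks for one or more consecutive digits in your string.

Note that in this specific case we know a digit sequence will be present. Otherwise you would have to test the return value of re.search() to make sure m holds a match object rather than None, because calling m.group() on None raises an AttributeError exception.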
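A minimal sketch of that check. The helper name first_number is hypothetical and not part of the answer:

```python
import re

def first_number(s):
    """Return the first run of digits in s, or None if there is none."""
    m = re.search(r"\d+", s)
    # re.search() returns None when nothing matches, so test before calling group()
    if m is None:
        return None
    return m.group()

print(first_number("Your number is <b>123</b>"))  # 123
print(first_number("no digits here"))             # None
```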

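Of course, if you are going to process a lot of HTML, take a serious look at BeautifulSoup, which is meant for this and much more. The whole idea of BeautifulSoup is to avoid "manual" parsing with string operations or regular expressions.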
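BeautifulSoup is a third-party package, so the sketch below swaps in the standard library's html.parser.HTMLParser to illustrate the same idea of letting a parser find the tag instead of a regex. It is not BeautifulSoup's API; the class name BoldAfterMarker and its attributes are made up for this example:

```python
from html.parser import HTMLParser

class BoldAfterMarker(HTMLParser):
    """Collect the text of the first <b> element that follows a marker string."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.seen_marker = False   # True once the marker text has been read
        self.in_bold = False       # True while inside the <b> we care about
        self.result = None

    def handle_starttag(self, tag, attrs):
        if tag == "b" and self.seen_marker and self.result is None:
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        if self.in_bold and self.result is None:
            self.result = data
        elif self.marker in data:
            self.seen_marker = True

parser = BoldAfterMarker("Your number is")
parser.feed("xxx Your number is <b>123</b>  fdjsk")
parser.close()
print(parser.result)  # 123
```

Unlike the regex, this keeps working if the <b> tag gains attributes or the whitespace around it changes.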

score 11 · Accepted Answer

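import re
x = 'Your number is <b>123</b>'
re.search(r'(?<=Your number is )<b>(\d+)</b>', x).group(1)

This searches for the number that follows the string 'Your number is'. group(1) holds the digits, whereas group(0) would also include the <b> tags.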
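A sketch of what a lookbehind pattern like this one returns from each group. The second pattern is a variation that does not appear in the answer: it moves the opening tag into the lookbehind and adds a lookahead for the closing tag, so the whole match is just the number. Python's re module requires lookbehinds to have a fixed width, which a literal string satisfies.

```python
import re

x = 'Your number is <b>123</b>'

# The lookbehind text is not part of the match, but the tags still are.
m = re.search(r'(?<=Your number is )<b>(\d+)</b>', x)
whole, number = m.group(0), m.group(1)
print(whole)   # <b>123</b>
print(number)  # 123

# Variation: opening tag in the lookbehind, closing tag in a lookahead
m2 = re.search(r'(?<=Your number is <b>)\d+(?=</b>)', x)
digits_only = m2.group(0)
print(digits_only)  # 123
```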

score 5 · Accepted Answer


import re
print(re.search(r'(\d+)', 'Your number is <b>123</b>').group(0))

answered 2014-02-17T19:20:11.170

score 4 · Accepted Answer


The simplest way is to just extract the digits (the number):

re.search(r"\d+",text)

answered 2016-06-22T10:45:26.773

score 2 · Accepted Answer

import re
val = "Your number is <b>123</b>"

Option 1

m = re.search(r'(<.*?>)(\d+)(<.*?>)', val)

m.group(2)

Option 2

re.sub(r'([\s\S]+)(<.*?>)(\d+)(<.*?>)', r'\3', val)

score 2 · Accepted Answer

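import re
# The match is case-sensitive, so the pattern must say "Your", not "your"
found = re.search(r"Your number is <b>(\d+)</b>", "something.... Your number is <b>123</b> something...")

if found:
    print(found.groups()[0])

Here (\d+) is a capturing group. Because there is only one group, index [0] of found.groups() is used. When there are several groups, use the index of the group you want.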
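For a pattern with more than one group, a sketch of the indexing. The two-group pattern is an illustration and not from the answer. Note that group(n) counts groups from 1 (group(0) is the whole match), while the tuple returned by groups() is indexed from 0:

```python
import re

# Illustrative pattern with two capturing groups: the tag name and the number.
# \1 is a backreference that requires the closing tag to repeat the tag name.
m = re.search(r"Your number is <(\w+)>(\d+)</\1>", "Your number is <b>123</b>")

tag = m.group(1)      # first group
number = m.group(2)   # second group
print(m.groups())     # ('b', '123')
print(m.group(0))     # Your number is <b>123</b>
```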

score 1 · Accepted Answer

To extract the result as a Python list, you can use findall:

>>> import re
>>> string = 'Your number is <b>123</b>'
>>> pattern = r'\d+'
>>> re.findall(pattern,string)
['123']
>>>
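A sketch of how findall behaves on a longer input, which is made up for illustration. A bare \d+ also picks up digits inside tag names such as h1, while a pattern with one capturing group makes findall return only the captured text:

```python
import re

html = '<h1>Top 10</h1> Your number is <b>123</b> and mine is <b>45</b>'

# Every run of digits, including the "1" in <h1> and </h1>
all_digits = re.findall(r'\d+', html)
# With one capturing group, findall returns the group contents only
bold_numbers = re.findall(r'<b>(\d+)</b>', html)

print(all_digits)    # ['1', '10', '1', '123', '45']
print(bold_numbers)  # ['123', '45']
```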

score 0 · Accepted Answer

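You can use the following example to solve your problem:

import re

text = "Your number is <b>123</b>"
# Keep the match object itself; start() and end() are not available on the string from group(0)
match = re.search(r"\d+", text)

print("Matched Number:", match.group(0))
print("Starting Index Of Digit:", match.start())
print("Ending Index Of Digit:", match.end())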

score 0 · Accepted Answer

import re
x = 'Your number is <b>123</b>'
output = re.search(r'(?<=Your number is )<b>(\d+)</b>', x).group(1)
print(output)
