python - How do I match the following with a regular expression in Python?

Question

Suppose I have the following string:

string = """** Hunger is the physical sensation of desiring food.
<br>         Your Hunger Level: Very Hungery<br> Food You Crave: Tomato<br/><br/>"""

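I want to be able to extract the hunger level ("Very Hungry") and the food ("Tomato"). Assume that no matter what special characters are inserted, the labels "Your Hunger Level:" and "Food You Crave:" will always stay the same.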

"Your Hunger Level:" could be: "Very Hungry", "Hungry", "Not So Hungry"
"Food You Crave:" could be: "Tomato", "Rice and Beans", "Corn Soup"

How can I match this with a regular expression? I tried the following, but without any luck:

m = re.match('(.*)([ \t]+)?Your Hunger Level:([ \t]+)?(?P<hungerlevel>.*)(.*)Food You Crave:([ \t]+)?(?P<foodcraving>.*).*', string)

Note: the string seems to contain a lot of escape characters, like this:

string = "** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\tYour Hunger Level: Very Hungry \n\t\t\t\t\t\t\t\t<br>\n\t\t\t\t\t\t\t\tFood You Crave: Tomato \n\t\t\t\t\t\t</br>"

score 3 · Accepted Answer

I would go with:

import re

print([[part.strip() for part in line.split(':')] for line in re.split('<.*?>', string) if ':' in line])
# [['Your Hunger Level', 'Very Hungery'], ['Food You Crave', 'Tomato']]

Alternatively, you can turn it into a dict:

lookup = dict(map(str.strip, line.split(':')) for line in re.split('<.*?>', string) if ':' in line)
print(lookup['Your Hunger Level'])
# Very Hungery
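
The split-on-tags idea also carries over to the tab and newline padded sample from the question, provided the split on the colon is limited to the first one so that a value may itself contain a colon. A minimal sketch, assuming labels never contain a colon; the helper name extract_fields and the shortened sample are illustrative and not part of the answer:

```python
import re

def extract_fields(text):
    """Return {label: value} for every 'label: value' chunk found between tags."""
    fields = {}
    # Split on anything that looks like a tag, e.g. <br>, <br/>, </br>.
    for chunk in re.split(r'<[^>]*>', text):
        if ':' not in chunk:
            continue
        # partition() splits on the first colon only, so values may contain colons.
        label, _, value = chunk.partition(':')
        # strip() also removes the surrounding newlines and tabs.
        fields[label.strip()] = value.strip()
    return fields

sample = ("** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\n\t\t\t"
          "Your Hunger Level: Very Hungry \n\t\t\t<br>\n\t\t\t"
          "Food You Crave: Rice and Beans \n\t\t</br>")
fields = extract_fields(sample)
print(fields['Your Hunger Level'])  # Very Hungry
print(fields['Food You Crave'])     # Rice and Beans
```

Because whitespace is stripped after splitting, the amount of padding around each field does not matter.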

score 2 · Accepted Answer

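I definitely agree with using a parser of some kind, but the following seems to work. It starts right after each target label and captures everything up to the next < (for the record, I don't endorse this approach, but hopefully it works :)):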

In [28]: import re

In [29]: s = """** Hunger is the physical sensation of desiring food.
<br>         Your Hunger Level: Very Hungery<br> Food You Crave: Tomato<br/><br/>"""

In [31]: m = re.search(r'Your Hunger Level:([^<]*)<br>.*Food You Crave:([^<]*)', s)

In [32]: m.group(1).strip()
Out[32]: 'Very Hungery'

In [33]: m.group(2).strip()
Out[33]: 'Tomato'

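The strip() calls trim whitespace. I'm not sure how your string is laid out, but this is the conservative choice, since it also handles the case where there is no space between the colon and the text. I'd also suggest not naming a variable string: it is not a reserved keyword, but it is the name of a standard library module, and avoiding such names will make things easier for you in the long run :)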
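
One caveat: `.` does not match a newline by default, so the pattern above relies on both labels sitting on the same line. A sketch of a variant for the tab and newline padded sample from the question follows. It assumes the two labels always appear in this order and that each value is followed by some tag; the name FIELDS and the shortened sample text are illustrative only.

```python
import re

# \s* swallows spaces, tabs and newlines after the colon;
# [^<]*? captures the value and stops before the next tag.
FIELDS = re.compile(
    r'Your Hunger Level:\s*(?P<hungerlevel>[^<]*?)\s*<'
    r'.*?'
    r'Food You Crave:\s*(?P<foodcraving>[^<]*?)\s*<',
    re.DOTALL,  # let .*? cross line breaks between the two fields
)

text = ("** Hunger is the physical sensation of desiring food. <br>\n\t\t\n\t\t"
        "Your Hunger Level: Very Hungry \n\t\t<br>\n\t\tFood You Crave: Tomato \n\t</br>")

m = FIELDS.search(text)
if m:
    print(m.group('hungerlevel'))  # Very Hungry
    print(m.group('foodcraving'))  # Tomato
```

The named groups mirror those in the question's attempt, and the trailing `\s*` inside the pattern makes a separate strip() call unnecessary.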

score 0 · Accepted Answer

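First, parse the HTML with a parser. There are plenty to choose from, e.g. Beautiful Soup or lxml.
Second, search the document for <br> tags.
Third, search the tag text for the text you want, then return that tag.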
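
These steps can be sketched with the standard library's html.parser, used here in place of Beautiful Soup or lxml only to keep the example free of third-party dependencies. Since <br> has no text of its own, the sketch inspects the text runs between tags instead. The class name FieldCollector is hypothetical.

```python
from html.parser import HTMLParser

class FieldCollector(HTMLParser):
    """Collect 'label: value' pairs from the text runs between tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_data(self, data):
        # Called for each run of text between tags.
        label, sep, value = data.partition(':')
        if sep:  # keep only runs that contain a colon
            self.fields[label.strip()] = value.strip()

html = ("** Hunger is the physical sensation of desiring food.\n"
        "<br>         Your Hunger Level: Very Hungery<br> Food You Crave: Tomato<br/><br/>")

collector = FieldCollector()
collector.feed(html)
collector.close()
print(collector.fields['Your Hunger Level'])
print(collector.fields['Food You Crave'])
```

With Beautiful Soup the same idea would amount to walking the text nodes of the parsed document and keeping those that contain one of the two labels.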
