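I've been able to scrape a lot of data from a web page, but I'm struggling to pull specific items out of sub-sections whose elements have exactly the same attributes and values. Here is the HTML: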
<li class="highlight">
Relationship Issues
</li>
<li class="highlight">
Depression
</li>
<li class="highlight">
Spirituality
</li>
<li class="">
ADHD
</li>
<li class="">
Alcohol Use
</li>
<li class="">
Anger Management
</li>
Using that HTML as a reference, I have the following:
import requests
from bs4 import BeautifulSoup
import html5lib
import re
headers = {'User-Agent': 'Mozilla/5.0'}
URL = "website.com"
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html5lib')
specialties = soup.find_all('div', {'class': 'spec-list attributes-top'})
for x in specialties:
    Specialty_1 = x.find('li', {'class': 'highlight'}).text
    Specialty_2 = x.find('li', {'class': 'highlight'}).text
    Specialty_3 = x.find('li', {'class': 'highlight'}).text
So the ideal result would be: Specialty_1 = Relationship Issues; Specialty_2 = Depression; Specialty_3 = Spirituality
and
Issue_1 = ADHD; Issue_2 = Alcohol Use; Issue_3 = Anger Management
Any and all help would be greatly appreciated!
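For reference, here is a minimal sketch of one way the desired split might be produced. `find` only ever returns the first match, so three identical `find` calls yield the same element; `find_all` returns every match in document order. The wrapping `<div>` and `<ul>` in the sample markup, the helper name `split_items`, and the use of the built-in `html.parser` (instead of `html5lib`) are assumptions made for illustration, not details from the real page.

```python
from bs4 import BeautifulSoup

# Sample markup from the question. The enclosing <div> and <ul> are assumed;
# the real page may be structured differently.
HTML = """
<div class="spec-list attributes-top">
<ul>
<li class="highlight">
Relationship Issues
</li>
<li class="highlight">
Depression
</li>
<li class="highlight">
Spirituality
</li>
<li class="">
ADHD
</li>
<li class="">
Alcohol Use
</li>
<li class="">
Anger Management
</li>
</ul>
</div>
"""


def split_items(container):
    """Hypothetical helper: return (highlighted, others) text lists."""
    highlighted, others = [], []
    # find_all returns every <li> in document order, unlike find,
    # which stops at the first match.
    for li in container.find_all("li"):
        text = li.get_text(strip=True)
        # The class attribute is parsed as a list of class names;
        # "or []" covers an <li> with no class attribute at all.
        if "highlight" in (li.get("class") or []):
            highlighted.append(text)
        else:
            others.append(text)
    return highlighted, others


soup = BeautifulSoup(HTML, "html.parser")
for x in soup.find_all("div", {"class": "spec-list attributes-top"}):
    specialties, issues = split_items(x)
    print(specialties)  # ['Relationship Issues', 'Depression', 'Spirituality']
    print(issues)       # ['ADHD', 'Alcohol Use', 'Anger Management']
```

If every container is known to hold exactly three items of each kind, the lists can be unpacked with `Specialty_1, Specialty_2, Specialty_3 = specialties`, but keeping them as lists avoids an error when a page has more or fewer entries.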