python -
Extracting data between tags using BeautifulSoup

Question

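I have this HTML data that I need to parse in order to extract data from it. But it has so many tags that I find the data hard to navigate. From the HTML below, I need to create a Python list of dictionaries, like this: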

[{"School": "Childs play"}, {"Place": "newyork"}, {"Level": "four"}, {"Country": "USA"}, {"Level Of Course": "Easy"}]

<div class="quick">
 <strong>School</strong><br /> Childs play <br /><br />
 <strong>Place</strong><br />
 <a href="Search.aspx?Menu=new&amp;Me=">newyork</a><br /><br />
 <strong>Level</strong><br />four<br /><br />
 <strong>Country</strong><br />USA<br /><br />
 <strong>Level Of Course</strong><br />Easy<br /><br />
</div>

I tried using BeautifulSoup, but without success. Please help.

score 1 · Accepted Answer

Unfortunately, the HTML is not ideally constructed for parsing, but it is possible to extract the data into a meaningful Python dictionary.

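from bs4 import BeautifulSoup  # BeautifulSoup 4; "from BeautifulSoup import ..." only works with version 3

# htmlString holds the HTML shown in the question
soup = BeautifulSoup(htmlString, "html.parser")

# Direct children of <div class="quick">: tags and text nodes alike
raw_data = soup.find("div", class_="quick").contents
# Keep text nodes and every tag except <br>
data = [x for x in raw_data if not hasattr(x, "name") or not x.name == "br"]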

The condition if not hasattr(x, "name") or not x.name == "br" keeps every item that is either a text node (a NavigableString) or a tag other than <br>, so only the <br> elements are dropped.

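data will then be roughly of the format [<KEY>, <VALUE>, <KEY>, <VALUE>], interleaved with whitespace-only strings that come from the line breaks in the markup. The keys, and some values such as the <a> link, are still tags rather than plain strings. Once each item is reduced to its stripped text and the empty strings are discarded, pairing keys with values is straightforward.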
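The remaining step could be sketched as follows. This is only one possible way to finish the job, assuming BeautifulSoup 4 (the bs4 package) and that every key is followed by exactly one value, as in the question's HTML. The names html, quick, texts and result are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup, Tag

# The HTML from the question, inlined so the sketch is self-contained.
html = """<div class="quick">
 <strong>School</strong><br /> Childs play <br /><br />
 <strong>Place</strong><br />
 <a href="Search.aspx?Menu=new&amp;Me=">newyork</a><br /><br />
 <strong>Level</strong><br />four<br /><br />
 <strong>Country</strong><br />USA<br /><br />
 <strong>Level Of Course</strong><br />Easy<br /><br />
</div>"""

soup = BeautifulSoup(html, "html.parser")
quick = soup.find("div", class_="quick")

# Reduce every child to its stripped text, skipping <br> tags and
# the whitespace-only strings that sit between the tags.
texts = []
for node in quick.contents:
    if isinstance(node, Tag):
        if node.name == "br":
            continue
        text = node.get_text(strip=True)
    else:
        text = str(node).strip()
    if text:
        texts.append(text)

# Assumption: texts alternates key, value, key, value, ...
result = [{key: value} for key, value in zip(texts[0::2], texts[1::2])]
print(result)
```

If a key could appear without a value, the alternating pairing would shift; in that case it may be safer to loop over the <strong> tags and read the siblings that follow each one.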
