html - BeautifulSoup matching the wrong class

Question

I am working with HTML that looks like this:

<td class="hidden-xs BuildingUnit-price" data-sort-value="625000">
<span class="price">$625,000  </span>
</td>
<td class="hidden-xs BuildingUnit-bedrooms" data-sort-value="4.0">
        4 rooms, 2 beds
      </td>
<td class="hidden-xs BuildingUnit-bathrooms">
        5 baths
      </td>
<td class="hidden-xs" data-sort-value="1">
    1 bath
  </td>

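I wrote the script below to find td tags with the class "hidden-xs" so that I can extract the number of baths for a real-estate listing, but it also matches tags whose class is "hidden-xs BuildingUnit-price". How can I correct this?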

#Extract the number of baths
import re
lst_baths=list()
baths=soup.find_all("td", class_=["hidden-xs"])  
bath_lines=[td.get_text().strip() for td in baths]
pattern=re.compile(r'(\d{1})\D*(bath|baths)$')
for bath in bath_lines:
    match=pattern.match(bath)
    if match:
        lst_baths.append(bath.split()[0])

For example, as currently written, my code picks up the "5 baths" line, but I only want it to pick up the "1 bath" line.

score 0 · Accepted Answer

Found a way to test the class of each match:

#Extract the baths
import re
lst_baths=list()
temp_lst=list()
baths=soup.find_all("td", class_=["hidden-xs"])
for item in baths:
    # item['class'] is a list of class names; keep only exact matches
    if item['class']==['hidden-xs']:
        temp_lst.append(item)
bath_lines=[td.get_text().strip() for td in temp_lst]
pattern=re.compile(r'(\d{1})\D*(bath|baths)$')
for bath in bath_lines:
    match=pattern.match(bath)
    if match:
        lst_baths.append(bath.split()[0])
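
The extra matches happen because BeautifulSoup treats class as a multi-valued attribute, so a class_ filter matches any tag that has "hidden-xs" among its classes. As a minimal sketch of two other ways to require an exact match, the snippet below rebuilds the HTML from the question (the surrounding table and tr tags are added here only to make it self-contained) and assumes a BeautifulSoup version with select() support for attribute selectors. The regex is slightly simplified from the one above and is an illustration, not the only valid pattern.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question; <table>/<tr> are added for completeness.
html = """
<table><tr>
<td class="hidden-xs BuildingUnit-price" data-sort-value="625000">
<span class="price">$625,000  </span>
</td>
<td class="hidden-xs BuildingUnit-bedrooms" data-sort-value="4.0">
        4 rooms, 2 beds
      </td>
<td class="hidden-xs BuildingUnit-bathrooms">
        5 baths
      </td>
<td class="hidden-xs" data-sort-value="1">
    1 bath
  </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: a CSS attribute selector compares the whole class string,
# so "hidden-xs BuildingUnit-bathrooms" does not match.
exact_cells = soup.select('td[class="hidden-xs"]')

# Option 2: a filter function that checks the parsed class list directly.
same_cells = soup.find_all(
    lambda tag: tag.name == "td" and tag.get("class") == ["hidden-xs"]
)

# Pull the leading number out of text such as "1 bath" or "2 baths".
pattern = re.compile(r"(\d+)\s*baths?$")
lst_baths = []
for td in exact_cells:
    match = pattern.match(td.get_text().strip())
    if match:
        lst_baths.append(match.group(1))

print(lst_baths)  # ['1']
```

Both options select only the last td, so the "5 baths" cell is no longer picked up.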
