I have been trying to collect data from Zillow, without success.
Example:
url = https://www.zillow.com/homes/for_sale/Los-Angeles-CA_rb/?fromHomePage=true&shouldFireSellPageImplicitClaimGA=false&fromHomePageTab=buy
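I want to extract the address, price, estimated value, location and similar information for every home listed in Los Angeles.
I have tried HTML scraping with packages such as BeautifulSoup, and I have also tried working with JSON. I am almost certain that Zillow's API will not help: as I understand it, the API is best suited to retrieving information about one specific property, not a whole city's listings.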
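To make the JSON attempt concrete, here is a minimal sketch of the kind of thing I mean. It assumes the listing data is embedded in the page as JSON inside a `<script>` tag, which I have not confirmed for Zillow. The sample markup, the `listing-data` id, and the helper name `extract_embedded_json` are all made up for illustration.

```python
import json
import re

# Hypothetical page fragment; the real Zillow markup may look nothing like this.
SAMPLE_HTML = """
<html><body>
<script type="application/json" id="listing-data">
{"results": [{"address": "123 Example St", "price": 750000},
             {"address": "456 Sample Ave", "price": 1200000}]}
</script>
</body></html>
"""

# Capture the body of every <script type="application/json"> element.
SCRIPT_JSON = re.compile(
    r'<script[^>]*type="application/json"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_embedded_json(html):
    """Return every JSON payload found in application/json script tags."""
    payloads = []
    for match in SCRIPT_JSON.finditer(html):
        try:
            payloads.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Skip script bodies that are not valid JSON.
            continue
    return payloads


for payload in extract_embedded_json(SAMPLE_HTML):
    for home in payload["results"]:
        print(home["address"], home["price"])
```

On the real page, the HTML would come from `requests.get(url).text` rather than a hard-coded string.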
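I have been able to scrape information from other websites, but Zillow appears to use dynamic IDs (they change on every refresh), which makes the information much harder to reach.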
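On other sites I have worked around unstable ids by matching on class names instead. A rough sketch of that idea follows. The markup and the class names (`listing-card`, `listing-price`, `listing-address`) are invented for the example, and I am assuming the class names stay stable while the ids change, which may not hold for Zillow.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the id values stand in for ids that change on every
# refresh; the class names are made up for this example.
SAMPLE_HTML = """
<div id="card-8f3a1c" class="listing-card">
  <span class="listing-price">$750,000</span>
  <span class="listing-address">123 Example St, Los Angeles, CA</span>
</div>
<div id="card-02bd77" class="listing-card">
  <span class="listing-price">$1,200,000</span>
  <span class="listing-address">456 Sample Ave, Los Angeles, CA</span>
</div>
"""


def extract_listings(html):
    """Return (price, address) pairs, matching on class and ignoring ids."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    # class_ accepts a compiled regex, so a partial class name is enough.
    for card in soup.find_all("div", class_=re.compile(r"card")):
        price = card.find("span", class_=re.compile(r"price"))
        address = card.find("span", class_=re.compile(r"address"))
        # find() returns None when nothing matches, so check before reading text.
        if price is not None and address is not None:
            listings.append(
                (price.get_text(strip=True), address.get_text(strip=True))
            )
    return listings


for price, address in extract_listings(SAMPLE_HTML):
    print(price, address)
```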
Update: I tried the code below, but it still does not produce any results.
import requests
from bs4 import BeautifulSoup

url = 'https://www.zillow.com/homes/for_sale/Los-Angeles-CA_rb/?fromHomePage=true&shouldFireSellPageImplicitClaimGA=false&fromHomePageTab=buy'
page = requests.get(url)
data = page.content
soup = BeautifulSoup(data, 'html.parser')

# Each listing card's caption should hold the price, info, address and broker.
for card in soup.find_all('div', {'class': 'zsg-photo-card-caption'}):
    try:
        # The list also contains sponsored links, which may need special handling.
        # find() returns None when a span is missing, so .text raises AttributeError.
        print(card.find('span', {'class': 'zsg-photo-card-price'}).text)
        print(card.find('span', {'class': 'zsg-photo-card-info'}).text)
        print(card.find('span', {'class': 'zsg-photo-card-address'}).text)
        print(card.find('span', {'class': 'zsg-photo-card-broker-name'}).text)
    except AttributeError:
        print('An error occurred')
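
Since the loop prints nothing at all, one way to narrow the problem down might be to check what the server actually returned before parsing. The helper below is only a sketch. `diagnose` is a name I made up, and the CAPTCHA check is a guess at one possible cause, not something I have confirmed Zillow does.

```python
from bs4 import BeautifulSoup


def diagnose(status_code, html, css_class="zsg-photo-card-caption"):
    """Describe why a scrape may have produced no output (hypothetical helper)."""
    if status_code != 200:
        return f"request failed or was blocked: HTTP {status_code}"
    # Assumption: a bot-detection page would mention "captcha" somewhere.
    if "captcha" in html.lower():
        return "received a CAPTCHA page instead of listings"
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.find_all("div", {"class": css_class})
    if not cards:
        return "page loaded, but no element has the expected class"
    return f"found {len(cards)} listing cards"


# Offline examples; with the code above this would be
# diagnose(page.status_code, page.text).
print(diagnose(403, ""))
print(diagnose(200, "<html><body>Please complete the CAPTCHA</body></html>"))
print(diagnose(200, '<div class="zsg-photo-card-caption"></div>'))
```

If the last case never occurs on the real page, the listings are probably not present in the HTML that `requests` receives.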