
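I have been trying to scrape data about new real-estate properties for a project.

I get the error 'NoneType' object has no attribute 'get_text' when I try to read the number of bedrooms from the page. I can retrieve the other attributes, just not the bedrooms.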

Link: https://www.99acres.com/search/property/buy/residential-all/ahmedabad-all?search_type=QS&refSection=GNB&search_location=NRI&lstAcn=NR_R&lstAcnId=-1&src=CLUSTER&preference=S&selected_tab=1&city=45&res_com=R&property_type=R&isvoicesearch=N&keyword_suggest=Ahmedabad%20(All)%3B&fullSelectedSuggestions=Ahmedabad%20(All)&strEntityMap=W3sidHlwZSI6ImNpdHkifSx7IjEiOlsiQWhtZWRhYmFkIChBbGwpIiwiQ0lUWV80NSwgUFJFRkVSRU5DRV9TLCBSRVNDT01fUiJdfV0%3D&texttypedtillsuggestion=ahme&refine_results=Y&Refine_Localities=Refine%20Localities&action=%2Fdo%2Fquicksearch%2Fsearch&suggestion=CITY_45%2C%20PREFERENCE_S%2C%20RESCOM_R&searchform=1&price_min=null&price_max=null

Here is my code:

titles = []
prices = []
super_area = []
bhk = []
move = []
for post in item:
    title = post.find(id='srp_tuple_property_title').get_text().strip()
    price = post.find(id='srp_tuple_price').get_text().strip()
    area = post.find(id='srp_tuple_primary_area').get_text().strip()
    moves = post.find(class_='badges__secondaryLargeSubtle').get_text().strip()
    bed = post.find(id='srp_tuple_bedroom').get_text().strip()
    bhk.append(bed)
    move.append(moves)
    super_area.append(area)
    prices.append(price)
    titles.append(title)

1 Answer


Try using the underlying API. Inspecting the actual API call the page makes after you apply filters will give you the correct URL.

import requests

url = "https://www.99acres.com/api-aggregator/project/searchWidget?area_unit=1&platform=DESKTOP&moduleName=GRAILS_SRP&workflow=GRAILS_SRP&city=45&preference=S&res_com=R&page=1&page_size=10&isCrossSell=false"
data = requests.get(url=url).json()
print(data['newProjects'][1]['propTypeStr'])
# 2 BHK Apartment
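
To collect the bedroom configuration of every project rather than a single entry, the response can be walked defensively. This is only a minimal sketch: it assumes the payload keeps the `newProjects` list and the `propTypeStr` field used above, while the helper name `extract_prop_types` and the sample payload are made up for illustration. Missing fields come back as `None` instead of raising.

```python
def extract_prop_types(payload):
    """Return the propTypeStr of each entry in payload['newProjects'].

    Entries without the field produce None rather than an exception.
    """
    projects = payload.get("newProjects") or []
    return [project.get("propTypeStr") for project in projects]


# Hypothetical sample shaped like the API response above (not real data).
sample = {
    "newProjects": [
        {"propTypeStr": "3 BHK Apartment"},
        {"propTypeStr": "2 BHK Apartment"},
        {},  # a project that has no propTypeStr field
    ]
}
print(extract_prop_types(sample))
# ['3 BHK Apartment', '2 BHK Apartment', None]
```

With a live response, `extract_prop_types(data)` would take the place of indexing a single project.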

Applying filters changes the URL's query parameters, for example:

https://www.99acres.com/api-aggregator/srp/search?bedroom_num=3&budget_min=136&locality_array=6046%2C6038&area_min=1900&area_unit=1&localityNameMap=%5Bobject%20Object%5D&platform=DESKTOP&moduleName=GRAILS_SRP&workflow=GRAILS_SRP&page_size=30&page=1&city=45&preference=S&res_com=R&seoUrlType=DEFAULT

This can be broken down with urllib:

from urllib import parse

url = "https://www.99acres.com/api-aggregator/srp/search?bedroom_num=3&budget_min=136&locality_array=6046%2C6038&area_min=1900&area_unit=1&localityNameMap=%5Bobject%20Object%5D&platform=DESKTOP&moduleName=GRAILS_SRP&workflow=GRAILS_SRP&page_size=30&page=1&city=45&preference=S&res_com=R&seoUrlType=DEFAULT"
parse.parse_qs(parse.urlparse(url).query)
# {'bedroom_num': ['3'],
#  'budget_min': ['136'],
#  'locality_array': ['6046,6038'],
#  'area_min': ['1900'],
#  'area_unit': ['1'],
#  'localityNameMap': ['[object Object]'],
#  'platform': ['DESKTOP'],
#  'moduleName': ['GRAILS_SRP'],
#  'workflow': ['GRAILS_SRP'],
#  'page_size': ['30'],
#  'page': ['1'],
#  'city': ['45'],
#  'preference': ['S'],
#  'res_com': ['R'],
#  'seoUrlType': ['DEFAULT']}
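
Going the other way, the parsed parameters can be edited and re-encoded to request a different filter or page. The following is a rough sketch using only `urllib.parse`; the helper `with_params` is a hypothetical name, the URL is shortened to a subset of the parameters shown above, and whether the API accepts any given combination is an assumption you would need to verify.

```python
from urllib import parse

# Shortened version of the search URL above (subset of its parameters).
url = ("https://www.99acres.com/api-aggregator/srp/search"
       "?bedroom_num=3&page_size=30&page=1&city=45&preference=S&res_com=R")


def with_params(url, **changes):
    """Return url with the given query parameters replaced or added."""
    parts = parse.urlparse(url)
    query = parse.parse_qs(parts.query)
    for key, value in changes.items():
        query[key] = [str(value)]
    # doseq=True writes each list element as its own key=value pair.
    new_query = parse.urlencode(query, doseq=True)
    return parts._replace(query=new_query).geturl()


print(with_params(url, bedroom_num=2, page=2))
# https://www.99acres.com/api-aggregator/srp/search?bedroom_num=2&page_size=30&page=2&city=45&preference=S&res_com=R
```

The resulting string can be passed to `requests.get` in the same way as the first example.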
Answered 2020-05-19T00:22:10.660