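I'm new to Python and am trying to scrape information from a real estate listings site (www.realtor.ca). So far I've managed to collect the MLS numbers of the listings with the following code: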

    # -*- coding: utf-8 -*-
    # (the coding line is required in Python 2 because the regexes below contain "®")
    import urllib2, sys, re, mechanize, itertools, csv

    # Search URL with a {page} placeholder, so page numbers of any length can be substituted
    url_template = 'http://www.realtor.ca/PropertyResults.aspx?Page={page}&vs=Residential&ret=300&curPage=PropertySearch.aspx&sts=0-0&beds=0-0&baths=0-0&ci=Victoria&pro=3&mp=200000-300000-0&mrt=0-0-4&trt=2&of=1&ps=10&o=A'
    text = urllib2.urlopen(url_template.format(page=1)).read()

    # Find every "MLS®: " followed by one or more digits ([0-9]+);
    # the parentheses capture just the number (a 6-digit MLS number here)
    findMLS = re.findall(r"MLS®: ([0-9]+)", text)

    # "Page 1 of " precedes the number of pages in the search result (10 listings per page).
    # A capture group is used instead of strip('Page 1 of '), which strips *characters*
    # and would also remove the digit 1 (e.g. "Page 1 of 11" -> "")
    pages = int(re.search(r"Page 1 of ([0-9]+)", text).group(1))

    for page in range(2, pages + 1):
        # Read each further results page and append its MLS numbers to findMLS
        text = urllib2.urlopen(url_template.format(page=page)).read()
        findMLS.extend(re.findall(r"MLS®: ([0-9]+)", text))

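Two parts of the original scraping loop are fragile, and the sketch below demonstrates both offline on made-up strings (the HTML fragment is invented, not real realtor.ca markup). First, `str.strip('Page 1 of ')` removes any of those *characters* from both ends rather than the prefix, so it also eats the digit `1`. Second, overwriting index 48 of the URL only works while the page number is one character long: from page 11 onward the URL becomes `Page=110`. The helper names (`SEARCH_URL`, `replace_char_48`, `page_param`, `extract_mls`, `extract_page_count`) are hypothetical, and the fallback of 1 page when no "Page 1 of N" text is found is an assumption. It is written for Python 3, where `urlopen(...).read()` returns bytes that would need `.decode(...)` before these regexes are applied.

```python
import re

# The question's search URL with the page number replaced by a {page} placeholder
SEARCH_URL = ('http://www.realtor.ca/PropertyResults.aspx?Page={page}&vs=Residential'
              '&ret=300&curPage=PropertySearch.aspx&sts=0-0&beds=0-0&baths=0-0'
              '&ci=Victoria&pro=3&mp=200000-300000-0&mrt=0-0-4&trt=2&of=1&ps=10&o=A')


def replace_char_48(url, page):
    """The question's approach: overwrite the single character at index 48."""
    chars = list(url)
    chars[48] = str(page)
    return "".join(chars)


def page_param(url):
    """Return the first query parameter, e.g. 'Page=3'."""
    return url.split("?")[1].split("&")[0]


def extract_mls(html):
    """Digits following each 'MLS®: ' label; the capture group drops the label."""
    return re.findall(r"MLS®: ([0-9]+)", html)


def extract_page_count(html):
    """Total pages from 'Page 1 of N' (assumed fallback: 1 if the text is absent)."""
    match = re.search(r"Page 1 of ([0-9]+)", html)
    return int(match.group(1)) if match else 1


# Pitfall 1: strip() removes any of the listed characters, including the digit 1
print(repr("Page 1 of 11".strip("Page 1 of ")))  # ''
print(repr("Page 1 of 21".strip("Page 1 of ")))  # '2'

# Pitfall 2: index 48 only holds the whole page number while it is one digit long
url = SEARCH_URL.format(page=1)
for page in range(2, 12):
    url = replace_char_48(url, page)
print(page_param(url))                          # Page=110 (should be Page=11)
print(page_param(SEARCH_URL.format(page=11)))   # Page=11

# Offline check on an invented HTML fragment
sample = "<b>MLS®: 123456</b> <b>MLS®: 654321</b> <div>Page 1 of 11</div>"
print(extract_mls(sample))         # ['123456', '654321']
print(extract_page_count(sample))  # 11
```

Formatting the page number into a template keeps every other query parameter untouched, whatever the page number's length.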
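Using my list of MLS numbers (findMLS), I'd like to enter each number into the MLS# search box at the top of this page: http://www.realtor.ca/propertySearch.aspx
Using Inspect Element I can find the search box, but I don't know how to access it from Python code with Mechanize: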

    <input type="text" id="txtMlsNumber" value="" style="background-color:#ebebeb;border:solid 1px #C8CACA; " onkeypress="javascript:MLSNumberSearch(event)">

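The element itself hints at why this is awkward with mechanize. mechanize works at the level of HTML forms: it selects a form, sets the values of its controls and submits it, but it does not execute JavaScript. A submitted form only sends controls that have a `name` attribute, and this input has only an `id`; the search is started by the page's own `MLSNumberSearch(event)` JavaScript function on keypress, whose behavior isn't visible from this snippet. The sketch below, a Python 3 stdlib-only check (the `InputFinder` class is hypothetical), parses the copied element and confirms those two attributes. If that holds on the live page, two likely directions are to find the request `MLSNumberSearch` sends (for example in the browser's network tab) and request that URL directly, or to use a browser-automation tool such as Selenium that runs the page's JavaScript.

```python
from html.parser import HTMLParser

# The element copied from Inspect Element in the question
SNIPPET = ('<input type="text" id="txtMlsNumber" value="" '
           'style="background-color:#ebebeb;border:solid 1px #C8CACA; " '
           'onkeypress="javascript:MLSNumberSearch(event)">')


class InputFinder(HTMLParser):
    """Collect the attributes of the <input> whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("id") == self.target_id:
            self.attrs = attrs


finder = InputFinder("txtMlsNumber")
finder.feed(SNIPPET)
print("name" in finder.attrs)      # False: nothing for a form submission to send
print(finder.attrs["onkeypress"])  # javascript:MLSNumberSearch(event)
```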
Any help would be greatly appreciated.