这是执行此操作的方法。首先下载页面,抓取它以找到您正在寻找的模型,然后您可以获得指向要抓取的新页面的链接。这里不需要javascript。该模型和 BeautifulSoup 文档将助您一臂之力。
from BeautifulSoup import BeautifulSoup
import urllib2
base_url = 'http://www.ksl.com'
url = base_url + '/index.php?nid=443'
model = "Honda" # this is the name of the model to look for
# Load the page and process with BeautifulSoup
handle = urllib2.urlopen(url)
html = handle.read()
soup = BeautifulSoup(html)
# Collect all the ad detail boxes from the page
divs = soup.findAll(attrs={"class" : "detailBox"})
# For each ad, get the title
# if it contains the word "Honda", get the link
for div in divs:
title = div.find(attrs={"class" : "adTitle"}).text
if model in title:
link = div.find(attrs={"class" : "listlink"})["href"]
link = base_url + link
# Now you have a link that you can download and scrape
print title, link
else:
print "No match: ", title
在回答时,此代码片段正在寻找本田车型并返回以下内容:
1995- Honda Prelude http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817797
No match: 1994- Ford Escort
No match: 2006- Land Rover Range Rover Sport
No match: 2006- Nissan Maxima
No match: 1957- Volvo 544
No match: 1996- Subaru Legacy
No match: 2005- Mazda Mazda6
No match: 1995- Chevrolet Monte Carlo
2002- Honda Accord http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817784
No match: 2004- Chevrolet Suburban (Chevrolet)
1998- Honda Civic http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817779
No match: 2004- Nissan Titan
2001- Honda Accord http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817770
No match: 1999- GMC Yukon
No match: 2007- Toyota Tacoma