Python 菜鸟在这里。我正在尝试提取一个链接,特别是亚马逊产品页面上“所有评论”的链接。我得到了意想不到的结果。
import urllib2
req = urllib2.Request('http://www.amazon.com/Ole-Henriksen-Truth-Collagen- Booster/dp/B000A0ADT8/ref=sr_1_1?s=hpc&ie=UTF8&qid=1342922857&sr=1-1&keywords=truth')
response = urllib2.urlopen(req)
page = response.read()
start = page.find("all reviews")
link_start = page.find("href=", start) + 6
link_end = page.find('"', link_start)
print page[link_start:link_end]
相反,它输出: http ://www.amazon.com/Ole-Henriksen-Truth-Collagen-Booster/product-reviews/B000A0ADT8