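I am trying to scrape a website. On the product page I want to extract the product description, but how do I select only the description and nothing else?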
XPath: hxs.select('//div[@class="product-shop"]/p/text()').extract()
The HTML is quite large; please see the link given above.
I only want to select the product description, not the other details.
If I do this:
[" ".join([i.strip() for i in hxs.select('//div[@class="product-shop"]/p/text()').extract()])]
Output:
[u'Itemcode: 12BTS28271 Brand: BASICS InStock - Ships within 2 business days. Tip: 90% of our shipments reach within 4 business days! This product is part of the Basics T.shirts line made of 100% Cotton. Stripes Muscle Fit T.shirts that come in Green Color. Casual that comes with Henley away.']
But I only want:
[u'This product is part of the Basics T.shirts line made of 100% Cotton. Stripes Muscle Fit T.shirts that come in Green Color. Casual that comes with Henley away.']
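One possible direction, sketched under a big assumption: the description is the last <p> inside the product-shop div. The page's real markup is not shown here, so the SAMPLE string below is a hypothetical, simplified stand-in, and last_paragraph_text is a helper name made up for illustration. The sketch uses the standard library's xml.etree.ElementTree instead of Scrapy so it runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. The real page structure is not shown
# in the question, so this layout is an assumption.
SAMPLE = """
<html><body>
<div class="product-shop">
  <p>Itemcode: 12BTS28271</p>
  <p>Brand: BASICS</p>
  <p>InStock - Ships within 2 business days.</p>
  <p>Tip: 90% of our shipments reach within 4 business days!</p>
  <p>This product is part of the Basics T.shirts line made of 100% Cotton.</p>
</div>
</body></html>
"""


def last_paragraph_text(markup):
    """Return the text of the last <p> under div.product-shop, or ''."""
    root = ET.fromstring(markup)
    # Same path as in the question, minus the trailing text() step.
    paragraphs = root.findall(".//div[@class='product-shop']/p")
    if not paragraphs:
        return ""
    # Take only the final paragraph and collapse runs of whitespace.
    return " ".join("".join(paragraphs[-1].itertext()).split())


description = last_paragraph_text(SAMPLE)
print(description)
```

If that assumption holds for the real page, the equivalent XPath in the selector would probably be something like (//div[@class="product-shop"]/p)[last()]/text(). If the description is not in the last paragraph, a positional index or a more specific attribute on the description element would be needed instead.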