html - Unable to find product information using xPath

Question

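I'm writing my first web scraper in Python and trying to get the product title and price from an Aliexpress product page. I'm a complete beginner here, so apologies if this is an obvious question, but none of the solutions I've tried from other posts have worked so far.

I'm using xpath to locate the HTML elements, and I copied the xpath expressions from Chrome with Inspect Element -> Copy XPath. This doesn't seem to work the way it does on other sites: the tree.xpath calls keep returning empty lists. Through trial and error I got it working for the title, because the call seems to return a list of all the text on the whole page, and the title is at index 3 of that list. I couldn't find the index of the price, though, and I'd rather learn the right way to do this anyway. I've tried other people's solutions to similar problems, but nothing works in my case and I'm lost. Here is my code: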

import requests
from lxml import html


url = 'https://www.aliexpress.com/item/4000203338045.html?spm=a2g0o.detail.1000060.1.77ce75e1YttKZb&gps-id=pcDetailBottomMoreThisSeller&scm=1007.13339.146401.0&scm_id=1007.13339.146401.0&scm-url=1007.13339.146401.0&pvid=662e2a50-e8d2-4ce3-b66e-70afff126070'

page = requests.get(url)
tree = html.fromstring(page.content)

title = tree.xpath('//*[@id="root"]/div/div[1]/div/div[2]/div[1]')[0]
title_text = title.xpath('///text()')[3]

print('Title:',title)
print('Title text:',title_text)

price = tree.xpath('//*[@id="root"]/div/div[2]/div/div[2]/div[4]/div[1]/span')
print('Price:', price)

Here is the output:

Title: <Element div at 0x3f113f0>
Title text: Bluedio T elf 2 Bluetooth earphone TWS wireless earbuds waterproof Sports Headset Wireless Earphone in ear with charging box-in Phone Earphones & Headphones from Consumer Electronics on AliExpress 
Price: []

Thanks for any help!

score 0 · Accepted Answer

The XPath strings you're looking for are:

tree.xpath('//div[@class="product-title"]/text()')
tree.xpath('//div[@class="product-price-current"]//text()')
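As a rough illustration of how those two expressions behave, here is a minimal sketch that runs them against a small hand-written HTML fragment. The fragment is an assumption meant to resemble the page after a browser has run its JavaScript; only the class names product-title and product-price-current come from the expressions above, and the text values are taken from the question's output.

```python
from lxml import html

# Hypothetical markup, assumed to resemble the page *after* JavaScript has rendered it.
# Only the class names come from the answer; the structure is made up.
RENDERED = """
<div id="root">
  <div class="product-title">Bluedio T elf 2 Bluetooth earphone</div>
  <div class="product-price-current"><span>US $18.27</span></div>
</div>
"""

tree = html.fromstring(RENDERED)

# Match on the class attribute instead of a brittle absolute path copied from DevTools.
title = tree.xpath('//div[@class="product-title"]/text()')

# "//text()" collects text from all descendants, so the price inside the nested <span> is found.
price = tree.xpath('//div[@class="product-price-current"]//text()')

print(title)  # ['Bluedio T elf 2 Bluetooth earphone']
print(price)  # ['US $18.27']
```

Note that `@class="..."` only matches when the attribute is exactly that string; if the live element carries several classes, `contains(@class, "product-price-current")` is a looser XPath 1.0 alternative.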

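However, requests doesn't execute JavaScript (you'd need Selenium, or Splash in front of Scrapy). If you look at page.content, you'll see that the words you're looking for are in the document, but inside some JSON: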
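A small offline sketch of why the rendered-page XPath comes back empty when the HTML is fetched with requests. The byte string below is an assumed, heavily simplified stand-in for page.content: the empty #root container and the `window.pageData = ...` wrapper are made up for illustration, and only the key name formatedActivityPrice and its value come from the real page.

```python
from lxml import html

# Hypothetical, simplified stand-in for page.content. The wrapper variable name
# "window.pageData" is invented; the real page's script is not shown here.
RAW = b"""<html><body>
<div id="root"></div>
<script>window.pageData = {"priceModule":{"formatedActivityPrice":"US $18.27"}};</script>
</body></html>"""

raw_tree = html.fromstring(RAW)

# The XPath written for the rendered page matches nothing: that <div> does not exist yet.
rendered_price = raw_tree.xpath('//div[@class="product-price-current"]//text()')
print(rendered_price)  # []

# The value is still in the response, as plain text inside a <script> element.
script_text = raw_tree.xpath('//script/text()')[0]
print("formatedActivityPrice" in script_text)  # True
```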

"name":"PageModule",
"ogDescription":"Smarter Shopping, Better Living!  Aliexpress.com",
/* TITLE */
"ogTitle":"US $18.27 70% OFF|Bluedio T elf 2 Bluetooth earphone TWS wireless earbuds waterproof Sports Headset Wireless Earphone in ear with charging box-in Phone Earphones & Headphones from Consumer Electronics on AliExpress ",

"ogurl":"//www.aliexpress.com/item/4000203338045.html",
"oldItemDetailUrl":"https://www.aliexpress.com/item/Bluedio-T-elf-2-Bluetooth-earphone-TWS-wireless-earbuds-waterproof-Sports-Headset-Wireless-Earphone-in-ear/4000203338045.html",
"plazaElectronicSeller":false,
"productId":4000203338045,
"ruSelfOperation":false,
"showPlazaHeader":false,
"siteType":"glo",
"spanishPlaza":false,
"title":"Bluedio T elf 2 Bluetooth earphone TWS wireless earbuds waterproof Sports Headset Wireless Earphone in ear with charging box-in Phone Earphones & Headphones from Consumer Electronics on AliExpress "
},
"preSaleModule":{ 
   "features":{ 

   },
   "i18nMap":{ 

   },
   "id":0,
   "name":"PreSaleModule",
   "preSale":false
},
"priceModule":{ 
   "activity":true,
   "bigPreview":false,
   "bigSellProduct":false,
   "discount":70,
   "discountPromotion":true,
   "features":{ 

   },
   /* PRICE */
   "formatedActivityPrice":"US $18.27",

   "formatedPrice":"US $60.90",
   "hiddenBigSalePrice":false,
   "i18nMap":{ 
      "LOT":"lot",
      "INSTALLMENT":"Installment",
      "DEPOSIT":"Deposit",
      "PRE_ORDER_PRICE":"Pre-order price"
   }
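Once you have the raw text, one possible way to pull out a whole module such as priceModule is to find its key and let the standard library's `json.JSONDecoder.raw_decode` parse the object that starts there. This is a sketch under assumptions: `extract_module` is a hypothetical helper, it expects the key to appear once and be followed by valid JSON, and the /* TITLE */ and /* PRICE */ markers in the excerpt above are annotations that would have to be absent. The sample string is trimmed from the excerpt.

```python
import json


def extract_module(raw, name):
    """Decode the JSON value that follows '"<name>":' in raw text.

    Hypothetical helper: assumes the key appears once, is followed directly by a
    colon, and its value is valid JSON (no /* */ comments).
    Returns None if the key is not found.
    """
    marker = '"%s":' % name
    start = raw.find(marker)
    if start == -1:
        return None
    idx = start + len(marker)
    # raw_decode does not skip leading whitespace, so skip it here.
    while idx < len(raw) and raw[idx].isspace():
        idx += 1
    value, _end = json.JSONDecoder().raw_decode(raw, idx)
    return value


# Stand-in for page.text, trimmed from the priceModule excerpt (comments removed).
RAW = '''"priceModule":{
   "activity":true,
   "discount":70,
   "formatedActivityPrice":"US $18.27",
   "formatedPrice":"US $60.90"
},"preSaleModule":{"preSale":false}'''

price = extract_module(RAW, "priceModule")
print(price["formatedActivityPrice"], price["formatedPrice"])  # US $18.27 US $60.90
```

`raw_decode` stops at the end of the first complete JSON value, so the trailing `,"preSaleModule":...` text is simply ignored rather than causing a parse error.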

Unfortunately, I realize this doesn't get you all the way to the answer you're looking for, but hopefully it gets you started.
