I'm trying to use Scrapy to get a currency value out of some HTML. The code is:

links = hxs.select('//a[@class="product-image"]/div[@class="price-box"]//span[@class="price"]/text()').extract()

and the HTML is:

<div>
  <span>
    <sub>
      <li class="item first">

        <a href="http://www.xtra-vision.ie/dvd-blu-ray/to-rent/new-release/dvd/pitch-perfect-dvd.html" title="Image for Pitch Perfect" class="product-image">

          <span class="exclusive-star">
          </span>

          <img src="http://www.xtra-vision.ie/media/catalog/product/cache/3/small_image/124x173/5b02ab93946615b958c913185aae2414/i/w/iws_5167c10c906b57.33524324.JPG.jpg"  alt="Image for Pitch Perfect" />

          <h2 class="product-name">Pitch Perfect</h2>

          <div class="price-box">

            <span class="regular-price" id="product-price-5174">

              <span class="price">
                €15                     
                <sub class="price-bit">.99</sub>
              </span>
            </span>
          </div>
        </a>
      </li>
    </sub>

  </span>

</div>

The price I get back is \u20ac15\t\t\t\t\t\t. Is there any way to use XPath to extract 15.99 from this HTML?

1 Answer

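I used a combination of XPath and Python, so this may not be exactly what you're after. The Python part is mainly there to strip the extraneous tabs appended to the end of the price.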

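# Text nodes directly inside the price span (the euro part plus surrounding whitespace)
price = hxs.select('//span[@class="price"]/text()').extract()
# Text of the nested <sub> element (the cents part)
pricebit = hxs.select('//span[@class="price"]/sub[@class="price-bit"]/text()').extract()
# extract() returns lists, so concatenate the two lists before joining
totalprice = price + pricebit
# Join the pieces and drop tabs, newlines and spaces
totalstr = ''.join(''.join(totalprice).split())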
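
If you want the bare number (15.99) rather than the string with the currency symbol, the cleanup step can be sketched as a small helper. This is only an illustration: `clean_price` is a hypothetical name, not part of Scrapy, and the sample fragments are assumed values shaped like the output quoted in the question.

```python
import re

def clean_price(fragments):
    """Join extracted text fragments and return the numeric price as a string.

    Hypothetical helper; `fragments` is a list of strings such as the
    lists returned by extract().
    """
    # Concatenate all fragments, then drop every whitespace character
    joined = re.sub(r'\s+', '', ''.join(fragments))
    # Pick out the first run of digits with an optional decimal part
    match = re.search(r'\d+(?:\.\d+)?', joined)
    return match.group(0) if match else None

# Assumed sample values, modelled on the result shown in the question
price = [u'\u20ac15\t\t\t\t\t\t', u'\n']
pricebit = [u'.99']
print(clean_price(price + pricebit))  # prints 15.99
```

The helper returns a string, so pass it to float() or decimal.Decimal() if you need to do arithmetic on it. It returns None when no digits are found.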
answered 2013-04-16T11:19:02.970