
I need to scrape this page: https://www.arabam.com/ilan/galeriden-satilik-lamborghini-gallardo-lp-560-4/mini-motors-dan-2009-gallardo-lp560-4-seramik-lift-bayi-boyasiz/14934711

If you scroll down, you will see the listing's description section.

I scrolled down the page and copied the XPath for that section. This is the XPath: `//div[@id="js-hook-description"]//p/text`

Here is the code:

```javascript
const results = xpathT.fromPageSource(data).findElements(rest);

if (results.length > 0) {
  // Only touch results[0] after checking that there is at least one match
  //console.log("The href value is:", results[0].getAttribute("href"));
  console.log(`Your full text is "${results[0].getText()}"`);

  let _results = [];
  if (path.includes("href")) {
    // collect the href attribute of every matched element
    for (let r of results) {
      _results.push(r.getAttribute("href"));
    }
  }
  if (path.includes("text")) {
    // collect the text of every matched element
    for (let r of results) {
      console.log(r.getText());
      _results.push(r.getText());
    }
  }
}
```

When I simply print the result, I get this:

```
Your full text is "<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">LAMBORGHİNİ GALLARDO LP560-4</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">2009 MODEL - 38.000 KM</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">DOĞUŞ OTO <font color="#ff0000">BAYİİ</font> ÇIKIŞLI</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">AİRMATİC (LİFT)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">SERAMİK FREN</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">GERİ GÖRÜŞ KAMERASI</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">PADDLESHİFT (F1)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">2 BÖLGE KLİMA</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">DERİ KOLTUK</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">Bİ-ZENON FAR</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">YAĞMUR SENSÖRÜ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">CD-USB-AUX-MP3</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">?</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">BOYA - HATA - TRAMER - HASAR KAYDI </font><font color="#ff0000"><font size="4"> </font><font size="5">YOKTUR</font></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5">ARACIMIZIN TAMPONLARI DAHİL <font color="#ff0000">BOYASIZ</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">YEDEK ANAHTARI <font color="#ff0000">MEVCUTTUR</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b>?</b></span><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5">DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5" color="#ff0000"><br/></font></b></span></p>,<p style="text-align: cente...
```

But when I call `.getText()`, it returns `undefined`. What are possible solutions to this?
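The printed result above contains the serialized `<p>` markup. When no working text accessor is available, one crude fallback is to strip the tags from that string. This is a minimal sketch. It assumes you can get each matched node's markup as a string, and how you do that depends on the xpath library in use, which the question does not name. `htmlToText` is a hypothetical helper. Regex tag stripping is not a real HTML parser and is only adequate for simple snippets like these.

```javascript
// Hypothetical helper: recover visible text from a serialized HTML snippet.
// Crude regex approach; fine for simple, well-formed <p> fragments only.
function htmlToText(html) {
  return html
    .replace(/<br\s*\/?>/gi, "\n") // treat <br/> as a line break
    .replace(/<[^>]+>/g, "")       // drop every remaining tag
    .replace(/&nbsp;/g, " ")       // decode a few common entities
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")        // decode &amp; last to avoid double-decoding
    .trim();
}

// One paragraph copied from the printed output
const sample = '<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">DOĞUŞ OTO <font color="#ff0000">BAYİİ</font> ÇIKIŞLI</font></b></p>';
console.log(htmlToText(sample)); // DOĞUŞ OTO BAYİİ ÇIKIŞLI
```

A paragraph that only contains `<br/>` comes out as an empty string after `trim()`.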


1 Answer


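You can use `page.evaluate` to get the `innerText` property of any DOM element. If you need the text paragraph by paragraph, use an appropriate CSS selector for the `<p>` elements, which in this case is `#js-hook-description > div > p`. You can collect the matching elements with the `page.$$` method (equivalent to `document.querySelectorAll()` in the page context) and then iterate over them (see the `for..of` and `Array.map` variants below). In each iteration, retrieve `innerText` and apply `String.trim()` to strip surrounding line breaks (e.g. `\n`) from the paragraph.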

```javascript
// assumes `page` is a Puppeteer Page and this code runs inside an async function

// full text content in one string
const fullText = await page.evaluate(el => el.innerText, await page.$('#js-hook-description'))
console.log(fullText)

// each paragraph as an array element, variant I (for..of)
const textArray = []
const paragraphs = await page.$$('#js-hook-description > div > p')
for (const p of paragraphs) {
  const actualPara = await page.evaluate(el => el.innerText.trim(), p)
  textArray.push(actualPara)
}
console.log(JSON.stringify(textArray))
```
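The `for..of` loop awaits each `page.evaluate` call in turn. A possible variation starts all the calls together and awaits them with `Promise.all`, which keeps the results in the same order as the input. This is a sketch only. So that it can run outside a browser, `stubPage` is a hypothetical stand-in for a Puppeteer `Page` that just calls the function locally, and plain objects with an `innerText` property stand in for the element handles.

```javascript
// Same per-paragraph extraction as the for..of loop, but the evaluate calls
// run concurrently; Promise.all preserves the order of `paragraphs`.
async function collectParagraphTexts(page, paragraphs) {
  return Promise.all(
    paragraphs.map(p => page.evaluate(el => el.innerText.trim(), p))
  );
}

// Hypothetical stub: a real page.evaluate runs `fn` inside the browser with
// the element behind each handle; this one just calls `fn` locally.
const stubPage = {
  evaluate: async (fn, ...args) => fn(...args),
};

// Plain objects standing in for the <p> element handles
const fakeParagraphs = [
  { innerText: "LAMBORGHİNİ GALLARDO LP560-4\n" },
  { innerText: "\n" },
  { innerText: "  SERAMİK FREN " },
];

collectParagraphTexts(stubPage, fakeParagraphs).then(texts =>
  console.log(JSON.stringify(texts)) // ["LAMBORGHİNİ GALLARDO LP560-4","","SERAMİK FREN"]
);
```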

An alternative solution uses `page.$$eval` and `Array.map`:

```javascript
// each paragraph as an array element, variant II ($$eval + Array.map)
const alternativeSolution = await page.$$eval(
  '#js-hook-description > div > p',
  paragraphs => paragraphs.map(p => p.innerText.trim())
)
console.log(JSON.stringify(alternativeSolution))
```
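Puppeteer serializes the callback passed to `page.$$eval` and runs it inside the page. It therefore has to be self-contained and cannot use variables from the surrounding Node.js scope. One way to make that explicit is to define the callback as a named function, which can also be exercised outside the browser with objects that have an `innerText` property. `extractParagraphs` is a hypothetical name used for this sketch.

```javascript
// Self-contained callback: it only uses its argument, so it still works
// after being serialized and executed in the page context.
const extractParagraphs = paragraphs => paragraphs.map(p => p.innerText.trim());

// In the Puppeteer script it would be passed directly:
//   await page.$$eval('#js-hook-description > div > p', extractParagraphs)

// Local check with plain objects standing in for <p> elements
const demo = extractParagraphs([
  { innerText: "2009 MODEL - 38.000 KM\n" },
  { innerText: "DERİ KOLTUK" },
]);
console.log(JSON.stringify(demo)); // ["2009 MODEL - 38.000 KM","DERİ KOLTUK"]
```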

Full text output:

```
LAMBORGHİNİ GALLARDO LP560-4 2009 MODEL - 38.000 KM DOĞUŞ OTO BAYİİ ÇIKIŞLI AİRMATİC (LİFT) SERAMİK FREN GERİ GÖRÜŞ KAMERASI PADDLESHİFT (F1) 2 BÖLGE KLİMA DERİ KOLTUK Bİ-ZENON FAR YAĞMUR SENSÖRÜ CD-USB-AUX-MP3 ? BOYA - HATA - TRAMER - HASAR KAYDI  YOKTUR ARACIMIZIN TAMPONLARI DAHİL BOYASIZ YEDEK ANAHTARI MEVCUTTUR ? DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ 0533 239 22 77
```

Array output, paragraph by paragraph:

```
["LAMBORGHİNİ GALLARDO LP560-4","2009 MODEL - 38.000 KM","DOĞUŞ OTO BAYİİ ÇIKIŞLI","","AİRMATİC (LİFT)","SERAMİK FREN","GERİ GÖRÜŞ KAMERASI","PADDLESHİFT (F1)","2 BÖLGE KLİMA","DERİ KOLTUK","Bİ-ZENON FAR","YAĞMUR SENSÖRÜ","CD-USB-AUX-MP3","","?","BOYA - HATA - TRAMER - HASAR KAYDI  YOKTUR","","ARACIMIZIN TAMPONLARI DAHİL BOYASIZ","","YEDEK ANAHTARI MEVCUTTUR","?","DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ","","0533 239 22 77"]
```
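The empty strings in the array appear to come from paragraphs that contain only a `<br/>`. If you only want lines that carry text, you can filter them out afterwards. This is a small sketch. `dropEmptyLines` is a hypothetical helper, and the sample is a subset of the array output.

```javascript
// Hypothetical helper: keep only the non-empty paragraph strings.
function dropEmptyLines(lines) {
  return lines.filter(line => line.length > 0);
}

const sampleLines = ["2009 MODEL - 38.000 KM", "DOĞUŞ OTO BAYİİ ÇIKIŞLI", "", "AİRMATİC (LİFT)", ""];
const compact = dropEmptyLines(sampleLines);
console.log(compact.join("\n"));
```

Because the strings were already trimmed, `line.length > 0` is enough. Untrimmed input would need `line.trim().length > 0`.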
answered 2020-07-21T11:59:14.800