
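I'm trying to scrape a JavaScript-rendered website with the code below. When the site is opened in a browser, it loads the content I want into <div class=results>. My problem is that when I run this code in a Jupyter notebook, this div appears to be empty.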

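from requests_html import AsyncHTMLSession

session = AsyncHTMLSession()
# Top-level await works inside a Jupyter notebook cell
r = await session.get('https://thumbtack.github.io/abba/demo/abba.html#Baseline=13855%2C91428&Variation+1=13242%2C90703&abba%3AintervalConfidenceLevel=0.95&abba%3AuseMultipleTestCorrection=true')
await r.html.arender()
# r.content holds the raw HTTP body fetched before rendering;
# the rendered DOM is available as r.html.html
print(r.html.html)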


This produces an empty div:

<div class="results"></div>

For the URL above, inspecting the same page in a browser shows the same div filled with content:

<div class="results" style="display: block;"> <table>     <thead>         <tr>             <th></th>             <th>Successes</th>             <th>Total</th>             <th colspan="2">Success Rate</th>             <th>p-value</th>             <th>Improvement</th>         </tr>     </thead>     <tbody class="result-table">      <tr class="result-row">   <th class="bucket-name">Baseline</th>   <td class="yes">15,453</td>   <td class="total">101,901</td>   <td class="conversion-numeric"> <div class="interval">   <div class="error"><span class="lower-bound">15%</span> – <span class="upper-bound">15%</span></div>   <span class="base">(<span class="base-value">15%</span>)</span> </div></td>   <td class="conversion-visual" style="display: inline-block;"><svg font-size="10px" font-family="sans-serif" fill="none" stroke="none" stroke-width="1.5" width="220" height="15" data-ol-has-click-handler=""><g transform="translate(10,0)"><rect x="114.94823723343774" width="85.05176276656226" height="15" fill="rgb(184,184,184)"></rect><rect x="114.94823723343774" width="1e-10" height="15" fill="rgb(180,38,71)"></rect><rect x="200" width="1e-10" height="15" fill="rgb(38,180,60)"></rect></g><g transform="translate(10,0)"><line shape-rendering="crispEdges" x1="-1" y1="4" x2="-1" y2="11" stroke="rgb(68,68,68)" stroke-width="1"></line><line shape-rendering="crispEdges" x1="200" y1="4" x2="200" y2="11" stroke="rgb(68,68,68)" stroke-width="1"></line></g><g transform="translate(10,0)"><text pointer-events="none" x="-3" dy=".35em" transform="translate(-1,7.5)" fill="rgb(68,68,68)" text-anchor="end" style="font: 12px Arial;">-</text><text pointer-events="none" x="3" dy=".35em" transform="translate(200,7.5)" fill="rgb(68,68,68)" style="font: 12px Arial;">+</text></g><g transform="translate(10,0)"><line shape-rendering="crispEdges" x1="0" y1="7.5" x2="200" y2="7.5" stroke="rgb(68,68,68)" stroke-width="1"></line></g></svg></td>   <td class="p-value">—</td>   <td class="improvement">—</td> 
</tr> <tr class="result-row">   <th class="bucket-name">Variation 1</th>   <td class="yes">14,690</td>   <td class="total">100,845</td>   <td class="conversion-numeric"> <div class="interval">   <div class="error"><span class="lower-bound">14%</span> – <span class="upper-bound">15%</span></div>   <span class="base">(<span class="base-value">15%</span>)</span> </div></td>   <td class="conversion-visual" style="display: inline-block;"><svg font-size="10px" font-family="sans-serif" fill="none" stroke="none" stroke-width="1.5" width="220" height="15" data-ol-has-click-handler=""><g transform="translate(10,0)"><rect width="84.08874241310063" height="15" fill="rgb(184,184,184)"></rect><rect width="84.08874241310063" height="15" fill="rgb(180,38,71)"></rect><rect x="200" width="1e-10" height="15" fill="rgb(38,180,60)"></rect></g><g transform="translate(10,0)"><line shape-rendering="crispEdges" x1="-1" y1="4" x2="-1" y2="11" stroke="rgb(68,68,68)" stroke-width="1"></line><line shape-rendering="crispEdges" x1="200" y1="4" x2="200" y2="11" stroke="rgb(68,68,68)" stroke-width="1"></line></g><g transform="translate(10,0)"><text pointer-events="none" x="-3" dy=".35em" transform="translate(-1,7.5)" fill="rgb(68,68,68)" text-anchor="end" style="font: 12px Arial;">-</text><text pointer-events="none" x="3" dy=".35em" transform="translate(200,7.5)" fill="rgb(68,68,68)" style="font: 12px Arial;">+</text></g><g transform="translate(10,0)"><line shape-rendering="crispEdges" x1="0" y1="7.5" x2="200" y2="7.5" stroke="rgb(68,68,68)" stroke-width="1"></line></g></svg></td>   <td class="p-value">0.0002</td>   <td class="improvement"> <div class="interval">   <div class="error"><span class="lower-bound">-6%</span> – <span class="upper-bound">-1.9%</span></div>   <span class="base">(<span class="base-value">-3.9%</span>)</span> </div></td> </tr></tbody> </table></div>
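Once the rendered markup is available, the per-row numbers can be pulled out of it. The sketch below uses only Python's standard-library html.parser and assumes the rendered HTML has the structure shown above (rows marked class="result-row", cells marked bucket-name, yes, total and p-value). The names ResultRowParser and parse_results are hypothetical, and the sample string is a trimmed copy of the markup above, not live output.

```python
from html.parser import HTMLParser


class ResultRowParser(HTMLParser):
    """Collect selected cell texts from each <tr class="result-row">."""

    # Cell classes copied from the rendered markup shown in the question.
    FIELDS = {"bucket-name", "yes", "total", "p-value"}

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per result row
        self._row = None    # dict for the row being parsed, or None
        self._field = None  # FIELDS class of the current cell, or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and "result-row" in classes:
            self._row = {}
        elif self._row is not None and tag in ("th", "td"):
            matched = self.FIELDS.intersection(classes)
            self._field = matched.pop() if matched else None
            if self._field:
                self._row[self._field] = ""

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._field = None
        elif tag == "tr" and self._row is not None:
            # Strip the whitespace the page leaves around cell text.
            self.rows.append({k: v.strip() for k, v in self._row.items()})
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell whose class is in FIELDS is kept.
        if self._row is not None and self._field:
            self._row[self._field] += data


def parse_results(html):
    """Return one dict per result row; an empty list if the div is empty."""
    parser = ResultRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Trimmed copy of the rendered markup from the question (hypothetical sample).
sample = """
<div class="results"><table><tbody class="result-table">
<tr class="result-row"><th class="bucket-name">Baseline</th>
<td class="yes">15,453</td><td class="total">101,901</td>
<td class="p-value">—</td></tr>
<tr class="result-row"><th class="bucket-name">Variation 1</th>
<td class="yes">14,690</td><td class="total">100,845</td>
<td class="p-value">0.0002</td></tr>
</tbody></table></div>
"""

for row in parse_results(sample):
    print(row)
```

In the notebook, the same function could be applied to the rendered page with parse_results(r.html.html); an unrendered page such as <div class="results"></div> simply yields an empty list, which makes the "empty div" case easy to detect.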

Edit: according to the requests_html documentation, the library can render JavaScript: https://pypi.org/project/requests-html/


1 Answer


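Add a sleep period so the browser has time to fetch resources and render the page. If it still fails, adjust the parameters (with AsyncHTMLSession, as in the question, use the async variant arender):

await r.html.arender(timeout=15, sleep=10)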

Setting timeout to 0 makes it wait indefinitely for the page to load.
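The interplay between waiting and timing out can be illustrated without a browser. The sketch below is a hypothetical polling helper, not part of requests_html: instead of sleeping a fixed time as arender's sleep parameter does, it re-reads the page until a marker string appears, and treats timeout=0 as "wait forever" to mirror the behavior described above. wait_for_marker and FakePage are illustrative names; FakePage stands in for a page whose script fills the results div after a delay.

```python
import asyncio
import time


async def wait_for_marker(get_html, marker, timeout=15.0, interval=0.5):
    """Poll get_html() until `marker` appears in the returned HTML.

    timeout=0 disables the deadline (wait forever).
    Raises asyncio.TimeoutError if the marker does not appear in time.
    """
    loop = asyncio.get_running_loop()
    deadline = None if timeout == 0 else loop.time() + timeout
    while True:
        html = await get_html()
        if marker in html:
            return html
        if deadline is not None and loop.time() >= deadline:
            raise asyncio.TimeoutError(f"{marker!r} not found within {timeout} s")
        await asyncio.sleep(interval)


class FakePage:
    """Hypothetical stand-in: the results table 'appears' after `delay` seconds."""

    def __init__(self, delay):
        self._ready_at = time.monotonic() + delay

    async def content(self):
        if time.monotonic() >= self._ready_at:
            return '<div class="results"><table>...</table></div>'
        return '<div class="results"></div>'


async def demo():
    # Content appears after 0.3 s, well inside the 2 s timeout.
    html = await wait_for_marker(FakePage(0.3).content, "<table>", timeout=2, interval=0.05)
    print("rendered:", html)
    # Content would take 5 s, but only 0.2 s is allowed.
    try:
        await wait_for_marker(FakePage(5).content, "<table>", timeout=0.2, interval=0.05)
    except asyncio.TimeoutError:
        print("timed out")


if __name__ == "__main__":
    asyncio.run(demo())  # in Jupyter, run `await demo()` instead
```

A fixed sleep, as in the answer, is simpler and is what arender supports directly; polling only pays off when the render time varies a lot, since it returns as soon as the content is there.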

answered 2021-02-07T03:33:35.963