
The Python (Jython) code fails at page = webclient.getPage("https://www.gartner.com/en/newsroom").

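I modified the gartner.py script from http://blog.databigbang.com/web-scraping-ajax-and-javascript-sites/ just to quickly adapt the Jython code to the current version of the website (see the XPaths, which worked when I tested them in Selenium):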

from com.gargoylesoftware.htmlunit import WebClient, BrowserVersion

def main():
   webclient = WebClient(BrowserVersion.BEST_SUPPORTED) # creating a new webclient object.

   page = webclient.getPage("https://www.gartner.com/en/newsroom") # getting the url
   articles = page.getByXPath("//div[@class='row newsletter']//a") # getting all the hyperlinks

   for article in articles:
      print("Clicking on: %s" % article)  # Jython 2.x would print a tuple for print(a, b)
      subpage = article.click() # click on the article link
      title = subpage.getByXPath("//div[@class='globalsite cmp-globalsite-columncontrol aem-GridColumn aem-GridColumn--default--12']//*[@class='grid-norm  mg-t0']") # get title
      summary = subpage.getByXPath("//div[@class='globalsite cmp-globalsite-columncontrol aem-GridColumn aem-GridColumn--default--12']//*[@class='grid-norm  subtitle mg-t15 mg-b15']") # get summary

      print(title)
      print(summary)

if __name__ == '__main__':
   main()

This is what I get:

C:\Users\xyz>jython C:\Users\xyz\Desktop\gartner2.py
com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl WARNING Obsolete content type encountered: 'text/javascript'.

...some CSS errors...

com.gargoylesoftware.htmlunit.javascript.DefaultJavaScriptErrorListener SEVERE Error during JavaScript execution
Traceback (most recent call last):
  File "C:\Users\xyz\Desktop\gartner2.py", line 30, in <module>
    main()
  File "C:\Users\xyz\Desktop\gartner2.py", line 17, in main
    page = webclient.getPage("https://www.gartner.com/en/newsroom") # getting the url
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "then" of undefined (https://www.gartner.com/en/ruxitagentjs_ICA2SVfqru_10137171222133618.js#163)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:891)
     ....
com.gargoylesoftware.htmlunit.ScriptException: com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "then" of undefined (https://www.gartner.com/en/ruxitagentjs_ICA2SVfqru_10137171222133618.js#163)
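
The traceback shows the ScriptException escaping from getPage itself, which is how HtmlUnit behaves when its "throw exception on script error" option is left at its default. One thing that might be worth trying is a minimal sketch like the one below, which turns that option off through WebClient.getOptions() before loading the page. The helper names relax_script_errors and open_newsroom are hypothetical, and it is an untested assumption that the newsroom page still renders the expected links once the failing ruxitagentjs script is tolerated.

```python
def relax_script_errors(webclient):
    """Ask an HtmlUnit WebClient to tolerate JavaScript errors instead of raising them."""
    options = webclient.getOptions()
    # By default HtmlUnit turns a failing page script into a ScriptException.
    options.setThrowExceptionOnScriptError(False)
    # CSS is not needed for XPath extraction; disabling it may also reduce the CSS noise.
    options.setCssEnabled(False)
    return webclient


def open_newsroom():
    """Hypothetical helper: only works under Jython with HtmlUnit on the classpath."""
    # Imported lazily so the module itself can be loaded without HtmlUnit.
    from com.gargoylesoftware.htmlunit import WebClient, BrowserVersion
    webclient = relax_script_errors(WebClient(BrowserVersion.BEST_SUPPORTED))
    return webclient.getPage("https://www.gartner.com/en/newsroom")
```

In the script above, this would amount to calling relax_script_errors(webclient) right after the WebClient is created and before getPage. Script errors would then no longer abort the page load, so the XPath queries at least get a chance to run.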