问题 1:使用 pyppeteer 进行抓取是否正确,我不会(被网站)注意到进行抓取?
简单的回答:是的。这个网站使用 javascript,所以你需要一个类似 pyppeteer 的东西来呈现网页。使用 pyppeteer 也会像普通用户一样进行模拟。所以被发现的机会更小。
技术回答:这需要更多的网络抓取经验,但如果您查看正在调用的请求。该网站使用 API 来呈现数据。因此,使用适当的方法和标头向 API 发出请求以避免被检测到会更有效。
GET https://api.quickfs.net/stocks/BABA:US/ovr/Annual/
{"datasets":{"metadata":{"_id":{},"qfs_symbol":"NYSE:BABA","currency":"USD","fsCat":"normal","name":"Alibaba Group Holding Limited","gs3_version_at_metadata_update":20191106,"exchange":"NYSE","industry":"Retailing","symbol":"BABA","country":"US","price":215.7,"p_pretax_inc":"24.9","ps":"8.1","ev_ebit":"42.5","ev_fcf":"21.7","ev_s":"7.7","ev_ebitda":"37.2","pb":"4.7","mkt_cap":588375,"pe":"27.7","ev_pretax_inc":"23.6","ev":558430,"qfs_symbol_v2":"BABA:US","description":"","avg_vol_50d":19498671,"beta":1.8212,"betaLastUpdated":20200419,"share_turnover":"180","sector":"Consumer Discretionary","template_version":4,"gics":"25502020","template_type":"normal"},"ks":"\n\t\t <div class=\"ksTblBg\">\n\t\t <table class=\"ksTbl\">\n\t\t <thead>\n\t\t <tr>\n\t\t <th colspan=\"6\" style=\"text-align:center\">Key Statistics<\/th>\n\t\t <\/tr>\n\t\t <\/thead>\n\t\t <tbody>\n\t\t \n\t\t <tr>\n\t\t <td class=\"ksSectHead\" colspan=\"2\">Valuation Ratios<\/td>\n\t\t <td class=\"ksSectHead\" colspan=\"2\">10-Yr Median Returns<\/td>\n\t\t <td class=\"ksSectHead\" colspan=\"2\">10-Yr Median Margins<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>P\/E<\/td><td class='rt' id='ks-pe'><\/td>\n\t\t <td class='lt'>ROA<\/td><td class='rt'>13.0%<\/td>\n\t\t <td class='lt'>Gross Profit<\/td><td class='rt'>66.7%<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>P\/B<\/td><td class='rt' id='ks-pb'><\/td>\n\t\t <td class='lt'>ROE<\/td><td class='rt'>22.3%<\/td>\n\t\t <td class='lt'>EBIT<\/td><td class='rt'>28.9%<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>P\/S<\/td><td class='rt' id='ks-ps'><\/td>\n\t\t <td class='lt'>ROIC<\/td><td class='rt'>30.4%<\/td>\n\t\t <td class='lt'>Pre-Tax Income<\/td><td class='rt'>35.6%<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>EV\/S<\/td><td class='rt' id='ks-ev_s'><\/td>\n\t\t <td class='ksSectHead' colspan='2'>10-Year CAGR<\/td>\n\t\t <td class='lt'>FCF<\/td><td class='rt'>40.8%<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>EV\/EBITDA<\/td><td class='rt' id='ks-ev_ebitda'><\/td>\n\t\t <td class='lt'>Revenue<\/td><td class='rt'>56.3%<\/td>\n\t\t <td class='ksSectHead' colspan='2'>Capital Structure<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>EV\/EBIT<\/td><td class='rt' id='ks-ev_ebit'><\/td>\n\t\t <td class='lt'>Assets<\/td><td class='rt'>58.2%<\/td>\n\t\t <td class='lt'>Assets \/ Equity<\/td><td class='rt'>1.6<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>EV\/Pretax<\/td><td class='rt' id='ks-ev_pretax_income'><\/td>\n\t\t <td class='lt'>FCF<\/td><td class='rt'>51.1%<\/td>\n\t\t <td class='lt'>Debt \/ Equity<\/td><td class='rt'>0.3<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>EV\/FCF<\/td><td class='rt' id='ks-ev_fcf'><\/td>\n\t\t <td class='lt'>EPS<\/td><td class='rt'>68.6%<\/td>\n\t\t <td class='lt'>Debt \/ Assets<\/td><td class='rt'>0.2<\/td>\n\t\t <\/tr>\n\t\t \n\t\t <\/tbody>\n\t\t <\/table>\n\t\t <\/div>","ovr":"<table class='fs-table' id='ovr-table'>\n <tbody>\n <tr class='thead'><td><\/td><td>2011<\/td><td>2012<\/td><td>2013<\/td><td>2014<\/td><td>2015<\/td><td>2016<\/td><td>2017<\/td><td>2018<\/td><td>2019<\/td><td>2020<\/td><\/tr><tr class=' '><td class='labelCell'>Revenue<\/td><td class='dataCell' data-type='normal' data-value='1010821000'>1,011<\/td><td class='dataCell' data-type='normal' data-value='3172277000'>3,172<\/td><td class='dataCell' data-type='normal' data-value='5553464000'>5,553<\/td><td class='dataCell' data-type='normal' data-value='8505565000'>8,506<\/td><td class='dataCell' data-type='normal' data-value='12214920000'>12,215<\/td><td class='dataCell' data-type='normal' data-value='15554001000'>15,554<\/td><td class='dataCell' data-type='normal' data-value='22958079000'>22,958<\/td><td class='dataCell' data-type='normal' data-value='39615348000'>39,615<\/td><td class='dataCell' data-type='normal' data-value='56145652000'>56,146<\/td><td class='dataCell' data-type='normal' data-value='72603233000'>72,603<\/td><\/tr><tr class=' '><td class='labelCell italic indent'>Revenue Growth<\/td><td class='dataCell italic' data-type='percentage' data-value='0.20945600737049'>20.9%<\/td><td class='dataCell italic' data-type='percentage' data-value='2.1383172688339'>213.8%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.75062392092494'>75.1%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.53157830860162'>53.2%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.43610918263513'>43.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.27336085705023'>27.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.47602401465706'>47.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.72555151500263'>72.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.41727019537983'>41.7%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.29312298305842'>29.3%<\/td><\/tr><tr class=' '><td class='labelCell'>Gross Profit<\/td><td class='dataCell' data-type='normal' data-value='812343000'>812<\/td><td class='dataCell' data-type='normal' data-value='2134020000'>2,134<\/td><td class='dataCell' data-type='normal' data-value='3989767000'>3,990<\/td><td class='dataCell' data-type='normal' data-value='6339808000'>6,340<\/td><td class='dataCell' data-type='normal' data-value='8394512000'>8,395<\/td><td class='dataCell' data-type='normal' data-value='10270811000'>10,271<\/td><td class='dataCell' data-type='normal' data-value='14329852000'>14,330<\/td><td class='dataCell' data-type='normal' data-value='22671036000'>22,671<\/td><td class='dataCell' data-type='normal' data-value='25315484000'>25,315<\/td><td class='dataCell' data-type='normal' data-value='32382879000'>32,383<\/td><\/tr><tr class=' '><td class='labelCell italic indent'>Gross Margin %<\/td><td class='dataCell italic' data-type='percentage' data-value='0.80364673864116'>80.4%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.67270922432057'>67.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.71842853397447'>71.8%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.74537176542652'>74.5%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.68723430034744'>68.7%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.66033241221985'>66.0%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.62417469684637'>62.4%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.57227910758224'>57.2%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.45088948294696'>45.1%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.44602530303299'>44.6%<\/td><\/tr><tr class=' '><td class='labelCell'>Operating Profit<\/td><td class='dataCell' data-type='normal' data-value='266271000'>266<\/td><td class='dataCell' data-type='normal' data-value='847525000'>848<\/td><td class='dataCell' data-type='normal' data-value='1820317000'>1,820<\/td><td class='dataCell' data-type='normal' data-value='4084952000'>4,085<\/td><td class='dataCell' data-type='normal' data-value='3736415000'>3,736<\/td><td class='dataCell' data-type='normal' data-value='4607009000'>4,607<\/td><td class='dataCell' data-type='normal' data-value='7035973000'>7,036<\/td><td class='dataCell' data-type='normal' data-value='11137968000'>11,138<\/td><td class='dataCell' data-type='normal' data-value='8604121000'>8,604<\/td><td class='dataCell' data-type='normal' data-value='13105334000'>13,105<\/td><\/tr><tr class=' '><td class='labelCell italic indent'>Operating Margin %<\/td><td class='dataCell italic' data-type='percentage' data-value='0.26342052648293'>26.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.267166139653'>26.7%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.32778046278863'>32.8%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.48026815384986'>48.0%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.30588943685264'>30.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.29619446469111'>29.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.30647045861285'>30.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.28115285015293'>28.1%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.15324643482633'>15.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.18050620417964'>18.1%<\/td><\/tr><tr class=' '><td class='labelCell'>Earnings Per Share<\/td><td class='dataCell' data-type='eps' data-value='0.053'>$0.05<\/td><td class='dataCell' data-type='eps' data-value='0.287'>$0.29<\/td><td class='dataCell' data-type='eps' data-value='0.574'>$0.57<\/td><td class='dataCell' data-type='eps' data-value='1.62'>$1.62<\/td><td class='dataCell' data-type='eps' data-value='1.555'>$1.56<\/td><td class='dataCell' data-type='eps' data-value='4.289'>$4.29<\/td><td class='dataCell' data-type='eps' data-value='2.462'>$2.46<\/td><td class='dataCell' data-type='eps' data-value='3.88'>$3.88<\/td><td class='dataCell' data-type='eps' data-value='4.973'>$4.97<\/td><td class='dataCell' data-type='eps' data-value='7.965'>$7.97<\/td><\/tr><tr class=' '><td class='labelCell italic indent'>EPS Growth<\/td><td class='dataCell italic' data-type='percentage' data-value='0.23255813953488'>23.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='4.4150943396226'>441.5%<\/td><td class='dataCell italic' data-type='percentage' data-value='1'>100.0%<\/td><td class='dataCell italic' data-type='percentage' data-value='1.8222996515679'>182.2%<\/td><td class='dataCell italic' data-type='percentage' data-value='-0.040123456790124'>-4.0%<\/td><td class='dataCell italic' data-type='percentage' data-value='1.7581993569132'>175.8%<\/td><td class='dataCell italic' data-type='percentage' data-value='-0.42597342037771'>-42.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.57595450852965'>57.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.28170103092784'>28.2%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.60164890408204'>60.2%<\/td><\/tr><tr class=' '><td class='labelCell'>Return on Assets<\/td><td class='dataCell' data-type='percentage' data-value='0.12490081137912'>12.5%<\/td><td class='dataCell' data-type='percentage' data-value='0.13547055438638'>13.5%<\/td><td class='dataCell' data-type='percentage' data-value='0.15474766176667'>15.5%<\/td><td class='dataCell' data-type='percentage' data-value='0.26661125490522'>26.7%<\/td><td class='dataCell' data-type='percentage' data-value='0.13179228026259'>13.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.22667998521059'>22.7%<\/td><td class='dataCell' data-type='percentage' data-value='0.097819038194011'>9.8%<\/td><td class='dataCell' data-type='percentage' data-value='0.10848994208585'>10.8%<\/td><td class='dataCell' data-type='percentage' data-value='0.10177987302833'>10.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.12868657986734'>12.9%<\/td><\/tr><tr class=' '><td class='labelCell'>Return on Equity<\/td><td class='dataCell' data-type='percentage' data-value='0.26227533616942'>26.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.20200237445123'>20.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.38077278024081'>38.1%<\/td><td class='dataCell' data-type='percentage' data-value='0.90392646328004'>90.4%<\/td><td class='dataCell' data-type='percentage' data-value='0.24438521190488'>24.4%<\/td><td class='dataCell' data-type='percentage' data-value='0.34553804941983'>34.6%<\/td><td class='dataCell' data-type='percentage' data-value='0.14914185483796'>14.9%<\/td><td class='dataCell' data-type='percentage' data-value='0.17542701201745'>17.5%<\/td><td class='dataCell' data-type='percentage' data-value='0.16392435911507'>16.4%<\/td><td class='dataCell' data-type='percentage' data-value='0.19830371476362'>19.8%<\/td><\/tr><tr class=' '><td class='labelCell'>Return on Invested Capital<\/td><td class='dataCell' data-type='percentage' data-value='0.41743100812616'>41.7%<\/td><td class='dataCell' data-type='percentage' data-value='0.31146385668929'>31.1%<\/td><td class='dataCell' data-type='percentage' data-value='0.56166392937543'>56.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.79357545168436'>79.4%<\/td><td class='dataCell' data-type='percentage' data-value='0.29563665163366'>29.6%<\/td><td class='dataCell' data-type='percentage' data-value='0.40666624726852'>40.7%<\/td><td class='dataCell' data-type='percentage' data-value='0.15645567128128'>15.6%<\/td><td class='dataCell' data-type='percentage' data-value='0.17835726067885'>17.8%<\/td><td class='dataCell' data-type='percentage' data-value='0.15560704472355'>15.6%<\/td><td class='dataCell' data-type='percentage' data-value='0.20185124127701'>20.2%<\/td><\/tr><\/tbody><\/table>","chart":[["2006-12",0],["2007-12",-2.7333985391131],["2008-12",1.4806594382205],["2009-12",0.44823138109063],["2010-12",0.57515717254689],["2011-12",0.41743100812616],["2012-03",0.31146385668929],["2013-03",0.56166392937543],["2014-03",0.79357545168436],["2015-03",0.29563665163366],["2016-03",0.40666624726852],["2017-03",0.15645567128128],["2018-03",0.17835726067885],["2019-03",0.15560704472355],["2020-03",0.20185124127701]]},"errors":[],"code":0,"qfs_symbol_v2":"BABA:US","statementPeriod":"Annual"}
问题 2:我应该以某种方式模拟使用 pyppeteer 选择的关键比率吗?
简单回答:pyppeteer 使用 css 选择器来选择页面上的元素。要选择该下拉菜单,您需要找到将为您提供该元素的选择器路径。您可以使用 Chrome DevTools (F12) 之类的工具右键单击元素并复制 css 选择器。然后用 pypeteer 调用下拉菜单:
# select the button for Key Ratios
await page.select('body > app-root > app-company > div > div > div.pageHead > div > div:nth-child(3) > div.col-xs-offset-3.col-xs-2 > select-fs-dropdown > div > button > div')
您应该能够阅读pyppeteer 的文档,以更好地了解如何实际执行此操作。
简短答案:您可以使用类似于问题 2 答案的选择器来获取表格。然后解析表格。
技术回答:更好地了解网站的工作原理。您可以对网站进行反向工程以了解其工作原理。使用 Chrome DevTools 之类的工具,您可以看到它调用了一个 API。该 API 以易于解析的 JSON 格式返回您需要的所有数据。使用 API 很简单。只需更改股票代码。
# get data for Alibaba
https://api.quickfs.net/stocks/BABA:US/ovr/Annual/
# get data for Tesla
https://api.quickfs.net/stocks/TSLA:US/ovr/Annual/
# get data for Apple
https://api.quickfs.net/stocks/AAPL:US/ovr/Annual/
然后,您可以简单地在 Python 中使用请求调用 API:
import requests
resp = requests.get("https://api.quickfs.net/stocks/AAPL:US/ovr/Annual/")
data = resp.json