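I'm trying to collect market-cap data from coinmarketcap.com. I can successfully get the top 10 coins by market cap, but past the top 10 it stops working (the result comes back empty).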

Here is my code. I'm using Chrome.

    import requests        
    import time
    from bs4 import BeautifulSoup

    url = 'https://coinmarketcap.com/'
    strhtml = requests.get(url)
    soup = BeautifulSoup(strhtml.text, 'lxml')

    result={}
    # Head of the selector
    baseAddr1 = ('#__next > div.bywovg-1.sXmSU > div.main-content > '
                 'div.sc-57oli2-0.comDep.cmc-body-wrapper > div > div:nth-child(1) > '
                 'div.h7vnx2-1.bFzXgL > table > tbody > ')

    baseAddr3 = ' > td:nth-child(3) > div > a'  # end of selector

    for i in range(1, 21):
        # Pause briefly on every tenth coin
        if i % 10 == 0:
            time.sleep(3)
            print('resting...')

        baseAddr2 = 'tr:nth-child(' + str(i) + ')'  # middle of selector; i is the coin's rank
        Addr = baseAddr1 + baseAddr2 + baseAddr3  # full selector
        #print(Addr)

        data = soup.select(Addr)
        for item in data:
            result.update({item.get_text(): item.get('href')})

    print(result)

Thanks for your help!

1 Answer

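As you scroll down the page, the site first displays and then hides each row of coin data. To trigger this behavior and scrape each row as it scrolls into view, you can use selenium. For speed, the answer below uses a small piece of JavaScript, run through selenium, to extract the results: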

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    import pandas as pd

    # Selenium 4 takes the chromedriver path through a Service object
    d = webdriver.Chrome(service=Service('/path/to/chromedriver'))
    d.get('https://coinmarketcap.com/')
    results = d.execute_script('''
        window.scrollTo(0, document.body.scrollHeight);
        function* get_coin_data(){
            // Column names, skipping the first header cell and the last two
            var h = Array.from(document.querySelectorAll('table.h7vnx2-2.bFpGkc.cmc-table thead th'))
            var hds = h.slice(1, h.length-2).map(x => x.textContent)
            for (var i of document.querySelectorAll('table.h7vnx2-2.bFpGkc.cmc-table tbody tr')){
                 var n_hds = JSON.parse(JSON.stringify(hds))  // fresh copy of the names for this row
                 i.scrollIntoView()  // scroll the row into view so its cells are rendered
                 var tds = Array.from(i.querySelectorAll('td'))
                 // Pair each column name with the text of the matching cell
                 yield Object.fromEntries(tds.slice(1, tds.length-2).map(function(x){
                      return [n_hds.shift(), x.querySelector(':is(.etpvrL, .iworPT, .cLgOOr, .kAXKAX, .hzgCfk, .hykWbK, .kZlTnE)').textContent]
                 }));
             }
        }
        return [...get_coin_data()]
    ''')
    df = pd.DataFrame(results)

Output:

          #  24h %    7d %  ...          Name       Price      Volume(24h)
    0     1  1.03%   1.05%  ...       Bitcoin  $48,678.16  $29,904,091,891
    1     2  0.25%   1.20%  ...      Ethereum   $3,236.58  $15,197,663,099
    2     3  0.86%  15.01%  ...       Cardano       $2.82   $6,389,958,677
    3     4  1.94%   6.72%  ...  Binance Coin     $483.64   $1,850,753,287
    4     5  0.03%   0.04%  ...        Tether       $1.00  $65,270,928,498
    ..  ...    ...     ...  ...           ...         ...              ...
    95   96  2.08%   7.45%  ...      DigiByte    $0.06528      $24,887,122
    96   97  2.33%  10.56%  ...       Horizen      $83.24      $57,256,134
    97   98  0.06%   0.03%  ...    Pax Dollar     $0.9996      $86,915,502
    98   99  0.02%   1.35%  ...      Ontology       $1.07     $123,632,824
    99  100  1.34%   0.57%  ...          ICON       $1.40      $56,657,155

    [100 rows x 8 columns]
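
Every value in this output is still a display string (`$48,678.16`, `1.03%`), so sorting or arithmetic needs a conversion step first. Below is a minimal sketch of one way to do that. `parse_amount` is a hypothetical helper, not part of the code above, and it assumes the only decoration is a `$` sign, thousands separators, and a trailing `%`. The percentages shown carry no sign, so what comes back for those columns is only a magnitude.

```python
def parse_amount(text):
    """Convert a display string such as '$48,678.16' or '1.03%' to a float.

    Hypothetical helper: strips '$', ',' and '%' and returns None when
    what remains is not a number (for example a coin name).
    """
    cleaned = text.strip().replace('$', '').replace(',', '').replace('%', '')
    try:
        return float(cleaned)
    except ValueError:
        return None


if __name__ == '__main__':
    # Sample strings copied from the output above
    for raw in ['$48,678.16', '$29,904,091,891', '1.03%', 'Bitcoin']:
        print(raw, '->', parse_amount(raw))
```

With the DataFrame from the answer, something like `df['Price'].map(parse_amount)` should yield a numeric column, assuming the column names match the ones shown in the output.
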
Answered 2021-08-28T16:02:19.777