Scrape the data from the page with BeautifulSoup, stage it temporarily in a sqlite3 table, then use pandas' SQL support to read it from sqlite3 into a DataFrame.
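>>> import sqlite3
>>> import bs4
>>> import pandas as pd
>>> import requests
>>> # Assumed setup (not shown in the original): in-memory database, column types guessed from the output
>>> conn = sqlite3.connect(':memory:')
>>> c = conn.cursor()
>>> result = c.execute('CREATE TABLE market (url TEXT, symbol TEXT, name TEXT, item_1 REAL, item_2 REAL, item_3 REAL, item_4 TEXT)')
>>> page = requests.get('http://www.valoreazioni.com/indici/ftse-mib_ftsemib_mi').content
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> # find() returns a single Tag; find_all() would return a list, which cannot be searched further
>>> maidMoneyTable = soup.find(id='maidMoneyTable')
>>> table_rows = maidMoneyTable.find_all('li', attrs={'class': 'order'})
>>> for row in table_rows:
...     link = row.find('a')
...     # One record per row: the link target followed by the text of each nested <li>
...     data = [link.attrs['href']] + [_.text for _ in link.find_all('li')]
...     result = c.execute('''INSERT INTO market VALUES (?,?,?,?,?,?,?)''', data)
...
>>> conn.commit()
>>> df = pd.read_sql_query('SELECT * FROM market', conn)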
>>> df.head()
                                                 url   symbol  \
0      http://www.valoreazioni.com/titoli/a2a-a2a-mi   A2A.MI
1  http://www.valoreazioni.com/titoli/anima-holdi...  ANIM.MI
2  http://www.valoreazioni.com/titoli/atlantia-at...   ATL.MI
3  http://www.valoreazioni.com/titoli/azimut-hold...   AZM.MI
4  http://www.valoreazioni.com/titoli/banca-medio...  BMED.MI

                name  item_1  item_2  item_3   item_4
0            A2A SpA    1.50   1.503   0.003  +0.200%
1  ANIMA HOLDING SPA    6.26   6.210  -0.040   -0.64%
2           ATLANTIA   25.96  25.640  -0.240   -0.93%
3     AZIMUT HOLDING   17.94  17.930   0.060   +0.34%
4   BANCA MEDIOLANUM    7.43   7.290  -0.150   -2.02%
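
The sqlite3 staging step can be tried without network access. What follows is only a minimal sketch: the seven-column schema is an assumption inferred from the column names in the output above (the column types are a guess), the helper names `stage_rows` and `fetch_records` are hypothetical, and the two sample rows are copied from that output, including the URL that pandas truncated. It uses only the standard library; calling `pd.read_sql_query('SELECT * FROM market', conn)` on the same connection should return the equivalent DataFrame.

```python
import sqlite3

# Two rows copied from the df.head() output above. The second URL is
# truncated in that output and is kept exactly as shown.
SAMPLE_ROWS = [
    ("http://www.valoreazioni.com/titoli/a2a-a2a-mi", "A2A.MI", "A2A SpA",
     "1.50", "1.503", "0.003", "+0.200%"),
    ("http://www.valoreazioni.com/titoli/anima-holdi...", "ANIM.MI",
     "ANIMA HOLDING SPA", "6.26", "6.210", "-0.040", "-0.64%"),
]


def stage_rows(rows):
    """Create an in-memory `market` table and insert the scraped rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: names taken from the output above, types are a guess.
    conn.execute(
        "CREATE TABLE market (url TEXT, symbol TEXT, name TEXT, "
        "item_1 REAL, item_2 REAL, item_3 REAL, item_4 TEXT)"
    )
    # Placeholders keep the scraped strings out of the SQL text itself.
    conn.executemany("INSERT INTO market VALUES (?,?,?,?,?,?,?)", rows)
    conn.commit()
    return conn


def fetch_records(conn):
    """Read the staged rows back as a list of dicts keyed by column name."""
    cur = conn.execute("SELECT * FROM market")
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


conn = stage_rows(SAMPLE_ROWS)
records = fetch_records(conn)
print(records[0]["symbol"], records[0]["item_1"])
```

Declaring the numeric columns as REAL lets SQLite convert scraped strings such as '1.50' to floats on insert, so they arrive as numbers rather than text; the percentage column stays TEXT because of the trailing '%'.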