
I'm new to bs4 and I'm trying to extract a price table.

The main problem I'm facing is that the table element doesn't actually appear in the HTML page; it's a div. I've tried looking at the class and the id, but I can't get the prices.

Here is what I've tried:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "http://www.valoreazioni.com/indici/ftse-mib_ftsemib_mi"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html5lib")

These are the filters I applied to try to get the price table, without success:

# table = soup.find('div', {'id': 'maidMoneyTable'})
# table = soup.find(id='maidMoneyTable')

route = pd.read_html(str(table), flavor='html5lib')

print(route)

In both cases the result is "no tables were found".

Can anyone tell me how to get the table I need?


1 Answer


Scrape the data from the page with BeautifulSoup, stash it temporarily in a sqlite3 table, and then use pandas's SQL support to pull it out of sqlite3 into a DataFrame.
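The session below assumes a sqlite3 connection `conn`, a cursor `c`, and a `market` table with seven columns already exist. A minimal setup might look like this (the column names and types here are only a guess, based on the `df.head()` output further down):

>>> import sqlite3
>>> import pandas as pd
>>> conn = sqlite3.connect(':memory:')   # throwaway in-memory database
>>> c = conn.cursor()
>>> c.execute('''CREATE TABLE market
...              (url TEXT, symbol TEXT, name TEXT,
...               item_1 REAL, item_2 REAL, item_3 REAL, item_4 TEXT)''')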

>>> import requests
>>> import bs4
>>> page = requests.get('http://www.valoreazioni.com/indici/ftse-mib_ftsemib_mi').content
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> # the price "table" is really a div, so grab it by id and walk its <li> rows
>>> maidMoneyTable = soup.find(id='maidMoneyTable')
>>> table_rows = maidMoneyTable.findAll('li', attrs={'class': 'order'})
>>> for row in table_rows:
...     link = row.find('a')
...     # one value per column: the link href plus the text of each nested <li>
...     data = [link.attrs['href']] + [_.text for _ in link.findAll('li')]
...     result = c.execute('''INSERT INTO market VALUES (?,?,?,?,?,?,?)''', data)
... 
>>> conn.commit()
>>> df = pd.read_sql_query('SELECT * FROM market', conn)
>>> df.head()
                                                 url   symbol  \
0      http://www.valoreazioni.com/titoli/a2a-a2a-mi   A2A.MI   
1  http://www.valoreazioni.com/titoli/anima-holdi...  ANIM.MI   
2  http://www.valoreazioni.com/titoli/atlantia-at...   ATL.MI   
3  http://www.valoreazioni.com/titoli/azimut-hold...   AZM.MI   
4  http://www.valoreazioni.com/titoli/banca-medio...  BMED.MI   

                name  item_1  item_2  item_3   item_4  
0            A2A SpA    1.50   1.503   0.003  +0.200%  
1  ANIMA HOLDING SPA    6.26   6.210  -0.040   -0.64%  
2           ATLANTIA   25.96  25.640  -0.240   -0.93%  
3     AZIMUT HOLDING   17.94  17.930   0.060   +0.34%  
4   BANCA MEDIOLANUM    7.43   7.290  -0.150   -2.02%  
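
If the sqlite3 round trip isn't needed, the same scraped rows could also go straight into pandas; a minimal sketch, reusing `table_rows` from above (the column names are just placeholders, not taken from the page):

>>> rows = []
>>> for row in table_rows:
...     link = row.find('a')
...     # same seven values per row as in the INSERT above
...     rows.append([link.attrs['href']] + [_.text for _ in link.findAll('li')])
... 
>>> cols = ['url', 'symbol', 'name', 'item_1', 'item_2', 'item_3', 'item_4']
>>> df = pd.DataFrame(rows, columns=cols)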
answered 2017-06-20 22:25