python - Web scraping information other than price from Yahoo Finance using Python 3

Question

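I'm new to Python, so I apologize for any beginner mistakes. I followed a tutorial on scraping stock prices with Python and got it working in Python 3, but when I try to adapt it to other elements of the Yahoo Finance page (for example the P/E ratio and Beta), the output is just empty square brackets.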

import urllib.request
import re

symbolslist = ["aapl","spy","goog","nflx"]

i=0
while i<len(symbolslist):
    url = "http://finance.yahoo.com/q?s=" +symbolslist[i] +"&q1=1"
    htmlfile = urllib.request.urlopen(url)
    htmltext = htmlfile.read()
    regex = b'<th scope="row" width="48%">"P/E "<span class="small">(ttm)</span>:    </th><td class="yfnc_tabledata1">(.+?)</td>'
    pattern = re.compile(regex)
    price_to_earnings = str(re.findall(pattern,htmltext))
    print ("The price to earnings of " + symbolslist[i]+ " is " + price_to_earnings)
    i+=1

This is the output:

    The price to earnings of aapl is []
    The price to earnings of spy is []
    The price to earnings of goog is []
    The price to earnings of nflx is []
    >>>

score 0 · Accepted Answer

First of all, I suggest you use BeautifulSoup instead of regex. I hope this example helps you solve your problem, even though it is written for Python 2.7:

>>> import urllib2
>>> from bs4 import BeautifulSoup as bs4
>>> html_file = urllib2.urlopen("http://finance.yahoo.com/q?s=goog&q1=1")
>>> soup = bs4(html_file)
>>> for price in soup.find(attrs={'id':"yfs_l84_goog"}):
...     print price
... 
846.90
>>>
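As a side note for anyone who stays with regex: the pattern in the question returns `[]` for reasons that do not depend on Yahoo. It contains literal double quotes around `P/E ` (possibly copied from a browser inspector, which is only a guess), the parentheses in `(ttm)` are unescaped and so form a capture group instead of matching literal parentheses, and the run of spaces before `</th>` must match exactly. Below is a minimal sketch of the difference. `SAMPLE_HTML` is a made-up fragment shaped like the markup the question targets, and `extract_pe` is a hypothetical helper; the real page markup may be different.

```python
import re

# Hypothetical fragment shaped like the markup the question's pattern targets;
# the real Yahoo Finance page may differ.
SAMPLE_HTML = (
    b'<tr><th scope="row" width="48%">P/E <span class="small">(ttm)</span>:</th>'
    b'<td class="yfnc_tabledata1">12.34</td></tr>'
)

# Pattern as written in the question: literal quotes around "P/E ",
# unescaped (ttm), and four hard-coded spaces before </th>.
original = re.compile(
    b'<th scope="row" width="48%">"P/E "<span class="small">(ttm)</span>:    </th>'
    b'<td class="yfnc_tabledata1">(.+?)</td>'
)

# Adjusted pattern: no literal quotes, escaped parentheses, flexible whitespace.
adjusted = re.compile(
    rb'<th scope="row" width="48%">P/E\s*<span class="small">\(ttm\)</span>:\s*</th>'
    rb'\s*<td class="yfnc_tabledata1">(.+?)</td>'
)


def extract_pe(html):
    """Return the P/E cell text as str, or None if the pattern does not match."""
    match = adjusted.search(html)
    return match.group(1).decode("ascii") if match else None


print(original.findall(SAMPLE_HTML))  # the question's pattern finds nothing
print(extract_pe(SAMPLE_HTML))        # the adjusted pattern finds the cell text
```

Even the adjusted pattern breaks as soon as the site changes an attribute or its whitespace, which is the main reason to prefer an HTML parser.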

score 0 · Accepted Answer

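I ran into the same problem. When I go to http://download.finance.yahoo.com I am redirected to http://finance.yahoo.com, and it seems that Yahoo has shut down the CSV-format links.

The problem seems to be that the URLs are too long and convoluted. Maybe they did this so that we can't keep scraping their data this way. Is there a different way to solve this? I also tried scraping from Finance.msn.com but ran into the same problem: the URL is too complex and too long.

Maybe I just need to look for another, lesser-known financial site. I'll see what I can find.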

score 0 · Accepted Answer

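Use Yahoo Finance's CSV format instead of HTML, then parse the result with a CsvReader.

See here for details on the CSV format. However, Yahoo Finance's URLs have changed since that document was written. Use http://download.finance.yahoo.com instead of http://finance.yahoo.com.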
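A minimal sketch of the parsing step in Python 3, using the standard `csv` module. It works on an in-memory string and makes no network request. `SAMPLE_CSV`, its column order (symbol, price, P/E), and the `parse_quotes` helper are assumptions for illustration only, not the actual Yahoo response format.

```python
import csv
import io

# Made-up CSV text standing in for a downloaded response; the column order
# (symbol, price, P/E) is an assumption for illustration only.
SAMPLE_CSV = '"AAPL",100.00,15.5\n"GOOG",200.00,N/A\n'


def to_float(value):
    """Convert a CSV field to float, or None for non-numeric values like N/A."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_quotes(csv_text):
    """Map each symbol to a dict holding its price and P/E values."""
    quotes = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        symbol, price, pe = row
        quotes[symbol] = {"price": to_float(price), "pe": to_float(pe)}
    return quotes


for symbol, fields in parse_quotes(SAMPLE_CSV).items():
    print("The price to earnings of " + symbol + " is " + str(fields["pe"]))
```

With a real download, the decoded response body would take the place of `SAMPLE_CSV`; the field handling stays the same as long as the columns are known.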
