python - Python Yahoo stock exchange (web scraping)

Question

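I'm having trouble with the following code, which is supposed to print stock prices by fetching pages from Yahoo Finance, but I can't work out why it returns empty strings.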

import urllib
import re

symbolslist = ["aapl","spy", "goog","nflx"]
i = 0
while i < len(symbolslist):
    url = "http://finance.yahoo.com/q?s="+symbolslist[i]+"&q1=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()

    regex = '<span id="yfs_l84_' + symbolslist[i] + '">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print price
    i+=1

Edit: It works now; it was a syntax error. I've also edited the code above.

score 1 · Accepted Answer

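These are just some useful tips for Python development (and scraping):

The Python requests library.

The Python requests library does an excellent job of simplifying the process of making HTTP requests.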
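As a rough illustration of what that simplification looks like, here is a minimal sketch. The helper name `fetch_quote_page`, the `QUOTE_URL` constant, and the `session` parameter are assumptions made for this example; the URL and query parameters are the ones from the question, and Yahoo may no longer serve that page.

```python
import requests

# URL taken from the question; it may no longer be served by Yahoo
QUOTE_URL = "http://finance.yahoo.com/q"


def fetch_quote_page(symbol, session=None, timeout=10):
    """Return the HTML of the quote page for one symbol (hypothetical helper)."""
    # Use the given session if there is one, otherwise the module-level API
    http = session or requests
    # params= lets requests build and escape the query string
    response = http.get(QUOTE_URL, params={"s": symbol, "q1": "1"}, timeout=timeout)
    # Raise on 4xx/5xx instead of silently parsing an error page
    response.raise_for_status()
    return response.text
```

Calling `fetch_quote_page("aapl")` would perform the real request; passing a `requests.Session()` as `session` reuses one connection across several symbols.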

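No need for a `while` loop

A for loop is a better fit here: it iterates over the list directly, so there is no index to initialise or increment.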

symbolslist = ["aapl","spy", "goog","nflx"]
for symbol in symbolslist:
    # Do logic here...
    pass
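A slightly fuller sketch of the same loop, with no network access. The helper `build_url` is a name invented for this example; the URL format is the one from the question.

```python
symbolslist = ["aapl", "spy", "goog", "nflx"]


def build_url(symbol):
    """Build the quote URL from the question for one symbol (hypothetical helper)."""
    return "http://finance.yahoo.com/q?s=" + symbol + "&q1=1"


# Iterate over the values directly; no counter to initialise or increment
urls = []
for symbol in symbolslist:
    urls.append(build_url(symbol))

# enumerate() supplies a position for the cases where one is really needed
for position, url in enumerate(urls, start=1):
    print(position, url)
```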

Use XPath rather than regular expressions

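import requests
import lxml.html  # importing lxml alone does not load the html submodule

# symbol comes from the surrounding for loop
url = "http://www.google.co.uk/finance?q="+symbol+"&q1=1"
r = requests.get(url)
xpath = '//your/xpath'  # placeholder: replace with the path to the price element
root = lxml.html.fromstring(r.content)
results = root.xpath(xpath)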
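To show what an XPath lookup looks like without a network call, here is a minimal sketch that swaps in the standard library's `xml.etree.ElementTree`. It supports only a subset of XPath and needs well-formed markup, so `lxml.html` is the more forgiving choice for real pages. The markup and prices are made up for illustration; only the span id follows the question's regex, and `price_for` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup; only the span id mirrors the question's regex
sample_page = (
    "<html><body>"
    '<div class="quote"><span id="yfs_l84_aapl">101.50</span></div>'
    '<div class="quote"><span id="yfs_l84_spy">200.25</span></div>'
    "</body></html>"
)

root = ET.fromstring(sample_page)


def price_for(symbol):
    """Find the span by id with an XPath predicate; return None if it is missing."""
    node = root.find(".//span[@id='yfs_l84_" + symbol + "']")
    return node.text if node is not None else None


print(price_for("aapl"))
print(price_for("nflx"))
```

Unlike the regex, the lookup does not depend on attribute order or whitespace inside the tag, only on the element name and its id.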

No need to compile the regex every time.

Compiling a regular expression takes time and effort. You can move that work out of the loop.

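# symbolslist[i] is not defined outside the loop, so compile one generic
# pattern up front and capture the symbol as a group instead.
pattern = re.compile(r'<span id="yfs_l84_(\w+)">(.+?)</span>')

for symbol in symbolslist:
    # do logic
    pass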
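One way to move compilation out of the loop while still keeping a separate pattern per symbol is to build a dictionary of compiled patterns up front. This is a sketch under assumptions: `patterns`, `extract_price`, and the sample markup are made up for illustration, and the span id is the one from the question.

```python
import re

symbolslist = ["aapl", "spy", "goog", "nflx"]

# Build every pattern once, before any page is processed.
# re.escape guards against symbols that contain regex metacharacters.
patterns = {
    symbol: re.compile('<span id="yfs_l84_' + re.escape(symbol) + '">(.+?)</span>')
    for symbol in symbolslist
}


def extract_price(symbol, htmltext):
    """Return the first captured price for symbol, or None if it is absent."""
    match = patterns[symbol].search(htmltext)
    return match.group(1) if match else None


# Made-up markup standing in for a downloaded page
sample_html = '<div><span id="yfs_l84_aapl">101.50</span></div>'
print(extract_price("aapl", sample_html))
print(extract_price("spy", sample_html))
```

The `re` module also keeps an internal cache of recently used patterns, so for a handful of symbols the saving is likely to be small; the main gain is that the loop body stays focused on fetching and extracting.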

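External libraries

As drewk mentioned in the comments, Pandas and Matplotlib both have native functions for getting Yahoo quotes, or you can use the ystockquote library to scrape from Yahoo. It is used like this: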

#!/usr/bin/env python
import ystockquote

symbolslist = ["aapl","spy", "goog","nflx"]
for symbol in symbolslist:
    print (ystockquote.get_price(symbol))
