python - How to get a stock-market company's sector from its ticker or company name in Python

Question

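Given a company ticker or name, I would like to get its sector using Python.

I have tried several potential solutions, but none of them has worked.

The two most promising are:

1) Using the script from: https://gist.github.com/pratapvardhan/9b57634d57f21cf3874c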

from urllib import urlopen
from lxml.html import parse

'''
Returns a tuple (Sector, Industry)
Usage: GFinSectorIndustry('IBM')
'''
def GFinSectorIndustry(name):
  tree = parse(urlopen('http://www.google.com/finance?&q='+name))
  return tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

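However, I am using Python 3.8.

I have been able to adapt this solution, but the last line does not work. I am completely new to web scraping, so I would appreciate any advice.

Here is my current code: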

from urllib.request import Request, urlopen
from lxml.html import parse

name="IBM"
req = Request('http://www.google.com/finance?&q='+name, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)

tree = parse(webpage)

But the last part does not work, and I am new to this XPath syntax:

tree.xpath("//a[@id='sector']")[0].text, tree.xpath("//a[@id='sector']")[0].getnext().text

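2) Another option is to embed R's TTR package, as shown in: Find which sector a stock belongs to

However, I would like to run it in my Jupyter notebook, and ss <- stockSymbols() takes a very long time to run.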

score 1 · Accepted Answer

Based on your comments, and specifically for marketwatch.com/investing/stock, an XPath that may work is "//div[@class='intraday__sector']/span[@class='label']", meaning that doing

tree.xpath("//div[@class='intraday__sector']/span[@class='label']")[0].text

should return the desired information.
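As a minimal, self-contained sketch of how that expression is evaluated with lxml, the snippet below runs it against a small hand-written HTML fragment. The fragment, the sector value in it, and the helper name extract_sector are assumptions made for illustration; they only mimic the div/span structure targeted above and are not real marketwatch.com markup or data.

```python
from lxml.html import fromstring

# Hypothetical markup that only mimics the structure the XPath targets.
SAMPLE_HTML = """
<html><body>
  <div class="intraday__sector">
    <span class="label">Example Sector</span>
  </div>
</body></html>
"""

SECTOR_XPATH = "//div[@class='intraday__sector']/span[@class='label']"


def extract_sector(html_text):
    """Return the text of the first node matching SECTOR_XPATH, or None."""
    tree = fromstring(html_text)
    matches = tree.xpath(SECTOR_XPATH)
    if not matches:
        return None  # the page layout differs from what the XPath expects
    return (matches[0].text or "").strip()


print(extract_sector(SAMPLE_HTML))
```

Returning None instead of indexing [0] directly avoids an IndexError when the page layout changes and the XPath no longer matches anything.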

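I am completely new to web scraping [...]

Some clarifications:

This XPath depends entirely on the website you are looking at, which explains why searching for "//a[@id='sector']" on the page you mention in your comment was hopeless: that XPath (now outdated) is specific to Google Finance. In other words, you first need to "study" the page you are interested in to learn where the information you want is located.
To carry out such a "study", I use Chrome DevTools and test any XPath in the console with the $x(<your-xpath-of-interest>) function, which is documented (with examples!) in the Chrome DevTools Console Utilities API reference.
Fortunately, the information you want from marketwatch.com/investing/stock (the sector name) is statically generated (i.e. not dynamically generated when the page loads, in which case other scraping techniques would be needed, resorting to other Python libraries such as Selenium, but that is another question).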
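The console check described above has a rough Python counterpart: evaluate each candidate XPath against the raw HTML returned by the server and count the matches. This is only a sketch; the helper name count_xpath_matches and the sample markup are assumptions for illustration. A non-zero count on the raw HTML suggests the node is served statically, while zero means either the XPath belongs to another site or the content is filled in later by JavaScript.

```python
from lxml.html import fromstring


def count_xpath_matches(html_text, xpaths):
    """Map each XPath expression to the number of nodes it matches."""
    tree = fromstring(html_text)
    return {xpath: len(tree.xpath(xpath)) for xpath in xpaths}


# Hypothetical markup standing in for a page fetched with urlopen().
SAMPLE_HTML = """
<html><body>
  <div class="intraday__sector">
    <span class="label">Example Sector</span>
  </div>
</body></html>
"""

candidates = [
    "//a[@id='sector']",  # the old Google Finance expression
    "//div[@class='intraday__sector']/span[@class='label']",
]

counts = count_xpath_matches(SAMPLE_HTML, candidates)
for xpath, n_matches in counts.items():
    print(n_matches, xpath)
```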

score 0 · Accepted Answer

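To answer the question:

How to get a stock-market company's sector from its ticker or company name in Python?

After reading some material and some nice suggestions from @keepAlive, I had to find a workaround.

The following does the job the other way around, i.e. it gets the companies for a given sector. There are 10 sectors, so it is not much work if you want the information for all of them: https://www.stockmonitor.com/sectors/

Since marketwatch.com/investing/stock throws a 405 error, I decided to use https://www.stockmonitor.com/sectors/, for example:

https://www.stockmonitor.com/sector/healthcare/

Here is the code: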

import pandas as pd

from lxml.html import parse
from urllib.request import Request, urlopen

# Browser-like User-Agent string sent with the request
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/35.0.1916.47 Safari/537.36"
)

url = 'https://www.stockmonitor.com/sector/healthcare/'

req = Request(url, headers={'User-Agent': user_agent})
webpage = urlopen(req)

tree = parse(webpage)

# Collect the link text from every left-aligned cell of the table body
healthcare_tickers = []
for element in tree.xpath("//tbody/tr/td[@class='text-left']/a"):
    healthcare_tickers.append(element.text)

pd.Series(healthcare_tickers)

As a result, healthcare_tickers holds the tickers of the companies in the healthcare sector.
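Since the original goal was to go from a ticker to its sector, the per-sector lists can be inverted into a lookup table once they have been collected. A minimal sketch, assuming one ticker list per sector has already been scraped in the same way; the function name and the placeholder tickers are hypothetical, not real data:

```python
def build_ticker_to_sector(sector_to_tickers):
    """Invert {sector: [tickers]} into {ticker: sector}."""
    ticker_to_sector = {}
    for sector, tickers in sector_to_tickers.items():
        for ticker in tickers:
            # If a ticker appears under several sectors, the first one seen wins.
            ticker_to_sector.setdefault(ticker, sector)
    return ticker_to_sector


# Placeholder data standing in for scraped lists such as healthcare_tickers.
sector_to_tickers = {
    "healthcare": ["AAA", "BBB"],
    "technology": ["CCC"],
}

lookup = build_ticker_to_sector(sector_to_tickers)
print(lookup.get("BBB"))  # sector of a known placeholder ticker
print(lookup.get("ZZZ"))  # None for a ticker that was never scraped
```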

score 0 · Accepted Answer

You can easily get the sector of any given company/ticker through Yahoo Finance:

import yfinance as yf

tickerdata = yf.Ticker('TSLA')  # the ticker symbol for Tesla
print(tickerdata.info['sector'])

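The code returns: "Consumer Cyclical"

If you want other information about the company/ticker, just print(tickerdata.info) to see all the other available dictionary keys and their corresponding values, such as the ['sector'] key used in the code above.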
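Because info behaves like a dictionary, a missing key raises a KeyError when accessed with square brackets. The sketch below shows one hedged way to pull a few fields safely with .get(). The helper name pick_fields is hypothetical, the 'industry' key is assumed to be present for most tickers, and sample_info is a hand-written stand-in so that the snippet runs without network access.

```python
def pick_fields(info, keys=("sector", "industry"), default=None):
    """Return only the requested keys, using `default` for missing ones."""
    return {key: info.get(key, default) for key in keys}


# Hand-written stand-in for tickerdata.info (not fetched from Yahoo Finance).
sample_info = {"symbol": "TSLA", "sector": "Consumer Cyclical"}

fields = pick_fields(sample_info)
print(fields)
```

With yfinance installed, tickerdata.info could be passed in place of sample_info.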
