I am a beginner in Python and I am trying to access the following data using Python.
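1) Go to https://www.nseindia.com/corporates/corporateHome.html and click "Corporate Announcements" under "Corporate Information" in the left pane.
2) Enter the company symbol (e.g. KSCL) and select the announcement period.
3) Click the subject of any individual row to get more details.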
The first two steps translate to the URL "https://www.nseindia.com/corporates/corpInfo/equities/getAnnouncements.jsp?period=More%20than%203%20Months&symbol=kscl&industry=&subject=". This works fine in my Python code using requests.
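For reference, the URL above can be split into a base URL and a parameter dictionary, which is the form that `requests.get(url, params=...)` expects. This is only a sketch using the standard library; the variable names are illustrative and not taken from the script.

```python
from urllib.parse import urlsplit, parse_qsl

url = ("https://www.nseindia.com/corporates/corpInfo/equities/getAnnouncements.jsp"
       "?period=More%20than%203%20Months&symbol=kscl&industry=&subject=")

parts = urlsplit(url)
base_url = "{}://{}{}".format(parts.scheme, parts.netloc, parts.path)

# keep_blank_values=True keeps the empty 'industry' and 'subject' parameters
params = dict(parse_qsl(parts.query, keep_blank_values=True))

print(base_url)
print(params)
```

The `%20` sequences are decoded to spaces here; requests encodes the values again when it builds the final URL.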
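When I tried the detail request (step 3) from the browser, I compared all the request headers with what I send from Python, and they match. I also tried sending the cookie, but it did not help. I think the cookie may not be needed, because the site also works in the browser with cookies disabled. I am running this on Python 3.5.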
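The header comparison described above can be done mechanically instead of by eye. The helper below is a hypothetical sketch (`diff_headers` is not part of requests or of the script), and the sample header values are made up for illustration rather than captured from the real site.

```python
def diff_headers(browser_headers, script_headers):
    """Return {name: (browser_value, script_value)} for headers that differ.

    Header names are lowercased first, because HTTP header names are
    case-insensitive.
    """
    b = {k.lower(): v for k, v in browser_headers.items()}
    s = {k.lower(): v for k, v in script_headers.items()}
    return {name: (b.get(name), s.get(name))
            for name in sorted(set(b) | set(s))
            if b.get(name) != s.get(name)}


# Hypothetical values for illustration only, not captured from the real site.
browser = {"Accept": "*/*", "Host": "www.example.com",
           "X-Requested-With": "XMLHttpRequest"}
script = {"accept": "*/*", "host": "example.com"}

differences = diff_headers(browser, script)
print(differences)
```

A header that is missing on one side shows up with `None` in that position, so omissions and mismatched values are both visible.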
import requests as rq
from requests.utils import requote_uri
from requests_html import HTMLSession
import demjson as dj
from urllib.parse import quote


class BuyBack:
    def start(self):
        # Define headers used across all requests
        self.req_headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'}
        self.req_headers['Accept'] = '*/*'
        self.req_headers['Accept-Encoding'] = 'gzip, deflate, br'

    def readAnnouncement(self, pyAnnouncement):
        # This is done using requests_html
        symbol = pyAnnouncement['sym']
        desc = pyAnnouncement['desc']
        tstamp = pyAnnouncement['date']
        seqId = pyAnnouncement['seqId']
        payload = {'symbol': symbol, 'desc': desc, 'tstamp': tstamp, 'seqId': seqId}
        quote_payload = {}
        params_string = '?'
        # Format as required, with '%20' for spaces
        for (k, v) in payload.items():
            quote_payload[quote(k)] = quote(v)
            params_string += quote(k)
            params_string += '='
            params_string += quote(v)
            params_string += '&'
        # Drop the trailing '&'
        params_string = params_string[:-1]
        announDetail_Url = 'https://nseindia.com/corporates/corpInfo/equities/AnnouncementDetail.jsp'
        self.req_headers['Referer'] = 'https://www.nseindia.com/corporates/corpInfo/equities/Announcements.html'
        self.req_headers['X-Requested-With'] = 'XMLHttpRequest'
        self.req_headers['Host'] = 'www.nseindia.com'
        annReqUrl = announDetail_Url + params_string
        session = HTMLSession()
        r = session.get(annReqUrl, headers=self.req_headers)
        # I am not getting the proper data in the response

    def getAllSymbols(self):
        # To get the list of symbols to run the rest of the process, for now just run with one
        symbol = 'KSCL'

    def getAnnouncements(self, symbol):
        # To get a list of all announcements so far in the last few months
        # This is done by using requests and demjson because the request returns a JS object
        # Open request to get everything
        payload = {'symbol': symbol, 'Industry': '', 'ExDt': '', 'subject': ''}
        # corporateActions_url is not defined anywhere in this snippet
        r = rq.get(corporateActions_url, headers=self.req_headers, params=payload)
        for line in r.iter_lines():
            lineAscii = line.decode("ascii")
            if len(lineAscii) > 5:
                pyAnnouncements = dj.decode(lineAscii)
        # Tried setting the cookie but no use
        # cookie = r.headers['Set-Cookie']
        # self.req_headers['Cookie'] = cookie
        # Read from the announcements
        if pyAnnouncements['success']:
            # for x in pyAnnouncements['rows']:
            for i in range(0, 1):
                # Assumed loop body (not shown in the post): pass the row to readAnnouncement
                self.readAnnouncement(pyAnnouncements['rows'][i])


BuyBack_inst = BuyBack()
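The manual loop in `readAnnouncement` that builds `params_string` can probably be replaced by `urllib.parse.urlencode` with `quote_via=quote`, which also encodes spaces as `%20`. The sketch below passes `safe='/'` so that it matches the default behaviour of `quote`; `build_query` is a hypothetical helper name and the sample values are invented for illustration.

```python
from urllib.parse import quote, urlencode


def build_query(payload):
    """Build '?k=v&...' with spaces encoded as %20, like the manual loop."""
    return '?' + urlencode(payload, safe='/', quote_via=quote)


# Hypothetical field values for illustration only.
sample = {'symbol': 'KSCL', 'desc': 'Buy back', 'tstamp': '010120180000', 'seqId': '1'}
query = build_query(sample)
print(query)
```

The `quote_via` argument is available from Python 3.5 onward, so this should work on the version mentioned above.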
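When I try this flow from the browser, the response to the second call contains an href link to another PDF. But I do not get that href link in the response I receive in Python.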
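To check whether a response body contains such a link at all, the href values can be pulled out with the standard library's `html.parser`. This is a minimal sketch: `HrefCollector` and `extract_hrefs` are hypothetical names, and the sample markup is invented, since the real response body may look different.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def extract_hrefs(html_text):
    collector = HrefCollector()
    collector.feed(html_text)
    return collector.hrefs


# Hypothetical markup; the real response body may look different.
sample_html = ('<div><a href="/docs/announcement.pdf">Details</a>'
               '<a name="top">Top</a></div>')
pdf_links = [h for h in extract_hrefs(sample_html) if h.lower().endswith('.pdf')]
print(pdf_links)
```

In the script, `extract_hrefs(r.text)` could be called right after `session.get(...)`. An empty result would suggest the server returned different content to the script than to the browser, rather than a parsing problem.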