I want to scrape the table 'Summary statement holding of specified securities' from this page: https://www.bseindia.com/stock-share-price/infosys-ltd/infy/500209/shareholding-pattern/ I tried scraping it with Selenium, but the data came back in a single column with no table structure, and the table has no unique identifier. How can I use pandas and Beautiful Soup (or any other method) to scrape the table in a structured format? This is the code I have been trying, but it did not work.

import requests
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0"
}

params = {
    'id': 0,
    'txtscripcd': '',
    'pagecont': '',
    'subject': ''
}

def main(url):
    r = requests.get(url, params=params, headers=headers)
    df = pd.read_html(r.content)[-1].iloc[:, :-1]
    print(df)

main("")

2 Answers

To load the table into a DataFrame and save it as a CSV file, you can use the following example:

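import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
api_url = 'https://api.bseindia.com/BseIndiaAPI/api/shpSecSummery_New/w?qtrid=&scripcode=500209'

# The 'Data' field of the JSON response holds the HTML that contains the tables.
soup = BeautifulSoup(requests.get(api_url, headers=headers).json()['Data'], 'lxml')
# Find the <b> heading by its text, then take the first table that follows it.
# (:-soup-contains replaces the deprecated :contains in soupsieve 2.1 and later.)
table = soup.select_one('b:-soup-contains("Summary statement holding of specified securities")').find_next('table')
# Wrap the HTML in StringIO (newer pandas warns on literal strings), then drop the first two rows.
df = pd.read_html(StringIO(str(table)))[0].iloc[2:, :]

df.to_csv('data.csv')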

This saves data.csv (screenshot from LibreOffice):

[image: data.csv opened in LibreOffice]
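
If you want to try the "find the heading, then take the next table" idea without calling the site, here is a rough offline sketch. It is not the Beautiful Soup code from this answer: it uses only the standard library's html.parser, and the HTML snippet, its numbers, and the names TableAfterHeading and table_after_heading are made up for illustration. It also assumes the target table has no nested tables.

```python
from html.parser import HTMLParser


class TableAfterHeading(HTMLParser):
    """Collect the rows of the first <table> that follows a given heading text."""

    def __init__(self, heading):
        super().__init__()
        self.heading = heading
        self.seen_heading = False  # True once the heading text has been seen
        self.in_table = False      # True while inside the target table
        self.in_cell = False       # True while inside a <td> or <th>
        self.done = False          # True once the target table has closed
        self.rows = []

    def handle_starttag(self, tag, attrs):
        # Ignore everything before the heading and after the target table.
        if self.done or not self.seen_heading:
            return
        if tag == "table":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th"):
            if not self.rows:          # tolerate a cell without a <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "table":
            self.in_table = False
            self.done = True

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()
        elif not self.seen_heading and self.heading in data:
            self.seen_heading = True


def table_after_heading(html, heading):
    """Return the target table as a list of rows, each a list of cell strings."""
    parser = TableAfterHeading(heading)
    parser.feed(html)
    return parser.rows


# Made-up sample: one unrelated table, then the heading, then the table we want.
SAMPLE_HTML = """
<table><tr><td>unrelated</td></tr></table>
<b>Summary statement holding of specified securities</b>
<table>
  <tr><th>Category</th><th>Shareholders</th></tr>
  <tr><td>Promoter</td><td>10</td></tr>
  <tr><td>Public</td><td>200</td></tr>
</table>
"""

rows = table_after_heading(SAMPLE_HTML, "Summary statement holding of specified securities")
print(rows)
```

The table before the heading is skipped because nothing is collected until the heading text has been seen, which is the same role find_next('table') plays in the code above.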

answered 2020-09-23T18:56:40.420

The data you are looking for is served by the following API endpoint:

https://api.bseindia.com/BseIndiaAPI/api/shpSecSummery_New/w?qtrid=&scripcode=500209

Here scripcode is the unique identifier of the stock (500209 for Infosys, as in the page URL from the question).

The API does not check cookies or sessions, so calling this endpoint directly returns the data you are looking for.
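
As a rough sketch of calling the endpoint for an arbitrary scrip code: the helper names build_shareholding_url and fetch_shareholding_html are hypothetical, the User-Agent header and the 'Data' key are borrowed from the other answer on this page, and leaving qtrid empty simply mirrors the URL above. Whether the endpoint still responds this way is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.bseindia.com/BseIndiaAPI/api/shpSecSummery_New/w"


def build_shareholding_url(scripcode, qtrid=""):
    """Build the endpoint URL for one scrip code (qtrid left empty, as in the URL above)."""
    query = urlencode({"qtrid": qtrid, "scripcode": scripcode})
    return f"{BASE_URL}?{query}"


def fetch_shareholding_html(scripcode, qtrid=""):
    """Fetch the response and return the HTML stored under its 'Data' key."""
    import requests  # third-party; imported here so the URL helper works without it

    # Assumption: a browser-like User-Agent is sent, as in the other answer.
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0"
    }
    response = requests.get(
        build_shareholding_url(scripcode, qtrid), headers=headers, timeout=30
    )
    response.raise_for_status()  # fail loudly on an HTTP error status
    return response.json()["Data"]


# Building the URL needs no network access.
print(build_shareholding_url(500209))
```

Calling fetch_shareholding_html(500209) would then return the HTML that can be handed to Beautiful Soup or pandas.read_html.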

answered 2020-09-23T18:25:47.503