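I would like to know how long each action in my script takes. The script below grabs the stocks that release earnings within the next 30 days, then grabs their current share prices, and finally grabs other items I'm interested in from the yfinance API.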

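When I use the status tracker trange() from the tqdm package, I run into all sorts of problems. The script takes a very long time to run, and in the final block, which pulls fundamental and technical data from the API, it repeats the request for each stock x times, where x is the total number of stocks in the Symbols list.

Can someone help me understand what is going wrong with the tqdm functionality I'm trying to incorporate?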

import datetime
import pandas as pd
import time
import requests
import yfinance as yf
from tqdm import trange
import sys

StartTime = time.time()

###                                               ###
###   Grab Stocks with Earnings in Next 30 Days   ###
###                                               ###

CalendarDays = 30 #<-- specify the number of calendar days you want to grab earnings release info for
tables = [] #<-- initialize an empty list to store your tables

print('1. Grabbing companies with earnings releases in the next ' + str(CalendarDays) + ' days.')

# for i in trange(CalendarDays, file = sys.stdout, desc = '1. Grabbing companies with earnings releases in the next ' + str(CalendarDays) + ' days'):

for i in range(CalendarDays): #<-- Grabs earnings release info for the next x days on the calendar
    try:
        date = (datetime.date.today() + datetime.timedelta(days = i)).isoformat() #<-- today plus i days, in ISO format
        url = pd.read_html("https://finance.yahoo.com/calendar/earnings?day="+date, header=0)
        table = url[0]
        table['Earnings Release Date'] = date
        tables.append(table) #<-- append each table into your list of tables
    except ValueError: #<-- read_html raises ValueError when the page contains no tables
        continue

df = pd.concat(tables, ignore_index = True) #<-- take your list of tables into 1 final dataframe
df_unique = df.drop_duplicates(subset=['Symbol'], keep='first', ignore_index = True)
DataSet = df_unique.drop(['Reported EPS','Surprise(%)'], axis = 1)

Symbols = df_unique['Symbol'].to_list()

###                             ###
###   Grab Latest Stock Price   ###
###                             ###

print('2. Grabbing latest share prices for ' + str(len(Symbols)) + ' stocks.')

df_temp = pd.DataFrame()

# for i in trange(len(Symbols), file = sys.stdout, desc = '2. Grabbing latest stock prices'):

for symbol in Symbols:
    try:
        params = {'symbols': symbol,
                  'range': '1d',
                  'interval': '1d',
                  'indicators': 'close',
                  'includeTimestamps': 'false',
                  'includePrePost': 'false',
                  'corsDomain': 'finance.yahoo.com',
                  '.tsrc': 'finance'}

        url = 'https://query1.finance.yahoo.com/v7/finance/spark'

        r = requests.get(url, params=params)
        data = r.json()
        Price = data['spark']['result'][0]['response'][0]['indicators']['quote'][0]['close'][0]
        df_stock = pd.DataFrame({'Symbol' : [symbol],
                                 'Current Price' : [Price]})
        df_temp = pd.concat([df_temp, df_stock], ignore_index = True) #<-- DataFrame.append was removed in pandas 2.0
    except KeyError: #<-- skip symbols whose response lacks the expected fields
        continue

DataSet = pd.merge(DataSet, df_temp[['Symbol', 'Current Price']], on = 'Symbol', how = "left")

###                                     ###
###   Grab Other Important Stock Info   ###
###                                     ###

print('3. Grabbing stock fundamental and technical metrics.')

# StartTime was set at the top of the script, so the total reported at the end covers all three steps

df_temp2 = pd.DataFrame()

# for i in trange(len(Symbols), file = sys.stdout, desc = 'Grabbing stock fundamental and technical metrics'):

for symbol in Symbols:
    try:
        Ticker = yf.Ticker(symbol).info
        Sector = Ticker.get('sector')
        Industry = Ticker.get('industry')
        P2B = Ticker.get('priceToBook')
        P2E = Ticker.get('trailingPE')
        # print(symbol, Sector, Industry, P2B, P2E)
        df_stock = pd.DataFrame({'Symbol' : [symbol],
                                 'Sector' : [Sector],
                                 'Industry' : [Industry],
                                 'PriceToBook' : [P2B],
                                 'PriceToEarnings' : [P2E]})
        df_temp2 = pd.concat([df_temp2, df_stock], ignore_index = True) #<-- DataFrame.append was removed in pandas 2.0
    except KeyError: #<-- skip this symbol if a lookup fails with KeyError
        continue

DataSet = pd.merge(DataSet, df_temp2, on = 'Symbol', how = "left")


ExecutionTime = (time.time() - StartTime)
print('Script is complete! This script took ' + str(round(ExecutionTime, 1)) + ' seconds to run.')

TodaysDate = datetime.date.today().isoformat()

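You can use the tqdm function (rather than trange) to produce a progress bar over any iterable. trange is specifically for iterating over a specified numeric range; trange(n) is shorthand for tqdm(range(n)). So you can import it like this: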

from tqdm import tqdm

And use tqdm as your wrapper:

for symbol in tqdm(Symbols, file = sys.stdout, desc = '2. Grabbing latest stock prices'):
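
As a rough, runnable sketch of the difference, the snippet below counts simulated requests. The tickers and the fetch_price helper are hypothetical stand-ins for the real HTTP call, and the nested variant is only one plausible reading of how the trange loop was combined with the existing for loop in the question:

```python
import sys
from tqdm import tqdm, trange

symbols = ["AAA", "BBB", "CCC"]  # hypothetical tickers, for illustration only
calls = []                       # records every simulated request

def fetch_price(symbol):
    """Hypothetical stand-in for the HTTP request in the question."""
    calls.append(symbol)
    return 1.0

# Wrapping the iterable itself: one request per symbol.
for symbol in tqdm(symbols, file=sys.stdout, desc="wrapped"):
    fetch_price(symbol)
wrapped_calls = len(calls)

# Nesting the symbol loop inside trange(len(symbols)): each symbol is
# requested len(symbols) times, which matches the repeated requests described.
calls.clear()
for i in trange(len(symbols), file=sys.stdout, desc="nested"):
    for symbol in symbols:
        fetch_price(symbol)
nested_calls = len(calls)

print(wrapped_calls, nested_calls)
```

With three symbols, the wrapped loop makes 3 calls and the nested one makes 9, so the cost of the nested form grows with the square of the list length.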

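Note that you iterate over Symbols, not len(Symbols). trange may be a suitable choice for the first part of the script, since there you are iterating over a specified numeric range rather than a more general iterable.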
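
For the first part, a minimal sketch might look like the following. It only builds the ISO dates rather than fetching any pages, uses a smaller calendar_days value than the question's 30, and adds a time.perf_counter measurement as one possible way to report how long a single step takes; the variable names are illustrative, not taken from the original script:

```python
import datetime
import sys
import time
from tqdm import trange

calendar_days = 5  # illustrative value; the question uses 30

step_start = time.perf_counter()
dates = []
for i in trange(calendar_days, file=sys.stdout, desc="1. Building dates"):
    # i runs from 0 to calendar_days - 1, which is the case trange is meant for
    day = datetime.date.today() + datetime.timedelta(days=i)
    dates.append(day.isoformat())
step_seconds = time.perf_counter() - step_start

print(f"Step 1 took {step_seconds:.3f} seconds for {len(dates)} dates.")
```

Repeating the perf_counter pair around each of the three blocks would give a per-step duration alongside the elapsed time that the progress bar itself displays.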

Answered 2020-11-29T16:47:12.223