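I am currently trying to download S-1 filings from SEC EDGAR using the sec_edgar_downloader library. I have a pandas DataFrame of CIK values, and for each one I want to download the associated S-1 when it is available. To track which companies do not have one, I added a new column that equals 1 when the filing is found and downloaded and 0 otherwise. The code I am running is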
df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))
where tryconvert() is a function defined as
def tryconvert(x):
    try:
        CIK_check(x)
    except RecursionError:
        return "0"
and CIK_check() is a function defined as
def CIK_check(x):
    time.sleep(0.3)
    if dl.get("S-1", x) == 1:
        return "1"
    else:
        return "0"
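CIK_check downloads the filing when it is available and returns a binary value indicating whether it succeeded. I had to add tryconvert() to try to work around an error that eventually comes up when I run the code, which raises the following: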
RecursionError Traceback (most recent call last)
<ipython-input-243-a8a327555f29> in <module>
----> 1 df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
3846 else:
3847 values = self.astype(object).values
-> 3848 mapped = lib.map_infer(values, f, convert=convert_dtype)
3849
3850 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-243-a8a327555f29> in <lambda>(x)
----> 1 df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))
<ipython-input-241-62c62b553142> in CIK_check(x)
1 def CIK_check(x):
2 time.sleep(0.3)
----> 3 if dl.get("S-1", x) == 1:
4 return "1"
5 else:
~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/Downloader.py in get(self, filing, ticker_or_cik, amount, after, before, include_amends, download_details, query)
167 )
168
--> 169 download_filings(
170 self.download_folder,
171 ticker_or_cik,
~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in download_filings(download_folder, ticker_or_cik, filing_type, filings_to_fetch, include_filing_details)
261 if include_filing_details:
262 try:
--> 263 download_and_save_filing(
264 download_folder,
265 ticker_or_cik,
~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in download_and_save_filing(download_folder, ticker_or_cik, accession_number, filing_type, download_url, save_filename, resolve_urls)
218 if resolve_urls and Path(save_filename).suffix == ".html":
219 base_url = f"{download_url.rsplit('/', 1)[0]}/"
--> 220 filing_text = resolve_relative_urls_in_filing(filing_text, base_url)
221
222 # Create all parent directories as needed and write content to file
~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in resolve_relative_urls_in_filing(filing_text, base_url)
198 return soup
199
--> 200 return soup.encode(soup.original_encoding)
201
202
~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in encode(self, encoding, indent_level, formatter, errors)
1526 # Turn the data structure into Unicode, then encode the
1527 # Unicode.
-> 1528 u = self.decode(indent_level, encoding, formatter)
1529 return u.encode(encoding, errors)
1530
~/opt/anaconda3/lib/python3.8/site-packages/bs4/__init__.py in decode(self, pretty_print, eventual_encoding, formatter)
742 else:
743 indent_level = 0
--> 744 return prefix + super(BeautifulSoup, self).decode(
745 indent_level, eventual_encoding, formatter)
746
~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode(self, indent_level, eventual_encoding, formatter)
1596 else:
1597 indent_contents = None
-> 1598 contents = self.decode_contents(
1599 indent_contents, eventual_encoding, formatter
1600 )
~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode_contents(self, indent_level, eventual_encoding, formatter)
1690 text = c.output_ready(formatter)
1691 elif isinstance(c, Tag):
-> 1692 s.append(c.decode(indent_level, eventual_encoding,
1693 formatter))
1694 preserve_whitespace = (
... last 2 frames repeated, from the frame below ...
~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode(self, indent_level, eventual_encoding, formatter)
1596 else:
1597 indent_contents = None
-> 1598 contents = self.decode_contents(
1599 indent_contents, eventual_encoding, formatter
1600 )
RecursionError: maximum recursion depth exceeded
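However, this does not work: I still get the error, which makes it impossible to complete the task. What could be causing the error? (Unfortunately, since this is an apply call on a pandas DataFrame, it is not clear which entry raises it.) Is there another way to get past the RecursionError without halting the computation, simply treating it as a failed download marked 0?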
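
One detail visible in the traceback is that the frames go from the lambda straight into CIK_check, with no tryconvert frame in between. That is consistent with how Python evaluates `tryconvert(CIK_check(x))`: the argument `CIK_check(x)` runs first, before tryconvert and its try block are ever entered, so the exception is never caught. Separately, tryconvert as written does not return anything when the call succeeds. The sketch below shows one possible restructuring under those observations. The names `safe_call` and `fake_cik_check` are hypothetical: `fake_cik_check` is only a stand-in for CIK_check that recurses without bound for one made-up CIK, so the example runs without network access. An explicit loop replaces apply so that the failing entries can be recorded.

```python
def safe_call(func, arg, default="0"):
    """Call func(arg) inside the try block so a RecursionError is caught here.

    Returns (value, ok); ok is False when the call hit the recursion limit.
    """
    try:
        return func(arg), True
    except RecursionError:
        return default, False


def fake_cik_check(cik):
    # Hypothetical stand-in for CIK_check: it recurses forever for one CIK
    # to imitate a filing that cannot be serialized, and succeeds otherwise.
    if cik == "BAD":
        return fake_cik_check(cik)
    return "1"


ciks = ["0000000001", "BAD", "0000000003"]  # made-up values for illustration

results = []  # one "1"/"0" flag per CIK, in the same order as ciks
failed = []   # CIKs whose download raised RecursionError

for cik in ciks:
    value, ok = safe_call(fake_cik_check, cik)
    results.append(value)
    if not ok:
        failed.append(cik)

print(results)
print(failed)
```

With the DataFrame from the question, the same pattern would pass CIK_check itself as `func`, and the resulting list could be assigned back as the new column. The `failed` list then answers which entries triggered the error. Raising the limit with `sys.setrecursionlimit` is another option that might let deeply nested filings through, at the cost of deeper stack use.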