python - RecursionError when downloading SEC data

Question

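I am currently trying to download S-1 filings from SEC EDGAR using the sec_edgar_downloader library. I have a pandas DataFrame of CIK values, and for each value I want to download the associated S-1 when one is available. To keep track of which companies do not have one, I added a new column that equals 1 when a filing is found and downloaded and 0 otherwise. The code I am running is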

df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))

where tryconvert() is a function defined as

def tryconvert(x):
    try:
        CIK_check(x)
    except RecursionError:
        return "0"

and CIK_check() is a function defined as

def CIK_check(x):
    time.sleep(0.3)
    if dl.get("S-1", x) == 1:
        return "1"
    else:
        return "0"

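CIK_check downloads the filing when it is available and returns a binary value indicating whether it succeeded. I had to add tryconvert() to try to work around an error that eventually appears when I run the code, which raises the following: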

RecursionError                            Traceback (most recent call last)
<ipython-input-243-a8a327555f29> in <module>
----> 1 df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3846             else:
   3847                 values = self.astype(object).values
-> 3848                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3849 
   3850         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-243-a8a327555f29> in <lambda>(x)
----> 1 df["s1"] = df["s1"].apply(lambda x : tryconvert(CIK_check(x)))

<ipython-input-241-62c62b553142> in CIK_check(x)
      1 def CIK_check(x):
      2     time.sleep(0.3)
----> 3     if dl.get("S-1", x) == 1:
      4         return "1"
      5     else:

~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/Downloader.py in get(self, filing, ticker_or_cik, amount, after, before, include_amends, download_details, query)
    167         )
    168 
--> 169         download_filings(
    170             self.download_folder,
    171             ticker_or_cik,

~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in download_filings(download_folder, ticker_or_cik, filing_type, filings_to_fetch, include_filing_details)
    261         if include_filing_details:
    262             try:
--> 263                 download_and_save_filing(
    264                     download_folder,
    265                     ticker_or_cik,

~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in download_and_save_filing(download_folder, ticker_or_cik, accession_number, filing_type, download_url, save_filename, resolve_urls)
    218     if resolve_urls and Path(save_filename).suffix == ".html":
    219         base_url = f"{download_url.rsplit('/', 1)[0]}/"
--> 220         filing_text = resolve_relative_urls_in_filing(filing_text, base_url)
    221 
    222     # Create all parent directories as needed and write content to file

~/opt/anaconda3/lib/python3.8/site-packages/sec_edgar_downloader/_utils.py in resolve_relative_urls_in_filing(filing_text, base_url)
    198         return soup
    199 
--> 200     return soup.encode(soup.original_encoding)
    201 
    202 

~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in encode(self, encoding, indent_level, formatter, errors)
   1526         # Turn the data structure into Unicode, then encode the
   1527         # Unicode.
-> 1528         u = self.decode(indent_level, encoding, formatter)
   1529         return u.encode(encoding, errors)
   1530 

~/opt/anaconda3/lib/python3.8/site-packages/bs4/__init__.py in decode(self, pretty_print, eventual_encoding, formatter)
    742         else:
    743             indent_level = 0
--> 744         return prefix + super(BeautifulSoup, self).decode(
    745             indent_level, eventual_encoding, formatter)
    746 

~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode(self, indent_level, eventual_encoding, formatter)
   1596         else:
   1597             indent_contents = None
-> 1598         contents = self.decode_contents(
   1599             indent_contents, eventual_encoding, formatter
   1600         )

~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode_contents(self, indent_level, eventual_encoding, formatter)
   1690                 text = c.output_ready(formatter)
   1691             elif isinstance(c, Tag):
-> 1692                 s.append(c.decode(indent_level, eventual_encoding,
   1693                                   formatter))
   1694             preserve_whitespace = (

... last 2 frames repeated, from the frame below ...

~/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py in decode(self, indent_level, eventual_encoding, formatter)
   1596         else:
   1597             indent_contents = None
-> 1598         contents = self.decode_contents(
   1599             indent_contents, eventual_encoding, formatter
   1600         )

RecursionError: maximum recursion depth exceeded

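However, this does not work: I still get the error, which makes it impossible to complete the task. What could be causing the error? (Unfortunately, because this is an apply call on a pandas DataFrame, it is not clear which entry raises it.) Is there another way to handle the RecursionError without stopping the computation, simply treating it as a failed download marked 0?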
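
A note on why the try/except above never fires, offered as a minimal sketch rather than a confirmed fix. In `tryconvert(CIK_check(x))`, Python evaluates the argument `CIK_check(x)` before `tryconvert` is entered, so the RecursionError is raised outside the `try` block (the traceback shows the failing frame being called directly from the lambda). In addition, `tryconvert` as written has no `return` on the success path, so it would return None. The sketch below moves the call inside the `try` and uses an explicit loop to record which CIKs fail. `FakeDownloader`, `safe_check`, the `delay` parameter, and the sample CIK strings are hypothetical stand-ins so the example runs offline; they are not part of sec_edgar_downloader.

```python
import time


class FakeDownloader:
    """Hypothetical stand-in for the question's `dl` object (not the real library)."""

    def get(self, filing, cik):
        if cik == "0000000002":
            # Simulate the failure shown in the traceback.
            raise RecursionError("maximum recursion depth exceeded")
        # Pretend that only the first CIK has a filing available.
        return 1 if cik == "0000000001" else 0


dl = FakeDownloader()


def CIK_check(x, delay=0.0):
    # Same logic as in the question; `delay` is an added parameter so the
    # sketch runs quickly (the question uses a fixed 0.3 second pause).
    time.sleep(delay)
    return "1" if dl.get("S-1", x) == 1 else "0"


def safe_check(x):
    """Call CIK_check inside the try block so its exception is caught here."""
    try:
        return CIK_check(x)
    except RecursionError:
        return "0"


# An explicit loop instead of apply() makes it visible which entries fail.
ciks = ["0000000001", "0000000002", "0000000003"]  # made-up sample values
results = {}
failed = []
for cik in ciks:
    try:
        results[cik] = CIK_check(cik)
    except RecursionError:
        results[cik] = "0"
        failed.append(cik)
```

With the real objects from the question, the wrapper would be passed to apply without calling it first, for example `df["s1"] = df["s1"].apply(safe_check)`. The traceback ends in repeated bs4 `decode` / `decode_contents` frames, which suggests (but does not prove) that a particular filing's HTML is nested more deeply than the interpreter's recursion limit allows. Raising the limit with `sys.setrecursionlimit` is one possible alternative, at the risk of crashing the interpreter if the nesting is very deep.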
