python - BeautifulSoup can't find any tags

Question

I'm trying to scrape the site here: ftp://ftp.sec.gov/edgar/daily-index/, using the code shown below:

from bs4 import BeautifulSoup
import urllib.request

html = urllib.request.urlopen("ftp://ftp.sec.gov/edgar/daily-index/")
soup = BeautifulSoup(html, "lxml")
soup.a  # or soup.find_all('a'); neither of them works
# returns None

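Please help, I'm very frustrated with this. I suspect the tags are causing the problem. The site's HTML looks well-formed (matching tags), so I don't know why BeautifulSoup isn't finding anything. Thanks.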

score 5 · Accepted Answer

The URL ftp://ftp.sec.gov/edgar/daily-index/ points to an FTP directory, not an HTML page.

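Your browser may generate HTML from the FTP directory contents, but when you load that resource with urllib.request, the server does not send you HTML. There is no markup in the response, so BeautifulSoup has no tags to find.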

You probably want to use the ftplib module directly to read the directory listing, or first inspect the return value of urlopen(...).read().
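As a rough illustration of the ftplib suggestion, here is a minimal sketch. The helper names list_ftp_directory and names_from_listing are hypothetical, not part of any library. The sketch assumes the server accepts anonymous logins and, for the second helper, that the raw listing uses the common Unix "ls -l" layout with nine whitespace-separated fields. Other servers may format listings differently, and the host in the question may not be reachable.

```python
from ftplib import FTP


def list_ftp_directory(host, path, timeout=30):
    """Return the entry names in an FTP directory (hypothetical helper).

    Assumes the server allows anonymous login.
    """
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # no arguments means anonymous login
        ftp.cwd(path)    # change to the directory of interest
        return ftp.nlst()  # list of names, as reported by the server


def names_from_listing(raw):
    """Extract entry names from raw listing bytes (hypothetical helper).

    Assumes a Unix-style "ls -l" layout: permissions, link count, owner,
    group, size, month, day, time-or-year, name. Lines that do not have
    nine fields (for example a "total N" header) are skipped.
    """
    names = []
    for line in raw.decode("utf-8", errors="replace").splitlines():
        fields = line.split(None, 8)  # at most 9 fields; the name may contain spaces
        if len(fields) == 9:
            names.append(fields[8].strip())
    return names
```

For the question's URL, a call would look like list_ftp_directory("ftp.sec.gov", "/edgar/daily-index/"). Alternatively, print urllib.request.urlopen(...).read() first to see what the server actually returns, then pass those bytes to names_from_listing if they match the assumed layout.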
