python - Why is my soup obj empty?

Question

I'm trying to get all the URLs from elements with class='profile-search-school-link', but I can't even get a soup object.

Here is what I'm doing:

import requests
from bs4 import BeautifulSoup

site = "http://www.geteducated.com/profiles/search/Computer%20Science%20%26%20IT&SS=Search%20by%20Subject%20%3E%20Computer%20Science%20%26%20IT/?start=15"

# gets a list of the urls for the degree programs
r = requests.get(site)
html_source = r.text
soup = BeautifulSoup(html_source, "html.parser")  # name the parser explicitly

print(soup.prettify())

Output:

<class 'bs4.BeautifulSoup'> # print statement
[] # my depressingly empty soup

What's wrong with the code? The link isn't broken when I paste it into a browser.
How do I get the URLs?

score 1 · Accepted Answer

I don't know about you, but for me the link is broken - that might be your first problem ;)

I get an HTTP 500 error response.
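
A small guard along these lines would make a 500 visible before any parsing happens. It is only a sketch: `html_or_raise` is a hypothetical helper name, and it assumes a response object exposing `status_code`, `url` and `text`, as the objects returned by `requests.get()` do. `requests` also provides `Response.raise_for_status()`, which raises for 4xx and 5xx replies.

```python
def html_or_raise(response):
    """Return response.text for a 2xx reply; raise for anything else.

    `response` is assumed to expose status_code, url and text,
    as the objects returned by requests.get() do.
    """
    status = response.status_code
    if not 200 <= status < 300:
        # An error page still parses, but it is not the page you wanted.
        raise RuntimeError(f"HTTP {status} from {response.url}")
    return response.text
```

With the question's code this would read `soup = BeautifulSoup(html_or_raise(r), "html.parser")`.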

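Hmm, it works once I have first visited the base URL without ?start.

Ah, I think that's because on your first visit the site stores something on your side, e.g. a cookie. Beautiful Soup only parses HTML and never touches cookies, and a bare requests.get() call starts with no cookies from earlier visits, so your script can't do what the browser does unless you enable cookies ;)

I'd suggest using cookielib (named http.cookiejar in Python 3).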
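
The suggestion might look roughly like this in Python 3, using only the standard library. Treat it as a sketch under assumptions: `make_cookie_opener` and `fetch_after_warmup` are hypothetical helper names, and whether this site really needs a cookie from the base URL is the guess made above, not something confirmed.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build a urllib opener that stores cookies and sends them back."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def fetch_after_warmup(opener, base_url, page_url):
    """Visit base_url first so the server can set cookies, then fetch page_url."""
    with opener.open(base_url) as warmup:
        warmup.read()  # body is discarded; only the cookies matter here
    with opener.open(page_url) as page:
        charset = page.headers.get_content_charset() or "utf-8"
        return page.read().decode(charset, errors="replace")
```

A call could then be `opener, jar = make_cookie_opener()` followed by `html_source = fetch_after_warmup(opener, site.split("?start", 1)[0], site)`, with `html_source` passed to `BeautifulSoup` as in the question. Since the question already uses `requests`, a `requests.Session()` keeps cookies across its calls in the same way. Once the soup holds the real page, `soup.find_all(class_="profile-search-school-link")` returns the matching tags, and `tag.get("href")` reads each URL, assuming the class sits on the link elements themselves.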
