python - Checking web page results with Python and BeautifulSoup

Question

I need to check the search results of a web page and compare them with user input.

import urllib
from bs4 import BeautifulSoup

ui = raw_input() # for example "Niels Bohr"
link = "http://www.enciklopedija.hr/Trazi.aspx?t=profesor,%20gdje&s=90&k=10"
stranica = urllib.urlopen(link)
soup = BeautifulSoup(stranica, from_encoding="utf-8")
beauty = soup.prettify()
print beauty

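Since there are 1502 results, my idea is to change k=10 to k=1502. Now I need some kind of function to check whether the search results contain my user input. I know the name I am looking for is the text after TEXT, so how do I do that? Maybe with a regular expression? The second part: if there is a match, I need to get the link of that result. Again, I know the link is inside href="", but how do I extract it and make it usable?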

score 0 · Accepted Answer

Finding out whether Niels Bohr is listed is as easy as requesting a large batch size and loading the resulting page:

import sys
import urllib2

from bs4 import BeautifulSoup

url = "http://www.enciklopedija.hr/Trazi.aspx?t=profesor,%20gdje&s=0&k={}".format(sys.maxint)
name = u'Bohr, Niels'

page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

for link in soup.find_all(class_='AllWordsTextHit', text=name):
    print link

This yields every link whose link text is exactly 'Bohr, Niels'. If you need a partial match, you can use a regular expression instead.
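As a rough sketch of the partial-match idea, a compiled regular expression can be passed in place of the exact string. The HTML snippet and its href values below are made up for illustration; only the class name 'AllWordsTextHit' comes from the code above. The example uses the `string=` keyword, which is the newer name for `text=` in recent BeautifulSoup releases, and is written so that it also runs on Python 3.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the real results page; the href
# values are invented, only the class name comes from the answer.
SAMPLE_HTML = u"""
<ul>
  <li><a class="AllWordsTextHit" href="natuknica.aspx?id=1">Bohr, Niels</a></li>
  <li><a class="AllWordsTextHit" href="natuknica.aspx?id=2">Bohr, Aage Niels</a></li>
  <li><a class="AllWordsTextHit" href="natuknica.aspx?id=3">Heisenberg, Werner</a></li>
</ul>
"""


def find_partial_matches(html, fragment):
    """Return result links whose text contains `fragment`, ignoring case."""
    soup = BeautifulSoup(html, "html.parser")
    # re.escape keeps user input such as "Bohr, N." from being read as a pattern.
    pattern = re.compile(re.escape(fragment), re.IGNORECASE)
    return soup.find_all(class_="AllWordsTextHit", string=pattern)


matches = find_partial_matches(SAMPLE_HTML, u"bohr")
for link in matches:
    print(link.get_text(), link["href"])
```

Escaping the user input is a deliberate choice here: it gives plain substring matching, so drop `re.escape` only if the user is expected to type a real pattern.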

Each link object has a (relative) href attribute, which you can use to load the next page:

professor_page = 'http://www.enciklopedija.hr/' + link['href']
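String concatenation works as long as the href never starts with a slash. A slightly more defensive variant, sketched here under the assumption that the hrefs are relative to the search page, is to let the standard library resolve them with `urljoin`. The href values and the helper name `absolute_link` are hypothetical.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as used in the answer

# The page the relative links were found on.
BASE_URL = "http://www.enciklopedija.hr/Trazi.aspx"


def absolute_link(href):
    """Resolve a (possibly relative) href against the search page URL."""
    return urljoin(BASE_URL, href)


# Hypothetical href values, with and without a leading slash.
print(absolute_link("natuknica.aspx?id=1"))
print(absolute_link("/natuknica.aspx?id=1"))
```

Both calls resolve to the same absolute URL, whereas plain concatenation would produce a double slash for the second one.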
