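So I'm trying to collect every URL in a range whose page contains the phrase "Recipe adapted from" or "Recipe from". The script copies the matching links to the file up to about 7496, and then it spits out HTTPError 404. What am I doing wrong? I've tried implementing this with BeautifulSoup and requests, but I still can't get it to work.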
import urllib2

with open('recipes.txt', 'w+') as f:
    for i in range(14477):
        url = "http://www.tastingtable.com/entry_detail/{}".format(i)
        page_content = urllib2.urlopen(url).read()
        if "Recipe adapted from" in page_content:
            print url
            f.write(url + '\n')
        elif "Recipe from" in page_content:
            print url
            f.write(url + '\n')
        else:
            pass
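
A likely cause, assuming some IDs in the range simply have no page: `urlopen` raises `HTTPError` when the server answers with a 404, and because nothing catches it, the whole loop stops at the first missing entry (around 7496 here). Below is a minimal sketch of one way to skip those entries and keep going. The names `fetch`, `find_recipe_urls`, `save_recipe_urls`, `BASE_URL` and `MARKERS` are made up for illustration. The import fallback is only there so the same code runs on Python 3 as well as on Python 2 with `urllib2`, as in the question.

```python
try:  # Python 3
    from urllib.request import urlopen
    from urllib.error import HTTPError
except ImportError:  # Python 2, as in the question
    from urllib2 import urlopen, HTTPError

# Hypothetical names; the URL pattern and phrases are taken from the question.
BASE_URL = "http://www.tastingtable.com/entry_detail/{}"
MARKERS = ("Recipe adapted from", "Recipe from")


def fetch(url):
    """Return the page text, or None if the server answers with an HTTP error."""
    try:
        return urlopen(url).read().decode("utf-8", "replace")
    except HTTPError as err:
        # Any HTTP error status (404 included) lands here; log it and move on.
        print("skipping {} (HTTP {})".format(url, err.code))
        return None


def find_recipe_urls(ids, fetch_page=fetch):
    """Yield the URL of every entry whose page contains one of MARKERS."""
    for i in ids:
        url = BASE_URL.format(i)
        text = fetch_page(url)
        if text is not None and any(marker in text for marker in MARKERS):
            yield url


def save_recipe_urls(path, ids, fetch_page=fetch):
    """Write matching URLs to path, one per line; return how many were written."""
    count = 0
    with open(path, "w") as f:
        for url in find_recipe_urls(ids, fetch_page):
            f.write(url + "\n")
            count += 1
    return count
```

Calling `save_recipe_urls("recipes.txt", range(14477))` would then cover the same range as the original loop. The sketch only catches `HTTPError`. Connection failures and timeouts raise other exceptions and would still stop the run, so a long crawl may want to handle those too and pause between requests.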