python - 查找网站中所有可能的链接 / 使用 Python 进行 Screen-Web Scraping

Question

这里有点开放式问题。我需要浏览招聘网站并搜索职位描述标签和技能要求（我已经完成了）。我基本上想知道，我如何在网站上爬行？如，从 test.com 到 test.com/a 等等....？？基本上，抓取页面。

这是我在页面内搜索的代码。我需要在站点中找到所有可能的此类页面并获取链接。这不是家庭作业。我只是在一边做这个...

import urllib2
import re

html_content = urllib2.urlopen('http://www.ziprecruiter.com/job/Systems-     Engineer/b5452eab/?source=customer-cpc-indeed').read()

matchDescription = re.findall('Bachelor', html_content);
matchSkill = re.findall('VMware', html_content);


print matchDescription
print matchSkill

if ( len(matchDescription) and len(matchSkill) )== 0: 
   print 'I did not find anything'
else:
   print 'My string is in the html'

score 0 · Accepted Answer

考虑使用Scrapy或其他一些现有的抓取框架。否则，您需要使用lxml或其他一些 HTML 解析器手动查找必要的链接，并使用基于urllib或类似的手动机制和一些数据结构来存储输入和输出数据。

python - 查找网站中所有可能的链接 / 使用 Python 进行 Screen-Web Scraping

1 回答 1

Related

Reference