我正在尝试抓取这个网站。我设法通过使用 urllib 和 beautifulsoup 来做到这一点。但是 urllib 太慢了。我想要异步请求,因为网址有数千个。我发现一个不错的包是 grequests。
例子:
import grequests
from bs4 import BeautifulSoup
pages = []
page="https://www.spitogatos.gr/search/results/residential/sale/r100/m100m101m102m103m104m105m106m107m108m109m110m150m151m152m153m154m155m156m157m158m159m160m161m162m163m164m165m166m167m168m169m170m171m172m173m174m175m176m177m178m179m180m181m182m183m184m185m186m187m188m189m190m191m192m193m194m195m196m197m198m106001m125000m"
for i in range(1,1000):
pages.append(page)
page="https://www.spitogatos.gr/search/results/residential/sale/r100/m100m101m102m103m104m105m106m107m108m109m110m150m151m152m153m154m155m156m157m158m159m160m161m162m163m164m165m166m167m168m169m170m171m172m173m174m175m176m177m178m179m180m181m182m183m184m185m186m187m188m189m190m191m192m193m194m195m196m197m198m106001m125000m"
page = page + "/offset_{}".format(i*10)
rs = (grequests.get(item) for item in pages)
a=grequests.map(rs)
问题是我不知道如何继续和使用beautifulsoup。从而得到每个页面的html代码。很高兴听到你的想法。谢谢!