xidel -s^
"https://www.google.com/search?q=xidel+follow+pagination&start=0"^
-e "//a/extract(@href,'url\?q=(.+?)&',1)[.]"^
-f "(//td/a/@href)[last()]"^
-e "//a/extract(@href,'url\?q=(.+?)&',1)[.]"
Update 2021:
xidel -s^
--user-agent "Mozilla/5.0 Firefox/94.0.1"^
-H "Cookie: CONSENT=YES+cb.20210518-05-p0.nl+F+224"^
"https://www.google.com/search?q=xidel+follow+pagination"^
-e "//div[@class='yuRUbf']/a/@href"^
-f "//a[@id='pnnext']/@href"
("https://www.google.com" -f "form(//form,{'q':'xidel follow pagination'})"
also works)
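Spelled out in full, and assuming the form() variant needs the same user-agent and cookie header as above (a combination of the fragments in this answer, not separately verified), that would be:

xidel -s^
--user-agent "Mozilla/5.0 Firefox/94.0.1"^
-H "Cookie: CONSENT=YES+cb.20210518-05-p0.nl+F+224"^
"https://www.google.com"^
-f "form(//form,{'q':'xidel follow pagination'})"^
-e "//div[@class='yuRUbf']/a/@href"^
-f "//a[@id='pnnext']/@href"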
Five years ago, querying Google without a user-agent or cookie header worked just fine; nowadays it won't work without them. My original query (with me being a xidel rookie and all) only extracted the URLs from pages 1 and 2. With -f "//a[@id='pnnext']/@href" at the end, xidel now recursively follows all result pages.
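To see the follow mechanism in isolation: xidel applies every -e to each page it visits, and a trailing -f queues up the next page until the expression no longer matches anything. A generic sketch against a hypothetical paginated site (the URL, the title selector and the rel='next' link are placeholders):

xidel -s "https://example.com/page/1"^
-e "//h2[@class='title']"^
-f "//a[@rel='next']/@href"

xidel stops as soon as //a[@rel='next'] no longer yields a URL.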
Be warned though: although extracting the URLs with -e "//div[@class='yuRUbf']/a/@href" worked for me, it may not work for you, because @class might have a different value and, above all, changes over time. The same goes for -f "//a[@id='pnnext']/@href".
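One way to reduce that dependency is to select on structure instead of class names. Assuming Google still wraps each result title in an <h3> inside the link (an assumption about the current markup, not a guarantee), this would be:

-e "//a[h3]/@href"

as a drop-in replacement for the @class-based extraction above.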