我正在尝试解析网站的多个页面,但我不明白如何更改 url 的查询(如果这有意义?)
我尝试创建一个 next_page ,它每次找到下一页元素时都会添加第一页并添加 +1,但我认为我不能,因为我将有多个起始 url(都相似)。当我尝试获取下一页元素的信息时,它返回:
["loadmoreresult('?networkId=24&pageNumber=2&pageSize=100&allnet=yes&networkIds=1&networkIds=2&networkIds=3&networkIds=4&networkIds=61&networkIds=98&networkIds=108&networkIds=6&networkIds=5&networkIds=22&networkIds=13&networkIds=18&networkIds=15&networkIds=16&networkIds=105&networkIds=38&licenseIds=0&licenseIds= 0&licenseIds=0&licenseIds=0&licenseIds=0&searchby=CountryCode&orderby=CountryCity&country=ES&city=&keyword=&lastCid=116490');返回false;"]
使用 url.parse(response.url).query 我得到:
'networkId=24&pageNumber=1&pageSize=100&allnet=yes&networkIds=1&networkIds=2&networkIds=3&networkIds=4&networkIds=61&networkIds=98&networkIds=108&networkIds=6&networkIds=5&networkIds=22&networkIds=13&networkIds=18&networkIds=15&networkIds=16&networkIds=105&networkIds=38&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds =0&searchby=CountryCode&orderby=CountryCity&country=ES&city=&keyword='
我需要做的就是创建一个使用相同方案、路径的新链接,然后更改查询。
如果您需要更多信息,请告诉我,我真的不知道什么与您更相关,因为我还是一个初学者。
from urllib.parse import urlparse, urljoin
urlparse(response.url)
>>> ParseResult(scheme='https', netloc='www.wcaworld.com', path='/Directory', params='', query='networkId=24&pageNumber=1&pageSize=100&allnet=yes&networkIds=1&networkIds=2&networkIds=3&networkIds=4&networkIds=61&networkIds=98&networkIds=108&networkIds=6&networkIds=5&networkIds=22&networkIds=13&networkIds=18&networkIds=15&networkIds=16&networkIds=105&networkIds=38&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&searchby=CountryCode&orderby=CountryCity&country=ES&city=&keyword=', fragment='')
response.css('a.loadmore::attr(onmouseover)').extract()
>>>["loadmoreresult('?networkId=24&pageNumber=2&pageSize=100&allnet=yes&networkIds=1&networkIds=2&networkIds=3&networkIds=4&networkIds=61&networkIds=98&networkIds=108&networkIds=6&networkIds=5&networkIds=22&networkIds=13&networkIds=18&networkIds=15&networkIds=16&networkIds=105&networkIds=38&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&searchby=CountryCode&orderby=CountryCity&country=ES&city=&keyword=&lastCid=116490'); return false;"]