I am writing a program to scrape the source code off a website. Each time the "next" button is clicked to go to the next page of the website, a POST request is sent.

I have been looking at using httpclient to handle this, and I have looked through examples and the httpclient API, but I can't figure out whether httpclient can do this. Is this something httpclient supports, and if so, which class would handle it?

1 Answer

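I think you mean that the web page you fetch with an HTTP GET contains a "next" button, and when you view that page in a browser and click the button, the site's next page is displayed.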

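If that's the case, then yes, an HTTP client can do the same thing. Understand, though, that the HTTP client is not integrated with your web browser, so it will not click the button for you. What you can do is use a library like jsoup to search the source code returned by the HTTP GET request, extract the URL of the site's "next" page, and then issue another HTTP GET to fetch that resource.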
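The answer recommends jsoup for the extraction step; with jsoup the lookup is roughly `Jsoup.parse(html, pageUrl).select("a:containsOwn(Next)").first()` followed by `absUrl("href")` on that element. To stay dependency-free, the sketch below swaps in a regular expression instead, which is much more fragile than a real HTML parser. It assumes (hypothetically) that the link is an `<a>` tag whose text is "Next", so adapt the pattern to the actual page. The class name `NextLinkFinder` is made up for illustration.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: find the URL behind a "next" link in a page's HTML.
 * Assumes hypothetical markup of the form <a ... href="...">Next</a>.
 * A regex keeps this example dependency-free; jsoup is the more robust choice.
 */
public class NextLinkFinder {

    // Matches <a ... href="..." ...>Next</a> (case-insensitive) and captures the href value.
    private static final Pattern NEXT_LINK = Pattern.compile(
            "<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>\\s*Next\\s*</a>",
            Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URL of the "next" page, or empty if the page has no such link. */
    public static Optional<URI> findNextUrl(String html, URI pageUrl) {
        Matcher m = NEXT_LINK.matcher(html);
        if (!m.find()) {
            return Optional.empty();   // last page: nothing more to follow
        }
        // Resolve relative hrefs such as "/list?page=3" against the current page's URL.
        return Optional.of(pageUrl.resolve(m.group(1)));
    }

    public static void main(String[] args) {
        String html = "<div><a href=\"/list?page=1\">Prev</a> "
                + "<a class=\"btn\" href=\"/list?page=3\">Next</a></div>";
        URI current = URI.create("https://example.com/list?page=2");
        System.out.println(findNextUrl(html, current).orElse(null));   // prints https://example.com/list?page=3
    }
}
```

Resolving against the current page's URL matters because many sites emit relative links, and the follow-up request needs an absolute URL.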

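Assuming you already have the code that makes the initial HTTP GET request with the HTTP client, no additional API is needed: once your program has found the URL of the "next" resource, it simply issues another request.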
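As a sketch of "just issue another request", the loop below keeps fetching pages until no "next" URL is found. It uses the JDK 11+ `java.net.http.HttpClient` rather than Apache HttpClient (a swap made only to avoid an external dependency; the loop is the same with either). The link extractor is passed in as `findNext`, so a jsoup-based one can be plugged in, and `maxPages` is an assumed safety cap in case the last page links back to an earlier one. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

public class NextPageLoop {

    /**
     * Fetches start, then keeps following the "next" URL found in each page.
     * Stops when findNext returns empty or maxPages pages have been fetched.
     */
    public static List<String> crawl(URI start,
                                     Function<URI, String> fetch,
                                     BiFunction<String, URI, Optional<URI>> findNext,
                                     int maxPages) {
        List<String> pages = new ArrayList<>();
        Optional<URI> next = Optional.of(start);
        while (next.isPresent() && pages.size() < maxPages) {
            URI url = next.get();
            String html = fetch.apply(url);     // one more ordinary request per page
            pages.add(html);
            next = findNext.apply(html, url);   // URL of the following page, if any
        }
        return pages;
    }

    /** A fetch function backed by the JDK 11+ HttpClient: a plain GET returning the body. */
    public static Function<URI, String> httpGet(HttpClient client) {
        return url -> {
            HttpRequest request = HttpRequest.newBuilder(url).GET().build();
            try {
                return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt for callers
                throw new IllegalStateException("Interrupted while fetching " + url, e);
            }
        };
    }
}
```

Against a real site you would pass `httpGet(HttpClient.newHttpClient())` as `fetch` and your link extractor as `findNext`. If, as the question describes, the button submits an ordinary HTML form by POST, the fetch function would instead build the request with `.header("Content-Type", "application/x-www-form-urlencoded")` and `.POST(HttpRequest.BodyPublishers.ofString(formBody))`, where the form fields in `formBody` must be copied from the page's `<form>`.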

answered 2013-10-14 at 19:37