
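I have several websites that I check regularly to compare product prices. At the moment I have to log in manually and search by product ID on each site to get the product details (price).

After a while, this gets time-consuming and boring.

I'm thinking about building a web application where I can use my login credentials to access all of these sites. I would only need to enter a product ID, and my web app would fetch the results from all of the sites and display them in a comparable way.

I'm not assuming these websites have an API, so I'm looking for the best way to approach this. I don't think it's that simple, since I need to log in and then search for the product.

Any suggestions on how I could do this?

Thanks!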


1 Answer


+1 to Marc B's comment. If the TOS doesn't explicitly forbid automated access (your tool would count as a crawler), you should also check whether /robots.txt disallows access to the product search. If neither forbids it, I would suggest using a browser-based bot to fetch the results for you, simply because it sounds more practical and you wouldn't have to deal with cookies yourself.
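
For the robots.txt check, here is a rough sketch in Python (not PHP) using the standard library's `urllib.robotparser`. The robots.txt content and the `shop.example` URLs are made up for illustration, and `is_allowed` is just a helper name I picked. In practice you would download each site's real /robots.txt and pass its text in.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
"""

if __name__ == "__main__":
    print(is_allowed(EXAMPLE_ROBOTS, "https://shop.example/search?q=123"))
    print(is_allowed(EXAMPLE_ROBOTS, "https://shop.example/about"))
```

Keep in mind this only covers robots.txt. The TOS still has to be read by a human.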

If you want to make the page requests with PHP, though, I would direct you to HttpRequest. Have a page where you can log into all the sites (using a POST request sent directly to each login script), and keep the returned session cookies handy. When you search the product pages, identify which part of the HTML consistently precedes the list of products (a regex may be helpful), and write an extraction routine (which will need to be different for every website you want to scrape) that returns information about the product. Then compare the results!
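
To make the flow concrete, here is a minimal sketch of the same steps (POST login, keep the session cookies, per-site regex, compare). It is written in Python with the standard library rather than PHP's HttpRequest, so treat it as an illustration of the idea and not as the PHP API. The site names, form fields, and regex patterns are all assumptions; every real site will need its own.

```python
import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Build an opener that stores cookies and resends them on later requests."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )


def log_in(opener, login_url, fields):
    """POST the login form; the session cookie stays in the opener's jar."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data=body, timeout=30) as response:
        return response.status


def fetch(opener, url):
    """GET a page with the stored cookies and return its HTML as text."""
    with opener.open(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# One extraction pattern per site. These patterns are hypothetical.
PRICE_PATTERNS = {
    "site_a": re.compile(r'<span class="price">\s*\$([\d.,]+)'),
    "site_b": re.compile(r'data-price="([\d.]+)"'),
}


def extract_price(site, html):
    """Return the price found in html for this site, or None if not found."""
    match = PRICE_PATTERNS[site].search(html)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def compare(prices):
    """Return (site, price) pairs sorted cheapest first, skipping misses."""
    found = [(site, price) for site, price in prices.items() if price is not None]
    return sorted(found, key=lambda pair: pair[1])
```

You would call `log_in` once per site with that site's own form field names, then `fetch` the search URL for the product ID and feed the HTML to `extract_price`. A regex is fragile if the markup changes, so an HTML parser may be the safer choice once this grows.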

Answered 2012-06-02T02:03:54.667