
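Hello, I want to scrape a shop, http://www.mercateo.com. So far I have used Selenium. It works well, but it is very slow, and I want to fix that. I found HtmlUnit and JSoup, but I think I will have trouble clicking on links and going to the next page.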

I wrote a simple example with HtmlUnit:

WebClient web = new WebClient();
HtmlPage page = web.getPage("http://news.yahoo.com/");
web.closeAllWindows();

But I get a lot of warnings and errors:

WARNING: CSS warning: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3604] Ignoring the following declarations in this rule.
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3996] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://l.yimg.com/zz/combo?d/lib/yui/3.4.1/build/cssreset/cssreset-min.css&d/lib/yui/3.4.1/build/cssfonts/cssfonts-min.css&os/mit/media/p/presentation/grids/master-min-464195.css&os/mit/media/p/presentation/grids/desktop-min-841473.css&os/mit/media/p/presentation/base/master-min-470440.css&os/mit/media/p/presentation/base/desktop-min-341885.css&kx/ucs/uh/css/291/yunivhead-min.css&kx/ucs/uh/css/221/logo-min.css&kx/ucs/homepage/css/155/homepage-ie-min.css&kx/ucs/notif_v2/css/145/notifications_v2-min.css&kx/ucs/mailcount/css/37/mail_preview-min.css&kx/ucs/search/css/190/search_all-min.css&kx/ucs/search/css/190/search_buttons-min.css&kx/ucs/breakingnews/css/12/breaking_news-min.css&os/mit/media/m/header/header-desktop-min-630857.css&os/mit/media/m/navigation/navigation-desktop-min-603998.css&os/mit/media/m/linkbox/linkbox-min-248956.css&os/mit/media/m/ads/ads-min-892923.css&os/mit/media/m/heading/heading-min-214964.css&os/gm/m/footer/footer_sponsor-min-188629.css&os/gm/m/footer/footer_links-min-188629.css&os/mit/media/m/trending/trending-min-150139.css&os/gm/m/footer/footer_info-min-323669.css&os/gm/m/footer/footer_info-desktop-min-944911.css' [20:3996] Ignoring the following declarations in this rule.
sty 29, 2013 11:54:03 AM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
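
The log lines above come from HtmlUnit's CSS error handler, written through java.util.logging; they report problems in Yahoo's stylesheets, and HtmlUnit logs them and carries on. A minimal sketch of quieting them, assuming a recent HtmlUnit 2.x release (package `com.gargoylesoftware.htmlunit`, as in the log) in which `WebClient` has `getOptions()` and implements `AutoCloseable` (with 2013-era versions, call `closeAllWindows()` instead of `close()`). The class and method names are only illustrative:

```java
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnit {
    // Keep a strong reference: java.util.logging holds loggers weakly, so a level
    // set on an unreferenced logger could be lost after garbage collection.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    // Illustrative helper: a WebClient that does not report CSS problems.
    public static WebClient newQuietClient() {
        // Turn off HtmlUnit's loggers (children such as DefaultCssErrorHandler inherit this)
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
        WebClient client = new WebClient();
        // Swallow CSS parse problems instead of reporting them
        client.setCssErrorHandler(new SilentCssErrorHandler());
        // CSS is not needed to read the HTML, and skipping it saves time
        client.getOptions().setCssEnabled(false);
        return client;
    }

    public static void main(String[] args) throws IOException {
        try (WebClient client = newQuietClient()) {
            HtmlPage page = client.getPage("http://news.yahoo.com/");
            System.out.println(page.getTitleText());
        }
    }
}
```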

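Also, I can't find a way to click a link (selected via XPath). JSoup is great for parsing web pages, but it is not good at moving between pages dynamically.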
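
HtmlUnit can select an element by XPath with `getFirstByXPath` and follow it with `click()`, which returns the page the link leads to. A sketch under the same HtmlUnit 2.x assumption; the XPath expression, the class name and the method name are hypothetical and would have to be adapted to the shop's real markup:

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;

public class HtmlUnitClickSketch {

    // Loads startUrl, finds the first <a> element matched by the XPath expression
    // and clicks it; returns the page the link leads to.
    public static HtmlPage clickLinkByXPath(WebClient client, String startUrl, String xpath)
            throws IOException {
        HtmlPage page = client.getPage(startUrl);
        HtmlAnchor link = page.getFirstByXPath(xpath);
        if (link == null) {
            throw new IllegalStateException("No link matches " + xpath);
        }
        return link.click();
    }

    public static void main(String[] args) throws IOException {
        try (WebClient client = new WebClient()) {
            // Skipping CSS and JavaScript saves a lot of time when only the HTML is needed
            client.getOptions().setCssEnabled(false);
            client.getOptions().setJavaScriptEnabled(false);
            // Hypothetical XPath: the first link whose text contains "Next"
            HtmlPage next = clickLinkByXPath(client, "http://www.mercateo.com/",
                    "//a[contains(text(), 'Next')]");
            System.out.println(next.getUrl());
        }
    }
}
```

Disabling JavaScript is where most of the speed gain over a real browser comes from; if the shop builds its links or paging with JavaScript, leave it enabled.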

I need your help :) Can I get the same results as with Selenium using some other parser?


1 Answer


Following the links on a website with Jsoup is not a problem:

Example:

Document doc = Jsoup.connect("http://first.com/").get(); // Connect to the 'root' link
Elements links = doc.select("a[href]"); // Select all links on the website

// As an example, connect to the first link of the website and parse its HTML:
if (!links.isEmpty()) {
    doc = Jsoup.connect(links.first().absUrl("href")).get();
}

// Continue with the new website
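
Because Jsoup only fetches and parses HTML (it does not run JavaScript), "going to the next page" means reading the next page's URL from the current document and fetching it. A sketch of paging through results this way; `a[rel=next]` is a hypothetical selector for the next-page link, the start URL is the shop from the question, and `JsoupPager`, `nextPageUrl` and `crawl` are illustrative names:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JsoupPager {

    // Returns the absolute URL of the first element matching nextSelector,
    // or null when there is no such link.
    public static String nextPageUrl(Document doc, String nextSelector) {
        Element next = doc.select(nextSelector).first();
        if (next == null) {
            return null;
        }
        String url = next.absUrl("href"); // resolved against the document's base URI
        return url.isEmpty() ? null : url;
    }

    // Follows next-page links from startUrl, visiting at most maxPages pages.
    public static List<String> crawl(String startUrl, String nextSelector, int maxPages)
            throws IOException {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // stops if pagination loops back
        String url = startUrl;
        while (url != null && visited.size() < maxPages && seen.add(url)) {
            Document doc = Jsoup.connect(url).get();
            visited.add(url);
            // ... extract the product data you need from doc here ...
            url = nextPageUrl(doc, nextSelector);
        }
        return visited;
    }

    public static void main(String[] args) throws IOException {
        for (String page : crawl("http://www.mercateo.com/", "a[rel=next]", 5)) {
            System.out.println(page);
        }
    }
}
```

The `seen` set and `maxPages` keep the loop from running forever if the site's pagination links back to a page already visited.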

See also: Using Jsoup, how do I get all the information from each link?

answered 2013-01-29 12:27