Can you show us your code?
By the way: if you are parsing a website, it is better to use connect() rather than parse().
Here is an example of how to get the <div class="controlcontent_r">...</div> tags:
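import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final String url = "http://www.jabraat.com/categories/Buy-Digital-Cameras-Online/cid-CU00084377.aspx";
Document doc = Jsoup.connect(url).get(); // Fetch the page and parse it

// Print every <div class="controlcontent_r"> element, each followed by an empty line
for (Element element : doc.select("div.controlcontent_r")) {
    System.out.println(element);
    System.out.println();
}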
This code prints three elements (separated by an empty line):
<div class="controlcontent_r">
<div class="mtc-menu">
<ul class="mtc-cat">
<li class="mtc-block"><a class="mtc-a mtc-selected" title="Go To Digital Cameras" href="http://www.jabraat.com/categories/Buy-Digital-Cameras-Online/cid-CU00084377.aspx">Digital Cameras</a></li>
<li class="mtc-block"><a class="mtc-a" title="Go To Camcoders" href="http://www.jabraat.com/categories/Buy-Camcorders-Online/cid-CU00084380.aspx">Camcoders</a></li>
<li class="mtc-block1"><a class="mtc-a" title="Go To Camera Accessories" href="http://www.jabraat.com/categories/Buy-Camera-Accessories-Online/cid-CU00084381.aspx">Camera Accessories</a></li>
</ul>
</div>
</div>
<div class="controlcontent_r">
<div class="mtc-menu">
<ul class="mtc-cat">
<li class="mtc-block"><a class="mtc-a" title="Go To Camera" href="http://www.jabraat.com/categories/Buy-Cameras-Online/cid-CU00084376.aspx">Camera</a></li>
<li class="mtc-block"><a class="mtc-a" title="Go To Digital Photo Frames" href="http://www.jabraat.com/categories/Buy-Digital-Photo-Frames-Online/cid-CU00084382.aspx">Digital Photo Frames</a></li>
<li class="mtc-block1"><a class="mtc-a" title="Go To Mobiles" href="http://www.jabraat.com/categories/Buy-Mobiles-Online/cid-CU00084383.aspx">Mobiles</a></li>
</ul>
</div>
</div>
<div class="controlcontent_r">
<div class="mtc-menu">
<ul class="mtc-cat">
<li class="mtc-block"><a class="mtc-a" title="Go to Watches" href="http://www.jabraat.com/categories/Buy-Watches-Online/cid-CU00084370.aspx">Watches</a></li>
<li class="mtc-block"><a class="mtc-a" title="Go to Clothing" href="http://www.jabraat.com/categories/Buy-Online-Clothing/cid-CU00084420.aspx">Clothing</a></li>
<li class="mtc-block"><a class="mtc-a" title="Go to Mobiles" href="http://www.jabraat.com/categories/Buy-Mobiles-Online/cid-CU00084383.aspx">Mobiles</a></li>
<li class="mtc-block"><a class="mtc-a" title="Go to Cameras" href="http://www.jabraat.com/categories/Buy-Cameras-Online/cid-CU00084376.aspx">Cameras</a></li>
<li class="mtc-block"><a class="mtc-a" title="Go to Home & Kitchen" href="http://www.jabraat.com/categories/Buy-Home-Kitchen-Appliances-Online/cid-CU00084391.aspx">Home & Kitchen</a></li>
<li class="mtc-block"><a class="mtc-a" title="Go to Personal Care" href="http://www.jabraat.com/categories/Buy-Online-Personal-Care/cid-CU00084413.aspx">Personal Care</a></li>
<li class="mtc-block"><a class="mtc-a" title="Go to Jewellery" href="http://www.jabraat.com/categories/Buy-Online-Jewellery/cid-CU00084429.aspx">Jewellery</a></li>
<li class="mtc-block1"><a class="mtc-a" title="Go to Footwear" href="http://www.jabraat.com/categories/Buy-Online-Footwear/cid-CK00101771.aspx">Footwear</a></li>
</ul>
</div>
</div>
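If you need the individual links rather than the whole div, you can narrow the selector. The following is only a minimal sketch: the class name MenuLinks, the helper extractLinks and the inline sample HTML are made up for illustration (the sample is a shortened copy of the output above, so the example runs without a network connection). It assumes jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MenuLinks {

    // Hypothetical helper: collects "link text -> href" for every <a href=...>
    // nested inside a <div class="controlcontent_r">.
    static List<String> extractLinks(Document doc) {
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("div.controlcontent_r a[href]")) {
            links.add(a.text() + " -> " + a.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Shortened sample taken from the output above (assumption: the live
        // page still has this structure; use Jsoup.connect(url).get() for it).
        String html =
              "<div class=\"controlcontent_r\"><div class=\"mtc-menu\"><ul class=\"mtc-cat\">"
            + "<li class=\"mtc-block\"><a class=\"mtc-a\" title=\"Go to Watches\" "
            + "href=\"http://www.jabraat.com/categories/Buy-Watches-Online/cid-CU00084370.aspx\">Watches</a></li>"
            + "<li class=\"mtc-block\"><a class=\"mtc-a\" title=\"Go to Clothing\" "
            + "href=\"http://www.jabraat.com/categories/Buy-Online-Clothing/cid-CU00084420.aspx\">Clothing</a></li>"
            + "</ul></div></div>";

        Document doc = Jsoup.parse(html);
        for (String link : extractLinks(doc)) {
            System.out.println(link);
        }
    }
}
```

The descendant selector "div.controlcontent_r a[href]" limits the result to links inside those divs, so links elsewhere on the page are ignored.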
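Edit:
As mentioned in the comments, the <div class='bucket'> tags make things more complicated. While you can easily parse the controlcontent_r tags with jsoup, the bucket tags appear to be generated by a script.
You can do a simple test: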
final String url = "http://www.jabraat.com/categories/Buy-Digital-Cameras-Online/cid-CU00084377.aspx";
Document doc = Jsoup.connect(url).get(); // Connect and parse the document (as above)
System.out.println(doc); // Output the document (= how jsoup "sees" the website)
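There are no bucket tags in that output, which means you cannot retrieve them with jsoup: it only parses the HTML the server sends and does not execute scripts. The solution is to use another library that does execute scripts.
Conveniently, I have already posted a short list here: Trying to parse html hidden by javascript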
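Instead of reading the printed document by eye, the same test can be done with a selector. This is just a sketch under assumptions: the class name StaticHtmlCheck, the helper isInStaticHtml and the inline HTML are hypothetical stand-ins for the real page, and jsoup must be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticHtmlCheck {

    // Hypothetical helper: true if the selector matches at least one element
    // in the HTML as jsoup parsed it (no scripts are executed).
    static boolean isInStaticHtml(Document doc, String cssQuery) {
        return !doc.select(cssQuery).isEmpty();
    }

    public static void main(String[] args) {
        // Stand-in for the real page: the menu div is part of the markup,
        // while the bucket div would only be created by a script in a browser.
        String html =
              "<html><body>"
            + "<div class=\"controlcontent_r\">menu</div>"
            + "<script>/* a script would insert the bucket elements here */</script>"
            + "</body></html>";

        Document doc = Jsoup.parse(html);
        System.out.println(isInStaticHtml(doc, "div.controlcontent_r")); // true
        System.out.println(isInStaticHtml(doc, "div.bucket"));           // false
    }
}
```

For the live site you would pass the Document returned by Jsoup.connect(url).get() instead. If the check returns false for an element you can see in your browser, that element is most likely added by a script.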