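I am trying to use JSoup to fetch the content of the URL "http://binscorner.com/pages/t/timesofindiacartoons.html". The page contains cartoon images, but the URL of each image sits inside the img tag. I need to scrape all of the cartoon images, and I am not sure how to get the actual images. How can I do this?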

<font size="3" face="Times New Roman">
 <br />
 <br />
</font>
<img src="http://www.binscorner.com/mails/res/grey.gif" alt="" width="283" height="487" data-original="http://binscorner.com/mails//t/timesofindiacartoons/part-003.jpeg" />
<p>
 <font size="3" face="Times New Roman">
  &nbsp;
 </font>
</p>
<p>
 <img src="http://www.binscorner.com/mails/res/grey.gif" alt="" width="330" height="591" data-original="http://binscorner.com/mails//t/timesofindiacartoons/part-004.jpeg" />
</p>
<p>
 <img src="http://www.binscorner.com/mails/res/grey.gif" alt="" width="330" height="591" data-original="http://binscorner.com/mails//t/timesofindiacartoons/part-005.jpeg" />
</p>
<p>
 <img src="http://www.binscorner.com/mails/res/grey.gif" alt="" width="330" height="591" data-original="http://binscorner.com/mails//t/timesofindiacartoons/part-006.jpeg" />
</p>
<p> 

2 Answers

I would try to get all img tags by doing a select("img") and then get the attributes you like with attr("data-original").

For a tutorial see this: http://jsoup.org/cookbook/extracting-data/example-list-links

answered 2013-04-27T23:16:47.690

Do it the way @Mike said.

Code

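import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// html is a String holding the page markup shown in the question
Document document = Jsoup.parse(html);

// Select every <img> and print the real image URL stored in data-original
Elements images = document.select("img");
for (Element image : images) {
    String imageUrl = image.attr("data-original");
    System.out.println(imageUrl);
}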

Result

http://binscorner.com/mails//t/timesofindiacartoons/part-003.jpeg
http://binscorner.com/mails//t/timesofindiacartoons/part-004.jpeg
http://binscorner.com/mails//t/timesofindiacartoons/part-005.jpeg
http://binscorner.com/mails//t/timesofindiacartoons/part-006.jpeg
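
Printing the URLs still leaves the question of getting the actual image files. One possible way to save them is sketched below. This is only a sketch under assumptions: CartoonDownloader, fileNameFor and download are hypothetical names made up for this example and are not part of jsoup, the code uses only the JDK, and it assumes the last path segment of each URL is a usable file name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of jsoup): saves one image URL into a local directory.
final class CartoonDownloader {

    // Derives a local file name from the last path segment of the URL,
    // e.g. ".../timesofindiacartoons/part-003.jpeg" becomes "part-003.jpeg".
    static String fileNameFor(String imageUrl) {
        String path = URI.create(imageUrl).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Streams the image bytes into targetDir, overwriting any existing file.
    static Path download(String imageUrl, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileNameFor(imageUrl));
        try (InputStream in = URI.create(imageUrl).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Inside the loop above you could then call CartoonDownloader.download(imageUrl, Paths.get("cartoons")), where "cartoons" is an arbitrary directory name. Since attr returns an empty string when an img has no data-original attribute, it is probably worth skipping empty values before downloading.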
answered 2013-04-29T18:41:47.637