
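I am trying to use Jsoup to extract mobile specification data from the Nokia developer site http://www.developer.nokia.com/Devices/Device_specifications/Nokia_Asha_308/. How can I get the data for each subcategory, such as "Camera Features", "Graphic Formats", etc., separately?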

import java.io.IOException;
import java.sql.SQLException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Nokiareviews {
    public static void main(String[] args) throws IOException, SQLException, InterruptedException {
        Document doc = Jsoup.connect("http://www.developer.nokia.com/Devices/Device_specifications/Nokia_Asha_308/").timeout(1000*1000).get();
        Elements content = doc.select("div#accordeonContainer");
        for (Element spec : content) {
            System.out.println(spec.text());
        }
    }
}

1 Answer


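If you look closely, you'll see that each category is a <div> with class=accordeonContainer. Its title is in an h2 inside that div, and the list of subcategories is in a <dl> with the "clearfix" CSS class: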

<div class="accordeonContainer accordeonExpanded">
    <h2 class=" accordeonTitle "><span>Multimedia</span></h2>
    <div class="accordeonContent" id="Multimedia" style="display: block;">
        <dl class="clearfix">
            <dt>Camera Resolution</dt>
            <dd>1600 x 1200 pixels  </dd>
                ...    
            <dt>Graphic Formats</dt>
            <dd>BMP, DCF, EXIF, GIF87a, GIF89a, JPEG, PNG, WBMP </dd>
            ...
        </dl>
    </div>
</div>

You can select the list of elements of a given type (e.g. elm) with a given CSS class (e.g. clazz) like this:

Elements elms = doc.select("elm.clazz");
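
To make the difference between a class selector and an id selector concrete, here is a minimal sketch. It parses an inline fragment modeled on the markup above instead of fetching the real page; the SAMPLE string, the second "Display" category, and the count helper are made up for illustration. In CSS selector syntax "." matches a class and "#" matches an id, which would explain why the question's div#accordeonContainer finds nothing in markup like this.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SelectorDemo {
    // Hypothetical fragment modeled on the markup shown above; not fetched from the real page.
    static final String SAMPLE =
            "<div class=\"accordeonContainer accordeonExpanded\">"
          + "<h2 class=\" accordeonTitle \"><span>Multimedia</span></h2></div>"
          + "<div class=\"accordeonContainer\">"
          + "<h2 class=\" accordeonTitle \"><span>Display</span></h2></div>";

    // Returns how many elements in the given HTML match the CSS selector.
    static int count(String html, String selector) {
        Document doc = Jsoup.parse(html);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // Class selector: matches both divs, even the one that carries a second class.
        System.out.println(count(SAMPLE, "div.accordeonContainer"));
        // Id selector: no div in the fragment has id="accordeonContainer".
        System.out.println(count(SAMPLE, "div#accordeonContainer"));
    }
}
```

Run against this fragment, the first call reports 2 matches and the second reports 0.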

Then, in short, code to extract the information you mentioned could be:

import java.io.IOException;
import java.util.Iterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Nokiareviews {
    public static void main(String[] args) throws IOException {
        // timeout() takes milliseconds
        Document doc = Jsoup.connect("http://www.developer.nokia.com/Devices/Device_specifications/Nokia_Asha_308/")
                .timeout(1000 * 1000).get();
        // One div.accordeonContainer per category
        Elements content = doc.select("div.accordeonContainer");
        for (Element spec : content) {
            // Category title
            Elements h2 = spec.select("h2.accordeonTitle");
            System.out.println(h2.text());

            // Subcategory names (dt) and their values (dd)
            Elements dl = spec.select("dl.clearfix");
            Elements dts = dl.select("dt");
            Elements dds = dl.select("dd");

            // Walk both lists in parallel, pairing each dt with its dd
            Iterator<Element> dtsIterator = dts.iterator();
            Iterator<Element> ddsIterator = dds.iterator();
            while (dtsIterator.hasNext() && ddsIterator.hasNext()) {
                Element dt = dtsIterator.next();
                Element dd = ddsIterator.next();
                System.out.println("\t\t" + dt.text() + "\t\t" + dd.text());
            }
        }
    }
}
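
If you want to look up one subcategory at a time rather than print everything, one possible variation is to collect the pairs into a nested map. The sketch below is only an illustration under stated assumptions: it parses an inline fragment copied from the markup above rather than the live page, and the class name SpecMapDemo, the SAMPLE string, and the parseSpecs helper are all invented here. It pairs each dt with the element that directly follows it via nextElementSibling(), which avoids misalignment if a dt happens to have no dd.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SpecMapDemo {
    // Hypothetical fragment copied from the markup shown above; not fetched from the real page.
    static final String SAMPLE =
            "<div class=\"accordeonContainer accordeonExpanded\">"
          + "<h2 class=\" accordeonTitle \"><span>Multimedia</span></h2>"
          + "<div class=\"accordeonContent\" id=\"Multimedia\">"
          + "<dl class=\"clearfix\">"
          + "<dt>Camera Resolution</dt><dd>1600 x 1200 pixels  </dd>"
          + "<dt>Graphic Formats</dt><dd>BMP, DCF, EXIF, GIF87a, GIF89a, JPEG, PNG, WBMP </dd>"
          + "</dl></div></div>";

    // Maps category title -> (subcategory name -> value), preserving document order.
    static Map<String, Map<String, String>> parseSpecs(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, Map<String, String>> specs = new LinkedHashMap<>();
        for (Element category : doc.select("div.accordeonContainer")) {
            String title = category.select("h2.accordeonTitle").text();
            Map<String, String> entries = new LinkedHashMap<>();
            for (Element dt : category.select("dl.clearfix > dt")) {
                // Only accept the sibling if it really is the dd for this dt.
                Element next = dt.nextElementSibling();
                if (next != null && "dd".equals(next.tagName())) {
                    entries.put(dt.text(), next.text());
                }
            }
            specs.put(title, entries);
        }
        return specs;
    }

    public static void main(String[] args) {
        Map<String, String> multimedia = parseSpecs(SAMPLE).get("Multimedia");
        System.out.println(multimedia.get("Graphic Formats"));
    }
}
```

To use this on the real page, you would pass the HTML of the fetched document (for example doc.html()) to the helper, assuming the page still has the structure shown above.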

If you use Maven, make sure to add this to your pom.xml:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.7.2</version>
</dependency>
answered 2013-04-17T02:09:19.453