Here is the code:

Document doc = Jsoup.connect("http://wikitravel.org/en/San_Francisco").get();
System.out.println(doc.select("h2:contains(Get around) ~ *:not(h2:contains(See) ~ *)"));

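The output is http://pastebin.com/gkcCfr1F. Is there a selector that makes the "not" selector inclusive? Right now it removes everything after "See", but I also want it to remove that last h2 tag (the one with id="see") along with everything after it, because I'm trying to parse the individual sections of the wiki.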

The final output I want is: http://pastebin.com/ntpVrgui
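A minimal sketch of one way to make the exclusion inclusive, assuming jsoup's `:not(...)` accepts a comma-separated selector group (recent jsoup releases parse it that way): list the "See" `h2` itself inside `:not(...)` next to `h2:contains(See) ~ *`. To stay offline it runs on a small hypothetical HTML string (`SAMPLE_HTML`) instead of the live Wikitravel page; the class and method names are illustrative, and the markup assumes headings and their section content are siblings.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GetAroundSection {

    // Hypothetical, trimmed-down stand-in for the Wikitravel markup: each
    // heading and its section content are siblings inside one container.
    static final String SAMPLE_HTML =
            "<div id='content'>"
            + "<h2><span id='Get_in'>Get in</span></h2><p>By plane</p>"
            + "<h2><span id='Get_around'>Get around</span></h2><p>Intro</p>"
            + "<h3><span id='Navigating'>Navigating</span></h3><p>Streets</p>"
            + "<h2><span id='See'>See</span></h2><p>Golden Gate Bridge</p>"
            + "</div>";

    // Putting the "See" h2 inside :not(...) next to "h2:contains(See) ~ *"
    // excludes that heading itself as well as every sibling after it.
    static final String SELECTOR =
            "h2:contains(Get around) ~ *:not(h2:contains(See), h2:contains(See) ~ *)";

    /** Tag names of the elements the selector picks, in document order. */
    static List<String> sectionTags(String html) {
        Document doc = Jsoup.parse(html);
        List<String> tags = new ArrayList<>();
        for (Element el : doc.select(SELECTOR)) {
            tags.add(el.tagName());
        }
        return tags;
    }

    public static void main(String[] args) {
        Elements section = Jsoup.parse(SAMPLE_HTML).select(SELECTOR);
        System.out.println(section.outerHtml());
    }
}
```

Note that jsoup's `:contains` is case-insensitive and matches any `h2` whose text includes "see", so on a real page a selector keyed on the span id, such as `h2:has(span#See)`, may be the safer choice.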

1 Answer

I would do something like this:

Get the content div:

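    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    StringBuilder sb = new StringBuilder();
    Document doc = Jsoup.connect("http://wikitravel.org/en/San_Francisco").get();
    // Inside the content div, find the h3 that contains the span with id "Navigating"
    Element start = doc.select("#content h3:has(span#Navigating)").first();
    // Append it and each following sibling until the h2 that contains the span
    // with id "See" (assumes headings and section content are siblings)
    for (Element element = start; element != null; element = element.nextElementSibling()) {
        if (element.tagName().equals("h2") && !element.select("span#See").isEmpty()) {
            break; // stop before the "See" heading
        }
        sb.append(element.outerHtml()).append('\n');
    }
    System.out.println(sb);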
answered 2012-08-03 00:13:03
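The sibling walk from the answer (start at the `h3` containing `span#Navigating`, stop before the `h2` containing `span#See`) can be tried without a network request by wrapping it in a method and running it on a small inline document. This is a sketch: the `SectionWalk` class, the `collectBetween` method and the sample markup are hypothetical, and like the answer it assumes the start and stop headings are siblings inside `#content`.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SectionWalk {

    // Hypothetical sample markup; headings and section content are siblings.
    static final String SAMPLE_HTML =
            "<div id='content'>"
            + "<h2><span id='Get_around'>Get around</span></h2>"
            + "<h3><span id='Navigating'>Navigating</span></h3><p>Streets</p>"
            + "<h3><span id='By_car'>By car</span></h3><p>Parking</p>"
            + "<h2><span id='See'>See</span></h2><p>Golden Gate Bridge</p>"
            + "</div>";

    /**
     * Collects the h3 whose span has id startId plus every following sibling,
     * stopping before the first h2 whose span has id stopId.
     * Returns an empty list if the start heading is not found.
     */
    static List<Element> collectBetween(Document doc, String startId, String stopId) {
        List<Element> out = new ArrayList<>();
        Element start = doc.select("#content h3:has(span#" + startId + ")").first();
        for (Element el = start; el != null; el = el.nextElementSibling()) {
            if (el.tagName().equals("h2") && !el.select("span#" + stopId).isEmpty()) {
                break; // the stop heading itself is not included
            }
            out.add(el);
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        StringBuilder sb = new StringBuilder();
        for (Element el : collectBetween(doc, "Navigating", "See")) {
            sb.append(el.outerHtml()).append('\n');
        }
        System.out.print(sb);
    }
}
```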