I am trying to parse a specific area of HTML from this webpage:

http://en.wikipedia.org/w/api.php?action=parse&page=Ringo_Starr&prop=text&section=0&format=txtfm&disablepp&redirects

[Please note that this is not the source page: it displays HTML tags, but I am interested in the actual source of this page (Ctrl+U).]

Specifically, I am looking to put all of the lines that begin with:

<span style="color:blue;">&lt;p&gt;</span>

into a String.

Here's how I'm trying to solve it, but I seem to be way off:

    Document doc = Jsoup.connect("http://en.wikipedia.org/w/api.php?action=parse&page=Ringo_Starr&prop=text&section=0&format=txtfm&disablepp&redirects").get();
    Elements elements = doc.select("span");
    for (Element e : elements) {
        if (e.text().equals("&lt;p&gt;")) {
            System.out.println("now get that whole line");
        }
    }

Note: I am using jsoup here, but would a straight regex be more effective?

2 Answers

1

A straight regex is probably a better idea. For starters, try this:

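// MULTILINE lets ^ and $ match at every line boundary, not only at the ends of the input
Pattern pat = Pattern.compile("^<span style=\"color:blue;\">&lt;p&gt;</span>.+$", Pattern.MULTILINE);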

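Here, ^ anchors the start of a line, <span style="color:blue;">&lt;p&gt;</span> is matched literally, and .+ then matches one or more characters that are not line terminators.

In a regular expression, . matches any character except a line terminator unless the DOTALL flag is specified.

$ anchors the end of the line. Note that ^ and $ match at line boundaries only when the pattern is compiled with Pattern.MULTILINE; by default they match only at the start and end of the whole input.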
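
To make the idea concrete, here is a minimal sketch of collecting every matching line into a single String. It assumes you already have the raw page source as a String (fetched however you like, without letting an HTML parser decode the entities first). The class name ParagraphLines, the method collect, and the sample text in main are made up for illustration and are not from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphLines {
    // Literal prefix as it appears in the raw page source (entities not decoded).
    static final String PREFIX = "<span style=\"color:blue;\">&lt;p&gt;</span>";

    // Pattern.quote treats the prefix as a literal; MULTILINE makes ^ and $
    // match at each line boundary instead of only at the ends of the input.
    private static final Pattern LINE =
            Pattern.compile("^" + Pattern.quote(PREFIX) + ".+$", Pattern.MULTILINE);

    /** Returns every line of rawSource that starts with PREFIX, joined by '\n'. */
    static String collect(String rawSource) {
        List<String> lines = new ArrayList<>();
        Matcher m = LINE.matcher(rawSource);
        while (m.find()) {
            lines.add(m.group()); // group() is the whole matched line
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the page source, not real Wikipedia output.
        String sample = String.join("\n",
                "<pre>",
                PREFIX + "first paragraph",
                "<span style=\"color:blue;\">&lt;table&gt;</span>skipped",
                PREFIX + "second paragraph",
                "</pre>");
        System.out.println(collect(sample));
    }
}
```

Because the pattern requires .+ after the prefix, a line consisting of the prefix alone is not collected; change .+ to .* if you want those lines too.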

Answered 2013-09-05T19:20:29.707
0

Couldn't you just write

System.out.println(e.nextElementSibling().text());

You would also have to check

e.attr("style").equals("color:blue;")
Answered 2013-09-05T19:37:17.420