java - Get specific lines out of HTML page and put into String

Question

I am trying to parse a specific area of html from this webpage:

http://en.wikipedia.org/w/api.php?action=parse&page=Ringo_Starr&prop=text&section=0&format=txtfm&disablepp&redirects

[Please note that the rendered page is not what I mean: it displays the HTML tags, but I am interested in the actual source of this page (Ctrl+U).]

Specifically, I am looking to put all of the lines that begin with:

<span style="color:blue;">&lt;p&gt;</span>

into a String.

Here's how I'm trying to solve -- but I seem to be way off:

    Document doc = Jsoup.connect("http://en.wikipedia.org/w/api.php?action=parse&page=Ringo_Starr&prop=text&section=0&format=txtfm&disablepp&redirects").get();
    Elements elements = doc.select("span");
    for (Element e : elements) {
        if (e.text().equals("&lt;p&gt;")) {
            System.out.println("now get that whole line");
        }
    }

Note: I am using jsoup here -- but would a straight regex be more effective?

score 1 · Accepted Answer

A straight regex is probably a better idea. For starters, try this:

Pattern pat = Pattern.compile("^<span style=\"color:blue;\">&lt;p&gt;</span>.+$");

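Here, ^ anchors the match at the start of the line, <span style="color:blue;">&lt;p&gt;</span> matches literally, and then .+ matches one or more non-line-terminator characters. From the Pattern documentation:

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

And $ specifies the end of the line.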
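To make the pattern above concrete, here is a minimal sketch that applies it to a whole page source held in one String and collects every matching line. It assumes the source has already been downloaded as raw text; the class name BlueParagraphLines, the method extract, and the sample input are hypothetical and used only for illustration. It also assumes Pattern.MULTILINE is wanted, because without that flag ^ and $ match only at the start and end of the entire input rather than at each line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class BlueParagraphLines {
    // Same pattern as in the answer; MULTILINE (an assumption here) makes
    // ^ and $ match at the start and end of every line of the input.
    private static final Pattern LINE = Pattern.compile(
            "^<span style=\"color:blue;\">&lt;p&gt;</span>.+$",
            Pattern.MULTILINE);

    // Returns every line of the raw source that starts with the blue <p> marker.
    static List<String> extract(String source) {
        List<String> lines = new ArrayList<>();
        Matcher m = LINE.matcher(source);
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the raw page source.
        String sample = String.join("\n",
                "<span style=\"color:blue;\">&lt;div&gt;</span>skipped",
                "<span style=\"color:blue;\">&lt;p&gt;</span>First paragraph",
                "plain line",
                "<span style=\"color:blue;\">&lt;p&gt;</span>Second paragraph");
        // Join the matching lines into a single String, as the question asks.
        String joined = String.join("\n", extract(sample));
        System.out.println(joined);
    }
}
```

If the source is read one line at a time instead (for example with a BufferedReader), the MULTILINE flag should not be needed, since each input then is a single line.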

score 0 · Accepted Answer

Can't you just write

System.out.println(e.nextElementSibling().text());

You also have to check

e.attr("style").equals("color:blue;")
