I have some fairly complex, messy HTML code. Is there a good HTML parser that lets me work with the HTML as a Java object?

For example, I want to access this code:

<html>
  <body>
   <div id='foo'>
     <p id='bar'></p>
   </div>
  </body>
</html>

through a DOM-like API, something like:

[File/Code].getElementById('foo').appendText('bla');
[File/Code].getElement(Element.DIV).getElement(ELEMENT.P).getValue();
//etc...

Does anybody have an idea?

Or is there a DOM API in Java (this does not help :( )?

Greetings

1 Answer

Try http://jsoup.org/. It can handle very broken HTML.

Example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample
{
    public static void main(String[] args)
    {
        // Parse the HTML string into a jsoup Document
        Document document = Jsoup.parse("<html>" +
                "  <body>" +
                "   <div id='foo'>" +
                "     <p id='bar'>TEST</p>" +
                "   </div>" +
                "  </body>" +
                "</html>");

        System.out.println("Add blah to the Element with ID: foo");
        Element foo = document.getElementById("foo");
        foo.appendText("blah");

        // Print the modified document
        System.out.println(document.html());

        System.out.println("Get the content of a div having a p:");
        for (Element div : document.getElementsByTag("div"))
        {
            // Visit every <p> nested inside this <div>
            for (Element p : div.getElementsByTag("p"))
            {
                System.out.println(p.text());
            }
        }
    }
}

Maven

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.7.2</version>
</dependency>
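
On the side question of whether Java itself has a DOM: the JDK does ship a W3C DOM API (javax.xml.parsers and org.w3c.dom), but its parser only accepts well-formed markup, so it is not a substitute for jsoup on broken HTML. The following is a minimal sketch for the well-formed sample from the question only. The class name JdkDomSketch and the helpers parse and byId are made up for illustration. The lookup goes through XPath because, without a DTD, Document.getElementById does not treat a plain id attribute as an ID.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of jsoup or the JDK.
class JdkDomSketch {

    // Parse a well-formed markup string into a W3C DOM Document.
    // Throws an exception on malformed input (unclosed tags and so on).
    static Document parse(String markup) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Find the first element whose id attribute equals the given value,
    // or null if there is none. Assumes the id contains no quote characters.
    static Element byId(Document doc, String id) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Element) xpath.evaluate(
                "//*[@id='" + id + "']", doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<html><body><div id='foo'><p id='bar'>TEST</p></div></body></html>");

        // Append a text node to the div, similar to jsoup's appendText
        Element foo = byId(doc, "foo");
        foo.appendChild(doc.createTextNode("blah"));

        System.out.println(foo.getTextContent());             // prints "TESTblah"
        System.out.println(byId(doc, "bar").getTextContent()); // prints "TEST"
    }
}
```

If the input contains unclosed tags or other errors, parse throws an exception here, which is the case where jsoup is the better fit.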
Answered 2013-08-04T19:43:49.473