java - Parsing with htmlcleaner

Translated from: https://stackoverflow.com/questions/10819672 2012-05-30T15:43:21.530


I have written a method that uses htmlcleaner to extract items with a specific class, and now I would like to know...

How can I use htmlcleaner to extract the body of an HTML document together with all of its elements?
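If the goal is only the body's text or its markup, a hand-written recursion may not be necessary. A minimal sketch, assuming a recent HtmlCleaner 2.x release: `TagNode.findElementByName("body", true)` locates the body, `getText()` returns all of its text with the markup stripped, and `HtmlCleaner.getInnerHtml(TagNode)` serializes the elements inside it back to markup. The class `BodyExtract` and its method names are hypothetical, chosen for illustration.

```java
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

// Hypothetical helper class; assumes a recent HtmlCleaner 2.x jar on the classpath.
class BodyExtract {

    // Cleans the input and returns its <body> element, or null if none is found.
    static TagNode findBody(HtmlCleaner cleaner, String html) {
        TagNode root = cleaner.clean(html);
        return root.findElementByName("body", true);
    }

    // All text inside <body>, concatenated in document order, tags stripped.
    static String bodyText(String html) {
        TagNode body = findBody(new HtmlCleaner(), html);
        return body == null ? "" : body.getText().toString();
    }

    // The elements inside <body>, serialized back to markup by HtmlCleaner.
    static String bodyInnerHtml(String html) {
        HtmlCleaner cleaner = new HtmlCleaner();
        TagNode body = findBody(cleaner, html);
        return body == null ? "" : cleaner.getInnerHtml(body);
    }
}
```

For example, `BodyExtract.bodyText("<html><body><p>Hi <b>there</b></p></body></html>")` should yield the text `Hi there`, while `bodyInnerHtml` on the same input keeps the `<p>` and `<b>` tags.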

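// Requires: import org.htmlcleaner.HtmlCleaner; import org.htmlcleaner.TagNode;

public String htmlParser(String html){

    HtmlCleaner html_cleaner = new HtmlCleaner();
    TagNode rootNode = html_cleaner.clean(html);
    // getElementsByName(name, true) searches the whole tree below <html>
    TagNode[] items = rootNode.getElementsByName("body", true);
    if (items.length == 0) {
        return "";
    }
    item_found = new StringBuilder(); // start empty, not null, so the result has no "null" prefix
    ParseBody(items[0]);
    return item_found.toString();
}

StringBuilder item_found = new StringBuilder();

public void ParseBody(TagNode root){
    // getChildTags() returns only the direct child tags. getAllElements(true)
    // would return every descendant, which the recursion would then visit
    // again, appending the same leaf several times.
    TagNode[] children = root.getChildTags();
    if(children.length > 0){
        for(TagNode node: children){
            ParseBody(node);
        }
    }else{
        // root.toString() only brings out the tag name of the TagNode.
        // Here I wanted just the text of all items in the body, but it would
        // still be beneficial for everyone if the question is complete.
        // getText() returns the node's text content and is never null.
        // Only leaf elements reach this branch, so text that sits beside
        // child tags (e.g. "Hi" in <p>Hi <b>x</b></p>) is skipped.
        item_found.append(root.getText());
    }
}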
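The method above appends text only at leaf elements, so text that sits next to child tags is lost. A minimal sketch of an alternative walk, assuming a recent HtmlCleaner 2.x release where `TagNode.getAllChildren()` returns both tags and text nodes (`ContentNode`) in document order: it recurses into every tag and records each non-blank text node. `BodyTexts` and `collect` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import org.htmlcleaner.ContentNode;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

// Hypothetical helper; assumes a HtmlCleaner 2.x release with TagNode.getAllChildren().
class BodyTexts {

    // Collects every non-blank text node under <body>, in document order.
    static List<String> collect(String html) {
        TagNode body = new HtmlCleaner().clean(html).findElementByName("body", true);
        List<String> out = new ArrayList<>();
        if (body != null) {
            walk(body, out);
        }
        return out;
    }

    private static void walk(TagNode node, List<String> out) {
        // getAllChildren() yields text nodes as well as tags, so text such as
        // "Hello " in <p>Hello <b>world</b></p> is visited too.
        for (Object child : node.getAllChildren()) {
            if (child instanceof ContentNode) {
                String text = ((ContentNode) child).getContent().toString().trim();
                if (!text.isEmpty()) {
                    out.add(text);
                }
            } else if (child instanceof TagNode) {
                walk((TagNode) child, out);
            }
            // Tokens that are neither tags nor text content are skipped.
        }
    }
}
```

Returning a `List` keeps the text of each item separate; `String.join(" ", BodyTexts.collect(html))` turns it into a single string if that is preferred.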
