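I wrote a method that uses HtmlCleaner to extract items with a specific class, and now I would like to know:
How can I use HtmlCleaner to extract the body of an HTML document together with all of its elements?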
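public String htmlParser(String html){
    HtmlCleaner html_cleaner = new HtmlCleaner();
    TagNode rootNode = html_cleaner.clean(html);
    TagNode[] items = rootNode.getElementsByName("body", true);
    if(items.length == 0){
        return ""; // defensive: no body element was found
    }
    item_found = ""; // reset so repeated calls do not accumulate and the result never starts with "null"
    ParseBody(items[0]);
    return item_found;
}
String item_found = "";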
public void ParseBody(TagNode root){
    // getAllElements(false) returns only the direct children; with true every descendant
    // is returned, so the recursion below would visit deeper nodes more than once
    TagNode[] children = root.getAllElements(false);
    if(children.length > 0){
        for(TagNode node: children){
            ParseBody(node);
        }
    }else{
        item_found = item_found + root.toString(); // root.toString() only brings out the first name inside the TagNode
        // What I actually wanted here is just the text of all items in the body,
        // but it would still be useful to everyone if the question covers the complete body.
        //if(root.getText().toString() != null || !(root.getText().toString().equals("null"))){
        //item_found = item_found + root.getText().toString();
        //}
    }
}
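A side note on the accumulator: the commented-out comparison against the string "null" hints at a common Java pitfall. Appending to a String that is still null produces text that begins with the literal "null". Below is a minimal standalone sketch of that effect and of a StringBuilder-based alternative. The class and method names are hypothetical, and the sketch deliberately uses no HtmlCleaner calls.

```java
class AccumulatorDemo {
    // Mimics appending to a String that was never initialized.
    static String naiveConcat(String... parts) {
        String acc = null;
        for (String p : parts) {
            acc = acc + p; // null is converted to the text "null" on the first append
        }
        return acc;
    }

    // Starts from an empty StringBuilder, so nothing unexpected is prepended.
    static String builderConcat(String... parts) {
        StringBuilder acc = new StringBuilder();
        for (String p : parts) {
            acc.append(p);
        }
        return acc.toString();
    }

    public static void main(String[] args) {
        System.out.println(naiveConcat("Hello", " world"));   // prints "nullHello world"
        System.out.println(builderConcat("Hello", " world")); // prints "Hello world"
    }
}
```

A local StringBuilder passed into the recursive method would also avoid keeping state in a field between calls.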