java - Why does JDOM's getChild() method return null?

Question

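I am working on a project that manipulates HTML documents. I want to take the body content of an existing HTML document and turn it into new HTML. I am using JDOM, and to get at the body element I call getChild("body") in my code. It returns null, even though my HTML document does have a body element. I am a student; could anyone help me understand this problem?

Any pointers would be appreciated.

Code: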

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

public static void getBody() throws Exception {
    // TagSoup is used as the underlying SAX parser
    SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", true);
    Document jdomDocument = builder.build("http://www......com");
    Element root = jdomDocument.getRootElement();
    // It returns null
    System.out.println(root.getChild("body"));
}

Please also see the following. This is the root of my HTML and its children, as printed to the console:

root.getName():html

SIZE:2

[Element: <head [Namespace: http://www.w3.org/1999/xhtml]/>]

[Element: <body [Namespace: http://www.w3.org/1999/xhtml]/>]

score 9 · Accepted Answer

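I found a few problems in your code: 1) If you want to build a document from remote XML over the network, it is clearer to use the build overload that takes a URL. The build(String) overload interprets its argument as a system ID (a URI), so it only reaches the network when the string is a complete URL including the scheme.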

Document jdomDocument = builder.build( new URL("http://www........com"));

2) If you want to parse an HTML page as XML, you must check that it is a well-formed XHTML document; otherwise a standard XML parser cannot parse it.
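The well-formedness check mentioned in point 2 can be sketched with the SAX parser that ships with the JDK. This is only an illustration under assumptions: the class name WellFormedCheck and the sample markup are made up for the example, and it uses the JDK parser rather than JDOM or TagSoup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JDOM: reports whether markup is well-formed XML.
final class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // DefaultHandler rethrows fatal errors, so malformed input ends in SAXException.
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Typical HTML with an unclosed <br> is not well-formed XML.
        System.out.println(isWellFormed("<html><body><br></body></html>"));
        // The XHTML spelling of the same page is.
        System.out.println(isWellFormed("<html><body><br/></body></html>"));
    }
}
```

A page that fails this check needs either cleaning up or a lenient parser in front of the XML tooling.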

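3) As I already said in another answer, root.getChild("body") returns the child of root whose name is "body" and which has no namespace. You should check the namespace of the element you are looking for; if it is namespace-qualified, you have to pass the namespace like this: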

root.getChild("body", Namespace.getNamespace("your_namespace_uri"));

A simple way to find out which namespace your elements are in is to print all of the root's children using the getChildren method:

// root is the element returned by jdomDocument.getRootElement()
for (Object element : root.getChildren()) {
    System.out.println(element.toString());
}
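For readers without JDOM on the classpath, the same "list the children and their namespaces" idea can be sketched with the DOM API built into the JDK. The class name ChildNamespaceLister and the inline sample document are assumptions made for this illustration; the sample mirrors the two children shown in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper using the JDK DOM API (not JDOM).
final class ChildNamespaceLister {

    // Assumed sample input: an XHTML skeleton like the one in the question.
    static final String SAMPLE =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head/><body/></html>";

    // Returns one "localName {namespaceURI}" entry per child element of the root.
    static List<String> describeChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespace URIs are not reported
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> result = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.add(child.getLocalName() + " {" + child.getNamespaceURI() + "}");
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : describeChildren(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```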

If you are parsing XHTML, the namespace URI is probably http://www.w3.org/1999/xhtml, so you should do this:

root.getChild("body", Namespace.getNamespace("http://www.w3.org/1999/xhtml"));
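Why the namespace matters can be shown with a small sketch that matches children on both local name and namespace URI. It uses the JDK DOM API instead of JDOM so that it runs without extra JARs; the class name NamespacedChildLookup, the helper findChild, and the sample document are hypothetical and only imitate the matching rule described above.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration using the JDK DOM API (not JDOM).
final class NamespacedChildLookup {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Assumed sample input: every element inherits the default XHTML namespace.
    static final String SAMPLE =
            "<html xmlns=\"" + XHTML_NS + "\"><head/><body/></html>";

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // First child element with the given local name AND namespace URI
    // (null means "no namespace"); returns null when nothing matches.
    static Element findChild(Element parent, String localName, String namespaceUri) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && localName.equals(child.getLocalName())
                    && Objects.equals(namespaceUri, child.getNamespaceURI())) {
                return (Element) child;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);
        // Lookup without a namespace misses the namespaced body element.
        System.out.println(findChild(root, "body", null));
        // Lookup with the XHTML namespace finds it.
        System.out.println(findChild(root, "body", XHTML_NS).getLocalName());
    }
}
```

The name alone is not enough to identify the element: a body element in the XHTML namespace and a body element in no namespace are treated as different elements.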

score 2 · Accepted Answer

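What makes you think you need org.ccil.cowan.tagsoup.Parser? What does it give you that the parser built into the JDK does not?

I would try a different SAXBuilder constructor. Use the parser built into the JDK and see whether that helps.

Start by printing out the whole tree with XMLOutputter.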

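import java.io.IOException;
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

public static void getBody() throws JDOMException, IOException
{
    SAXBuilder builder = new SAXBuilder(true);  // true turns on validation
    Document document = builder.build("http://www......com");
    XMLOutputter outputter = new XMLOutputter();
    outputter.output(document, System.out);  // print the whole tree
}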
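The same "print the whole tree first" diagnostic can be sketched with JDK classes only, for anyone who wants to try it without JDOM. The class name TreeDump and the sample document are assumptions for this illustration; an identity Transformer stands in for XMLOutputter here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper using JDK classes only (not JDOM's XMLOutputter).
final class TreeDump {

    // Assumed sample input.
    static final String SAMPLE =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head/><body><p>hi</p></body></html>";

    // Parses the markup and serializes the resulting tree back to a string.
    static String dump(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dump(SAMPLE));
    }
}
```

In the dumped text, a namespace declared on the root shows up as an xmlns attribute, which is a quick way to spot that the child elements are namespace-qualified.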

score 1 · Accepted Answer

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.Namespace;
import org.jdom.input.SAXBuilder;

public static void getBody() throws Exception {
    SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", true);
    Document jdomDocument = builder.build("http://www......com");
    Element root = jdomDocument.getRootElement();
    // Pass the element's namespace; "my_name_space" is a placeholder for its URI
    System.out.println(root.getChild("body", Namespace.getNamespace("my_name_space")));
}
