java - TagSoup and XPath

Question

I'm trying to use TagSoup with XPath (JAXP). I know how to get a SAX parser (an XMLReader) from TagSoup, but I can't find how to create a DocumentBuilder that uses that SAX parser. How do I do that?

Thanks.

Edit: Sorry for being so vague, but the Java XML API is just such a pain.

Edit 2:

Problem solved:

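import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.*;
import org.ccil.cowan.tagsoup.Parser;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.*;

public class TagSoupXPath {
    public static void main(String[] args) throws XPathExpressionException, IOException,
            SAXNotRecognizedException, SAXNotSupportedException,
            TransformerFactoryConfigurationError, TransformerException {

        XPathFactory xpathFac = XPathFactory.newInstance();
        XPath xpath = xpathFac.newXPath();

        try (InputStream input = new FileInputStream("/tmp/g.html")) {
            // TagSoup's Parser is a SAX XMLReader that tolerates malformed HTML
            XMLReader reader = new Parser();
            reader.setFeature(Parser.namespacesFeature, false); // elements in no namespace, so //span matches

            // Identity transform: replays TagSoup's SAX events into a DOM tree
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            transformer.transform(new SAXSource(reader, new InputSource(input)), result);

            Node htmlNode = result.getNode();
            NodeList nodes = (NodeList) xpath.evaluate("//span", htmlNode, XPathConstants.NODESET);
            System.out.println(nodes.getLength());
        }
    }
}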
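The identity-transform trick above is not specific to TagSoup: any SAX XMLReader can be wrapped in a SAXSource and copied into a DOMResult. Below is a minimal JDK-only sketch, assuming you want it as a reusable helper. The class and method names (SaxToDom, parseToDom, countSpans) are hypothetical, and to keep it runnable without extra jars the JDK's own SAX parser stands in for TagSoup, so the input must be well-formed XML. With TagSoup on the classpath you would pass new Parser() as the reader instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxToDom {

    // Identity transform: every SAX event the reader emits is copied into a new DOM Document.
    static Node parseToDom(XMLReader reader, InputSource input) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        identity.transform(new SAXSource(reader, input), result);
        return result.getNode();
    }

    // Counts <span> elements. The JDK's SAX parser stands in for TagSoup here,
    // so the markup must be well-formed XML.
    static int countSpans(String markup) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        Node doc = parseToDom(reader, new InputSource(new StringReader(markup)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList spans = (NodeList) xpath.evaluate("//span", doc, XPathConstants.NODESET);
        return spans.getLength();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><span>a</span><p><span>b</span></p></body></html>";
        System.out.println(countSpans(page)); // prints 2
    }
}
```

Keeping parseToDom separate from the query means the same helper serves any XPath expression, and swapping parsers only changes which XMLReader is passed in.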

Edit 3:

Link that helped me: http://www.jezuk.co.uk/cgi-bin/view/jez?id=2643

score 2 · Accepted Answer

"the Java XML API is just such a pain"

It is. Consider moving to XSLT 2.0 / XPath 2.0 and using Saxon's s9api interface instead. It looks roughly like this:

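import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.transform.Source;
import javax.xml.transform.sax.SAXSource;
import net.sf.saxon.s9api.*;
import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

Processor proc = new Processor(false); // false: no licensed (PE/EE) features

InputStream in = new FileInputStream("/tmp/g.html");
XMLReader reader = new Parser();
reader.setFeature(Parser.namespacesFeature, false);
Source source = new SAXSource(reader, new InputSource(in));

DocumentBuilder builder = proc.newDocumentBuilder();
XdmNode doc = builder.build(source);

// Saxon may switch namespace processing back on, putting TagSoup's elements
// in the XHTML namespace; the XPath 2.0 wildcard *:span matches either way.
XPathCompiler compiler = proc.newXPathCompiler();
XdmValue result = compiler.evaluate("//*:span", doc);
System.out.println(result.size());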
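Both snippets above turn TagSoup's namespaces feature off. If namespace processing stays on, TagSoup places the HTML elements in the XHTML namespace (http://www.w3.org/1999/xhtml), and under JAXP's XPath 1.0 an unprefixed //span only matches elements in no namespace. A minimal JDK-only sketch of the two usual workarounds follows: binding a prefix through a NamespaceContext, or matching on local-name(). It parses a small XHTML string with a namespace-aware DocumentBuilder as a stand-in for TagSoup output; the class name XhtmlXPath and the prefix h are illustrative choices, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XhtmlXPath {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Binds the illustrative prefix "h" to the XHTML namespace.
    static final NamespaceContext CTX = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "h" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String p = getPrefix(uri);
            return p == null ? Collections.<String>emptyIterator()
                             : Collections.singletonList(p).iterator();
        }
    };

    // Evaluates expr against doc, optionally with the "h" prefix bound.
    static int count(Document doc, String expr, boolean withContext) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        if (withContext) xpath.setNamespaceContext(CTX);
        return ((NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    // Namespace-aware parse, so elements keep their namespace URI.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html xmlns='" + XHTML_NS + "'><body>"
                + "<span>a</span><span>b</span></body></html>");
        System.out.println(count(doc, "//span", false));                   // prints 0
        System.out.println(count(doc, "//h:span", true));                  // prints 2
        System.out.println(count(doc, "//*[local-name()='span']", false)); // prints 2
    }
}
```

The NamespaceContext route keeps expressions precise; local-name() needs no setup but matches span in any namespace.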
