I'm using Java 11 (AdoptOpenJDK 11.0.5 2019-10-15) on Windows 10. I have some legacy XHTML 1.1 files that I want to process. They take the following general form:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>XHTML 1.1 Skeleton</title>
</head>
<body>
</body>
</html>
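To keep the parser from waiting on connections to the Internet, I've installed a custom EntityResolver that loads known entities (keyed by their public IDs, e.g. -//W3C//ELEMENTS XHTML Inline Style 1.0//EN) stored in the program's resources. This DefaultEntityResolver class also prints debug messages indicating which entities the parser is loading.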
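DefaultEntityResolver itself isn't shown here. As a rough illustration of the approach just described, here is a minimal sketch of a resolver that maps public IDs to bundled classpath resources and logs each request. The class name ResourceEntityResolver, its constructor, and the resource names are hypothetical; this is not the actual DefaultEntityResolver.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch: resolves external entities by public ID to resources bundled
 * alongside an anchor class, logging each request. Not the author's DefaultEntityResolver.
 */
final class ResourceEntityResolver implements EntityResolver {

  private final Class<?> anchorClass;
  private final Map<String, String> resourceNamesByPublicId;

  ResourceEntityResolver(final Class<?> anchorClass, final Map<String, String> resourceNamesByPublicId) {
    this.anchorClass = anchorClass;
    this.resourceNamesByPublicId = Map.copyOf(resourceNamesByPublicId);
  }

  @Override
  public InputSource resolveEntity(final String publicId, final String systemId) throws IOException {
    // Debug output, similar in spirit to the messages described above.
    System.err.printf("Resolving entity %s (%s)%n", publicId, systemId);
    final String resourceName = publicId != null ? resourceNamesByPublicId.get(publicId) : null;
    if (resourceName == null) {
      return null; // unknown entity: the parser falls back to its default behavior (possibly a network fetch)
    }
    final InputStream inputStream = anchorClass.getResourceAsStream(resourceName);
    if (inputStream == null) {
      throw new IOException("Bundled resource " + resourceName + " missing for public ID " + publicId);
    }
    final InputSource inputSource = new InputSource(inputStream);
    inputSource.setPublicId(publicId);
    // Keep the original system ID so relative references inside the DTD still resolve against it.
    inputSource.setSystemId(systemId);
    return inputSource;
  }
}
```

It would be installed in the parsing code that follows in place of DefaultEntityResolver, e.g. documentBuilder.setEntityResolver(new ResourceEntityResolver(getClass(), Map.of("-//W3C//DTD XHTML 1.1//EN", "xhtml11.dtd"))), where "xhtml11.dtd" stands for whatever name the bundled copy actually has.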
Here is the basic form of my parsing:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
documentBuilder.setEntityResolver(DefaultEntityResolver.getInstance());
final Document document;
try (InputStream inputStream = new BufferedInputStream(getClass().getResourceAsStream("xhtml-1.1-test.xhtml"))) {
document = documentBuilder.parse(inputStream);
}
Thanks to the debug messages in DefaultEntityResolver, I can see that the parser loads the following entities, in this order:
- -//W3C//DTD XHTML 1.1//EN (http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd)
- -//W3C//ELEMENTS XHTML Inline Style 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-inlstyle-1.mod)
- -//W3C//ENTITIES XHTML Datatypes 1.0//EN (http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod)
- -//W3C//ENTITIES XHTML Modular Framework 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod)
- -//W3C//ENTITIES XHTML Datatypes 1.0//EN (http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod)
- -//W3C//ENTITIES XHTML Qualified Names 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-qname-1.mod)
- -//W3C//ENTITIES XHTML Intrinsic Events 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-events-1.mod)
- -//W3C//ENTITIES XHTML Common Attributes 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-attribs-1.mod)
- -//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml11-model-1.mod)
- -//W3C//ENTITIES XHTML Character Entities 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-charent-1.mod)
- -//W3C//ENTITIES Latin 1 for XHTML//EN (http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent)
- -//W3C//ENTITIES Symbols for XHTML//EN (http://www.w3.org/MarkUp/DTD/xhtml-symbol.ent)
- -//W3C//ENTITIES Special for XHTML//EN (http://www.w3.org/MarkUp/DTD/xhtml-special.ent)
- -//W3C//ELEMENTS XHTML Text 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod)
- -//W3C//ELEMENTS XHTML Inline Structural 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-inlstruct-1.mod)
- -//W3C//ELEMENTS XHTML Inline Phrasal 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-inlphras-1.mod)
- -//W3C//ELEMENTS XHTML Block Structural 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-blkstruct-1.mod)
- -//W3C//ELEMENTS XHTML Block Phrasal 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-blkphras-1.mod)
- -//W3C//ELEMENTS XHTML Hypertext 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-hypertext-1.mod)
- -//W3C//ELEMENTS XHTML Lists 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-list-1.mod)
- -//W3C//ELEMENTS XHTML Editing Elements 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-edit-1.mod)
- -//W3C//ELEMENTS XHTML BIDI Override Element 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-bdo-1.mod)
- -//W3C//ELEMENTS XHTML Ruby 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-ruby-1.mod)
- -//W3C//ELEMENTS XHTML Presentation 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-pres-1.mod)
- -//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-inlpres-1.mod)
- -//W3C//ELEMENTS XHTML Block Presentation 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-blkpres-1.mod)
- -//W3C//ELEMENTS XHTML Link Element 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-link-1.mod)
- -//W3C//ELEMENTS XHTML Metainformation 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-meta-1.mod)
- -//W3C//ELEMENTS XHTML Base Element 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-base-1.mod)
- -//W3C//ELEMENTS XHTML Scripting 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-script-1.mod)
- -//W3C//ELEMENTS XHTML Style Sheets 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-style-1.mod)
- -//W3C//ELEMENTS XHTML Images 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-image-1.mod)
- -//W3C//ELEMENTS XHTML Client-side Image Maps 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-csismap-1.mod)
- -//W3C//ELEMENTS XHTML Server-side Image Maps 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-ssismap-1.mod)
- -//W3C//ELEMENTS XHTML Param Element 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-param-1.mod)
- -//W3C//ELEMENTS XHTML Embedded Object 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-object-1.mod)
- -//W3C//ELEMENTS XHTML Tables 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-table-1.mod)
- -//W3C//ELEMENTS XHTML Forms 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-form-1.mod)
- -//W3C//ELEMENTS XHTML Document Structure 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-struct-1.mod)
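Note that some of these entities no longer exist at the specified URLs; nevertheless my DefaultEntityResolver has them stored and keyed to their public IDs, so it still provides them to the parser.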
So far so good. But when I call document.normalizeDocument() immediately afterwards, the program pauses and then prints:
[Error] xhtml11.dtd:129:43: The entity "LanguageCode.datatype" was referenced, but not declared.
[Error] xhtml11.dtd:130:44: The entity "LanguageCode.datatype" was referenced, but not declared.
[Error] xhtml11.dtd:194:47: The entity "Common.attrib" was referenced, but not declared.
Note that it isn't my program printing these errors; apparently it's something inside document.normalizeDocument(). There are also two other curiosities:
- If I run my application from within Eclipse, this doesn't happen.
- If I disable my network connection, this doesn't happen.
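Since these [Error] lines are emitted from inside normalizeDocument(), one way to look at them programmatically is the DOM Level 3 "error-handler" parameter of DOMConfiguration, which accepts a DOMErrorHandler. The sketch below collects reported errors into a list instead of leaving them to the console. It is untested against this exact scenario: whether the JDK's built-in normalizer reports these particular DTD problems through this handler, rather than through its own internal error reporter, is an assumption. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;

/** Hypothetical helper: normalizes a document while collecting DOMErrors instead of printing them. */
final class CollectingNormalizer {

  private CollectingNormalizer() {
  }

  /** Installs a collecting "error-handler" (if the configuration accepts it), normalizes, and returns the messages. */
  static List<String> normalizeCollectingErrors(final Document document) {
    final List<String> messages = new ArrayList<>();
    final DOMErrorHandler errorHandler = error -> {
      messages.add(severityName(error.getSeverity()) + ": " + error.getMessage());
      // Returning true asks the implementation to continue processing if it can.
      return true;
    };
    final DOMConfiguration domConfig = document.getDomConfig();
    if (domConfig.canSetParameter("error-handler", errorHandler)) {
      domConfig.setParameter("error-handler", errorHandler);
    }
    document.normalizeDocument();
    return messages;
  }

  /** Maps DOMError severity constants to readable labels. */
  private static String severityName(final short severity) {
    switch (severity) {
      case DOMError.SEVERITY_WARNING:
        return "warning";
      case DOMError.SEVERITY_ERROR:
        return "error";
      case DOMError.SEVERITY_FATAL_ERROR:
        return "fatal error";
      default:
        return "severity " + severity;
    }
  }
}
```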
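My best guess is that document.normalizeDocument() is not using the custom EntityResolver I installed in the document builder. Because some entities no longer exist at their expected URLs (e.g. http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod), they can't be loaded, so the referenced entities indicated in the errors are never declared. But the web server takes a long time to respond that the entity is missing (as you can test manually), which makes the program seem to pause. This would also explain why the error messages don't appear when my network connection is disabled; I'm guessing that loading any external entity then fails immediately, but this isn't considered an error. (None of this explains, though, why there's no pause or error message in Eclipse.)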
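Indeed the DOMConfiguration documentation hints that I need to set some sort of resource-resolver parameter, although I'm not sure why DOMConfiguration doesn't by default use the entity resolver I set on the original document builder used to parse the XML document.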
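Assuming the JDK's DOMConfiguration honors the DOM Level 3 "resource-resolver" parameter when normalizeDocument() loads the DTD, one way to reuse the existing EntityResolver is to wrap it in an LSResourceResolver and install it on document.getDomConfig() before normalizing. This is a minimal, untested sketch; EntityResolverBridge and its methods are hypothetical names, and the wrapper simply copies the resolved InputSource into an LSInput created by DOMImplementationLS.createLSInput().

```java
import java.io.IOException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper bridging a SAX EntityResolver to the DOM Level 3 "resource-resolver" parameter. */
final class EntityResolverBridge {

  private EntityResolverBridge() {
  }

  /** Adapts a SAX EntityResolver into an LSResourceResolver by copying the resolved InputSource into an LSInput. */
  static LSResourceResolver toResourceResolver(final EntityResolver entityResolver, final DOMImplementationLS domImplementationLS) {
    return (type, namespaceURI, publicId, systemId, baseURI) -> {
      final InputSource inputSource;
      try {
        inputSource = entityResolver.resolveEntity(publicId, systemId);
      } catch (final SAXException | IOException exception) {
        throw new IllegalStateException("Unable to resolve " + publicId + " (" + systemId + ")", exception);
      }
      if (inputSource == null) {
        return null; // not handled: the DOM implementation falls back to its default resolution
      }
      final LSInput lsInput = domImplementationLS.createLSInput();
      lsInput.setPublicId(inputSource.getPublicId() != null ? inputSource.getPublicId() : publicId);
      lsInput.setSystemId(inputSource.getSystemId() != null ? inputSource.getSystemId() : systemId);
      lsInput.setBaseURI(baseURI);
      lsInput.setByteStream(inputSource.getByteStream());
      lsInput.setCharacterStream(inputSource.getCharacterStream());
      lsInput.setEncoding(inputSource.getEncoding());
      return lsInput;
    };
  }

  /** Installs the bridged resolver on the document's DOMConfiguration (if accepted), then normalizes. */
  static void normalizeWithResolver(final Document document, final EntityResolver entityResolver) {
    final DOMImplementation domImplementation = document.getImplementation();
    final Object lsFeature = domImplementation instanceof DOMImplementationLS
        ? domImplementation
        : domImplementation.getFeature("LS", "3.0");
    if (!(lsFeature instanceof DOMImplementationLS)) {
      throw new IllegalStateException("DOM Level 3 Load and Save is not available");
    }
    final LSResourceResolver resourceResolver = toResourceResolver(entityResolver, (DOMImplementationLS) lsFeature);
    final DOMConfiguration domConfig = document.getDomConfig();
    if (domConfig.canSetParameter("resource-resolver", resourceResolver)) {
      domConfig.setParameter("resource-resolver", resourceResolver);
    }
    document.normalizeDocument();
  }
}
```

With this in place, the document.normalizeDocument() call in the code above would become EntityResolverBridge.normalizeWithResolver(document, DefaultEntityResolver.getInstance()). Returning null from the wrapper for unknown entities keeps the implementation's default behavior, which may still go to the network.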
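To make things a bit stranger, I put the XHTML 1.1 skeleton document above in my resources and created a unit test that does exactly the same thing as the code above, followed by document.normalizeDocument(), and the test passes with no pause and no errors, even from the command line!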
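However, if I put a for (int i = 0; i < 100; i++) loop in the unit test, loading, parsing, and normalizing the document 100 times (but reusing the same DocumentBuilderFactory), my unit test outright crashes the forked unit test JVM!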
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (default-test) on project [...]: There are test failures.
Please refer to [...]\xml\target\surefire-reports for the individual test results.
Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Command was cmd.exe /X /C [...]
Process Exit Code: 0
Crashed tests:
[...].XmlDomTest
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Command was cmd.exe /X /C [...]
Process Exit Code: 0
Crashed tests:
com.globalmentor.xml.XmlDomTest
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:282)
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:245)
at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:957)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:289)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:193)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
Caused by: org.apache.maven.plugin.MojoExecutionException: There are test failures.
So I suppose I'll avoid document.normalizeDocument(), but I'd welcome any clarification of this behavior.
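If the only purpose of calling normalizeDocument() is to merge adjacent text nodes and drop empty ones, the plain Node.normalize() method does that without the namespace fixup, entity handling, or validation steps that normalizeDocument() may apply, so it presumably has no reason to reload the DTD. A minimal sketch, assuming text-node merging is all that's needed (the class and method names are hypothetical):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper showing Node.normalize() as a lighter alternative to normalizeDocument(). */
final class TextNormalization {

  private TextNormalization() {
  }

  /**
   * Merges adjacent Text nodes and removes empty ones throughout the document; unlike
   * normalizeDocument(), it does not perform namespace fixup, entity handling, or validation.
   */
  static void mergeTextNodes(final Document document) {
    document.normalize(); // Node.normalize(), not Document.normalizeDocument()
  }

  /** Builds a small document whose text is split across several nodes, then normalizes it. */
  static Element demo() throws ParserConfigurationException {
    final Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    final Element body = document.createElementNS("http://www.w3.org/1999/xhtml", "body");
    body.appendChild(document.createTextNode("Hello, "));
    body.appendChild(document.createTextNode(""));
    body.appendChild(document.createTextNode("world"));
    document.appendChild(body);
    mergeTextNodes(document);
    return body;
  }
}
```

After demo(), the body element has a single child: the text node "Hello, world".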