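I am using R's XML package to extract all possible data from a variety of html and xml files. These files are mostly documentation, build properties, or readme files. Here is a sample: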
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE chapter PUBLIC '-//OASIS//DTD DocBook XML V4.1.2//EN'
'http://www.oasis-open.org/docbook/xml/4.0 docbookx.dtd'>
<chapter lang="en">
<chapterinfo>
<author>
<firstname>Jirka</firstname>
<surname>Kosek</surname>
</author>
<copyright>
<year>2001</year>
<holder>Jiří Kosek</holder>
</copyright>
<releaseinfo>$Id: htmlhelp.xml,v 1.1 2002/05/15 17:22:31 isberg Exp $</releaseinfo>
</chapterinfo>
<title>Using XSL stylesheets to generate HTML Help</title>
<?dbhtml filename="htmlhelp.html"?>
<para>HTML Help (HH) is help-format used in newer versions of MS
Windows and applications written for this platform. This format allows
to pack several HTML files together with images, table of contents and
index into single file. Windows contains browser for this file-format
and full-text search is also supported on HH files. If you want know
more about HH and its capabilities look at <ulink
url="http://msdn.microsoft.com/library/tools/htmlhelp/chm/HH1Start.htm">HTML
Help pages</ulink>.</para>
<section>
<title>How to generate first HTML Help file from DocBook sources</title>
<para>Working with HH stylesheets is same as with other XSL DocBook
stylesheets. Simply run your favorite XSLT processor on your document
with stylesheet suited for HH:</para>
</section>
</chapter>
My goal is to parse the tree with htmlTreeParse or xmlTreeParse and then use something like this (for xml files):
Text = xmlValue(xmlRoot(xmlTreeParse(XMLFileName)))
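However, when I do this for both xml and html files, I run into a problem. If a node has child nodes two or more levels deep, their text fields are pasted together with no whitespace between them.
For example, in the sample above,
xmlValue(chapterInfo) is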
JirkaKosek2001JiKosek$Id: htmlhelp.xml,v 1.1 2002/05/15 17:22:31 isberg Exp
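The xmlValue of each child node (recursively) is pasted together with no whitespace added in between. How can I make xmlValue insert spaces when extracting this data?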
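For reference, here is a minimal sketch that reproduces the behavior and shows one workaround I have been considering; it is not necessarily the right way to do it. It assumes the document is parsed with xmlParse (internal nodes) rather than xmlTreeParse, so that XPath queries work. The inline XML string and the variable names (xml_text, pieces, spaced) are made up for illustration.

```r
library(XML)

# Hypothetical, trimmed-down version of the <chapterinfo> element above.
xml_text <- paste0(
  "<chapterinfo><author><firstname>Jirka</firstname>",
  "<surname>Kosek</surname></author>",
  "<copyright><year>2001</year></copyright></chapterinfo>"
)

# Parse into internal nodes so that XPath can be used on the result.
doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

# The behavior described above: descendant text is concatenated as-is.
joined <- xmlValue(root)

# Possible workaround: collect every text node separately, drop
# whitespace-only pieces, and join the rest with a single space.
pieces <- xpathSApply(root, ".//text()", xmlValue)
pieces <- trimws(pieces)
pieces <- pieces[nzchar(pieces)]
spaced <- paste(pieces, collapse = " ")

print(joined)
print(spaced)
```

With this approach the separator is chosen in paste(), so it could just as well be a newline or a tab instead of a space.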
Thanks very much in advance for your help,
Shivani