c# - Reading a "fake" XML document (XML fragment) with an XmlTextReader object

Question

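[Case] I received a bunch of "XML files" containing metadata about a large number of documents. At least, that is what I asked for. What I actually received were "XML files" without a root element, structured like this (I have left out a bunch of elements):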

<folder name = "abc"></folder>
<folder name = "abc/def">
<document name = "ghi1">
</document>
<document name = "ghi2">
</document>
</folder>

[Problem] When I try to read the file with an XmlTextReader object, it fails and tells me there is no root element.

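[Current workaround] Of course, I can read the file as a stream, add <xmlroot> at the start and </xmlroot> at the end, write the stream to a new file and read that file with the XmlTextReader. That is exactly what I am doing now, but I would rather not "tamper" with the original data.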
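For comparison, the wrapping can also be done in memory, so that no new file is written and the original is left untouched. The sketch below assumes the whole file fits in memory and that an XmlDocument is the desired result. It parses the root-less text into an XmlDocumentFragment (whose InnerXml setter accepts several top-level elements) and attaches it under a synthetic root element. The class name FragmentIntoDocument and the element name "root" are arbitrary choices for this example.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper name; a sketch, not a drop-in library API.
public static class FragmentIntoDocument
{
    // Parses root-less XML text into an XmlDocument under a synthetic root.
    // The whole input is read into memory; nothing is written to disk.
    public static XmlDocument Load(TextReader input)
    {
        var doc = new XmlDocument();

        // Synthetic root element ("root" is an arbitrary name).
        XmlElement root = doc.CreateElement("root");
        doc.AppendChild(root);

        // A document fragment may have several top-level children,
        // so its InnerXml setter accepts the root-less content as-is.
        XmlDocumentFragment fragment = doc.CreateDocumentFragment();
        fragment.InnerXml = input.ReadToEnd();

        // Appending a fragment moves its children, not the fragment node itself.
        root.AppendChild(fragment);
        return doc;
    }
}
```

Usage would be along the lines of `using (var text = File.OpenText(@"C:\test.txt")) { XmlDocument doc = FragmentIntoDocument.Load(text); }`.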

[Requested solution] I understand that I should use XmlTextReader with the DocumentFragment option for this. However, this fails at run time with the following exception:

An unhandled exception of type 'System.Xml.XmlException' occurred in System.Xml.dll

Additional information: XmlNodeType DocumentFragment is not supported for partial content parsing. Line 1, position 1.

[Failing code]

using System.Diagnostics;
using System.Xml;

namespace XmlExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string file = @"C:\test.txt";
            XmlTextReader tr = new XmlTextReader(file, XmlNodeType.DocumentFragment, null);
            while(tr.Read())
                Debug.WriteLine("NodeType: {0} NodeName: {1}", tr.NodeType, tr.Name);
        }
    }
}
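For what it's worth, the exception points at the cause: as far as I know, XmlTextReader's fragment constructors accept only XmlNodeType.Element, XmlNodeType.Attribute or XmlNodeType.Document as the fragment type, and XmlNodeType.Element is the one that permits several top-level elements. Note also that the string overload used above treats its first argument as the XML text itself, not as a file path. A minimal sketch that passes the content as a Stream and uses XmlNodeType.Element instead (the class and method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper names; a sketch only.
public static class ElementFragmentReading
{
    // Returns the "name" attribute of every top-level element in a root-less input.
    public static List<string> TopLevelNameAttributes(Stream input)
    {
        var names = new List<string>();

        // XmlNodeType.Element: the input may contain any element content,
        // including several sibling elements at the top level.
        // DocumentFragment is rejected, which is the exception shown above.
        using (var reader = new XmlTextReader(input, XmlNodeType.Element, null))
        {
            while (reader.Read())
            {
                // Depth 0 elements are the "roots" of the fragment.
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 0)
                    names.Add(reader.GetAttribute("name"));
            }
        }
        return names;
    }
}
```

With a file, this would be called as `using (var fs = File.OpenRead(@"C:\test.txt")) { var names = ElementFragmentReading.TopLevelNameAttributes(fs); }`.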

score 4 · Accepted Answer

This works:

using System.Diagnostics;
using System.Xml;

namespace XmlExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string file = @"C:\test.txt";
            XmlReaderSettings settings = new XmlReaderSettings();
            // Fragment conformance accepts input with several top-level elements (no single root).
            settings.ConformanceLevel = ConformanceLevel.Fragment;
            using (XmlReader reader = XmlReader.Create(file, settings))
            {
                while (reader.Read())
                    Debug.WriteLine("NodeType: {0} NodeName: {1}", reader.NodeType, reader.Name);
            }
        }
    }
}
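Building on this, if the goal is to work with each top-level folder element as a small tree rather than node by node, one common pattern is to combine the Fragment-conformance reader with XNode.ReadFrom. This is a sketch assuming LINQ to XML (System.Xml.Linq) is available; FragmentElements and ReadTopLevel are illustrative names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper names; a sketch only.
public static class FragmentElements
{
    // Streams each top-level element of a root-less input as its own XElement.
    public static IEnumerable<XElement> ReadTopLevel(TextReader input)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node after it, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    // Skip whitespace, comments, etc. between top-level elements.
                    reader.Read();
                }
            }
        }
    }
}
```

Because the method is lazy, only one folder element needs to be materialized at a time, which may matter for large metadata files.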

score 2 · Accepted Answer

Although XmlReader can read the data using the ConformanceLevel.Fragment option Martijn demonstrated, XmlDataDocument does not seem to like the idea of having multiple root elements.

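I thought I would try a different approach, similar to the one you are currently using but without the intermediate file. Most XML libraries (XmlDocument, XDocument, XmlDataDocument) can take a TextReader as input, so I implemented my own. It is used like this: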

var dataDocument = new XmlDataDocument();
dataDocument.Load(new FakeRootStreamReader(File.OpenRead("test.xml")));

The code for the actual class:

using System;
using System.IO;

public class FakeRootStreamReader : TextReader
{
    private static readonly char[] _rootStart;
    private static readonly char[] _rootEnd;

    private readonly TextReader _innerReader;
    private int _charsRead;
    private bool _eof;

    static FakeRootStreamReader()
    {
        _rootStart = "<root>".ToCharArray();
        _rootEnd = "</root>".ToCharArray();
    }

    public FakeRootStreamReader(Stream stream)
    {
        _innerReader = new StreamReader(stream);
    }

    public FakeRootStreamReader(TextReader innerReader)
    {
        _innerReader = innerReader;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        if (!_eof && _charsRead < _rootStart.Length)
        {
            // Prepend root element
            return ReadFake(_rootStart, buffer, index, count);
        }

        if (!_eof)
        {
            // Normal reading operation
            int charsRead = _innerReader.Read(buffer, index, count);
            if (charsRead > 0) return charsRead;

            // We've reached the end of the Stream
            _eof = true;
            _charsRead = 0;
        }

        // Append root element end tag at the end of the Stream
        return ReadFake(_rootEnd, buffer, index, count);
    }

    private int ReadFake(char[] source, char[] buffer, int offset, int count)
    {
        int length = Math.Min(source.Length - _charsRead, count);
        Array.Copy(source, _charsRead, buffer, offset, length);
        _charsRead += length;
        return length;
    }
}

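The first call to Read(...) returns only the <root> start tag. Subsequent calls read the stream normally until the end of the stream is reached, after which the closing </root> tag is output.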

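The code is a bit... well... clunky, mainly because I wanted to handle a case that will never happen in practice: someone reading the stream fewer than 6 characters at a time.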
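If the small-buffer bookkeeping is the main source of complexity, one alternative (a different technique from the answer's, shown only as a sketch) is a general-purpose reader that chains several TextReaders, so the fake root tags become two StringReaders around the file. ConcatTextReader is a hypothetical name. It overrides only Read(char[], int, int) and Read(), and deliberately leaves Peek() at the base implementation (which returns -1), because StreamReader.Peek() is documented to possibly return -1 for streams that do not support seeking, so it is not a reliable end-of-input signal.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: presents several TextReaders as one continuous reader.
public sealed class ConcatTextReader : TextReader
{
    private readonly Queue<TextReader> _readers;

    public ConcatTextReader(params TextReader[] readers)
    {
        _readers = new Queue<TextReader>(readers);
    }

    public override int Read(char[] buffer, int index, int count)
    {
        if (count == 0) return 0; // an empty request must not be mistaken for end of input
        while (_readers.Count > 0)
        {
            int n = _readers.Peek().Read(buffer, index, count);
            if (n > 0) return n;
            _readers.Dequeue().Dispose(); // current reader exhausted; move to the next
        }
        return 0; // all readers exhausted
    }

    public override int Read()
    {
        while (_readers.Count > 0)
        {
            int c = _readers.Peek().Read();
            if (c != -1) return c;
            _readers.Dequeue().Dispose();
        }
        return -1;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Dispose any readers that were not fully consumed.
            while (_readers.Count > 0) _readers.Dequeue().Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Usage would then look like `XDocument.Load(new ConcatTextReader(new StringReader("<root>"), File.OpenText("test.xml"), new StringReader("</root>")))`, with no special handling for tiny buffers, since each inner reader already copes with any requested count.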
