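I'd advise you not to manipulate the data you receive. If it's invalid, that's your client's problem.
Editing the input until it becomes valid XML can cause serious problems. For example, instead of throwing an error you may end up silently processing the wrong data: you did your best to make the XML valid, but the "repaired" document may no longer mean what the sender intended.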
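Rejecting bad input can be as simple as parsing it unchanged and reporting the failure. The sketch below assumes you use LINQ to XML (`XDocument.Parse`, which throws `System.Xml.XmlException` on malformed input). `XmlInputValidator` and `TryParse` are hypothetical names, not part of any library.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: accept the document only if it is already well-formed.
static class XmlInputValidator
{
    public static bool TryParse(string input, out XDocument document, out string error)
    {
        try
        {
            // Parse the text exactly as received; no "repairs" are attempted.
            document = XDocument.Parse(input);
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            // XmlException reports where parsing failed, which can be
            // passed back to whoever sent the data.
            document = null;
            error = $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
            return false;
        }
    }
}
```

For example, `TryParse("<a>x <y</a>", out _, out var err)` returns `false` and fills `err`, so the caller can refuse the message instead of guessing what it should have contained.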
[Edit]
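I still don't think it's a good idea, but sometimes you have to do what you have to do.
Here is a very simple class that parses the input and escapes invalid opening brackets: a `<` that is followed by another `<` before any `>` cannot start a tag, so it is replaced with `&lt;`. You could do this with regular expressions (which I'm not good at). Note that this solution is not complete: depending on your requirements (or rather, on the bad XML you receive) you will have to adapt it, e.g. scan complete XML elements instead of only the `<` and `>` brackets, wrap the inner text of nodes in CDATA, and so on.
I just wanted to show how it can be done, so please don't complain if it's slow or buggy (as I mentioned, I wouldn't do this myself).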
    using System.IO;
    using System.Text;

    class XmlCleaner
    {
        public void Clean(Stream sourceStream, Stream targetStream)
        {
            const char openingIndicator = '<';
            const char closingIndicator = '>';
            const int bufferSize = 1024;
            char[] buffer = new char[bufferSize];
            bool startTagFound = false;
            StringBuilder writeBuffer = new StringBuilder();
            using (var reader = new StreamReader(sourceStream))
            {
                var writer = new StreamWriter(targetStream);
                try
                {
                    int charsRead;
                    // Only look at the characters actually read in this pass; the rest
                    // of the buffer may still hold data from the previous read.
                    while ((charsRead = reader.Read(buffer, 0, bufferSize)) > 0)
                    {
                        for (int i = 0; i < charsRead; i++)
                        {
                            char c = buffer[i];
                            if (c == openingIndicator)
                            {
                                if (startTagFound)
                                {
                                    // two opening brackets without a closing one in between:
                                    // the earlier one cannot start a tag, so escape it
                                    writeBuffer.Replace("<", "&lt;");
                                    // append the new one
                                    writeBuffer.Append(c);
                                }
                                else
                                {
                                    startTagFound = true;
                                    writeBuffer.Append(c);
                                }
                            }
                            else if (c == closingIndicator)
                            {
                                startTagFound = false;
                                // the tag is complete: write out everything buffered so far
                                writeBuffer.Append(c);
                                writer.Write(writeBuffer.ToString());
                                writeBuffer.Clear();
                            }
                            else
                            {
                                writeBuffer.Append(c);
                            }
                        }
                    }
                    // write whatever follows the last '>' so trailing text is not lost
                    writer.Write(writeBuffer.ToString());
                }
                finally
                {
                    // unfortunately the StreamWriter's Dispose method closes the underlying stream, so we just flush it
                    writer.Flush();
                }
            }
        }
    }
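On .NET Framework 4.5 and later (and .NET Core), `StreamWriter` has a constructor that takes a `leaveOpen` flag, which would make the flush-only workaround in the `finally` block unnecessary. A minimal sketch; `LeaveOpenDemo` and `WriteAndKeepOpen` are hypothetical names, and the encoding is chosen to match what `new StreamWriter(stream)` uses by default:

```csharp
using System.IO;
using System.Text;

// Hypothetical helper illustrating StreamWriter(Stream, Encoding, int, bool).
static class LeaveOpenDemo
{
    public static void WriteAndKeepOpen(Stream target, string text)
    {
        // new UTF8Encoding(false) = UTF-8 without a byte-order mark,
        // the same encoding StreamWriter(Stream) uses by default.
        using (var writer = new StreamWriter(target, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.Write(text);
        } // Dispose flushes the writer but leaves 'target' open for the caller
    }
}
```

The caller keeps ownership of `target` and can rewind and read it afterwards, which is exactly what the test code needs.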
To test the XmlCleaner class:
    using System.Diagnostics;
    using System.IO;
    using System.Text;
    using System.Xml.Linq;

    var testxml =
    @"<base>
    <elem1 number='1'>
    <elem2>yyy</elem2>
    <elem3>xxx <yyy zzz aaa</elem3>
    </elem1>
    </base>";
    string result;
    using (var source = new MemoryStream(Encoding.ASCII.GetBytes(testxml)))
    using (var target = new MemoryStream())
    {
        XmlCleaner cleaner = new XmlCleaner();
        cleaner.Clean(source, target);
        target.Position = 0;
        using (var reader = new StreamReader(target))
        {
            result = reader.ReadToEnd();
        }
    }
    // throws an XmlException if the cleaned output is still not well-formed
    XDocument.Parse(result);
    var expectedResult =
    @"<base>
    <elem1 number='1'>
    <elem2>yyy</elem2>
    <elem3>xxx &lt;yyy zzz aaa</elem3>
    </elem1>
    </base>";
    Debug.Assert(result == expectedResult);
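For the regular-expression route, the same rule ("escape a `<` that is followed by another `<` before any `>`") can be written as a single lookahead. This is a sketch under that assumption, not a more complete cleaner: it has the same blind spots as the class above, and it needs the whole document in memory as a string instead of streaming it. `RegexXmlCleaner` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical regex-based equivalent of the bracket rule used by XmlCleaner.
static class RegexXmlCleaner
{
    // Matches a '<' only if another '<' appears before any '>'.
    // The lookahead consumes nothing, so runs such as "<<a>" are
    // handled in one pass (every '<' except the last gets escaped).
    private static readonly Regex StrayOpeningBracket =
        new Regex("<(?=[^<>]*<)", RegexOptions.Compiled);

    public static string Clean(string xml) =>
        StrayOpeningBracket.Replace(xml, "&lt;");
}
```

Applied to the element from the test, `RegexXmlCleaner.Clean("<elem3>xxx <yyy zzz aaa</elem3>")` returns `"<elem3>xxx &lt;yyy zzz aaa</elem3>"`.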