37

我正在从 StringBuilder 生成 XML 文档,基本上类似于:

string.Format("<text><row>{0}</row><col>{1}</col><textHeight>{2}</textHeight><textWidth>{3}</textWidth><data>{4}</data><rotation>{5}</rotation></text>

后来,类似:

XmlDocument document = new XmlDocument();
document.LoadXml(xml);
XmlNodeList labelSetNodes = document.GetElementsByTagName("labels");
for (int index = 0; index < labelSetNodes.Count; index++)
{
    //do something
}

所有的数据都来自一个数据库。最近我遇到了一些错误问题:

十六进制值 0x00 是无效字符,第 1 行,位置 nnnnn

但它并不一致。有时一些“空白”数据会起作用。“错误”数据适用于某些 PC,但不适用于其他 PC。

在数据库中,数据总是一个空字符串。它永远不会是“null”,并且在 XML 文件中,它会显示为< data>< /data>,即在打开和关闭之间没有字符。(但不确定这是否可以依赖,因为我将它从“即时”窗口中拉出是 vis studio 并将其粘贴到 textpad 中)。

sql server 的版本(2008 会失败,2005 会工作)和排序规则也可能存在差异。不确定这些是否是可能的原因?

但完全相同的代码和数据有时会失败。问题出在哪里的任何想法?

4

7 回答 7

36

如果没有您的实际数据或来源,我们将很难诊断出问题所在。不过,我可以提出几点建议:

  • Unicode NUL (0x00) 在所有 XML 版本中都是非法的,验证解析器必须拒绝包含它的输入。
  • 尽管如此;现实世界中未经验证的 XML 可以包含任何可以想象的垃圾格式错误的字节。
  • XML 1.1 允许零宽度和非打印控制字符(NUL 除外),因此您无法在文本编辑器中查看 XML 1.1 文件并知道其中包含哪些字符。

Given what you wrote, I suspect whatever converts the database data to XML is broken; it's propagating non-XML characters.

Create some database entries with non-XML characters (NULs, DELs, control characters, et al.) and run your XML converter on it. Output the XML to a file and look at it in a hex editor. If this contains non-XML characters, your converter is broken. Fix it or, if you cannot, create a preprocessor that rejects output with such characters.

If the converter output looks good, the problem is in your XML consumer; it's inserting non-XML characters somewhere. You will have to break your consumption process into separate steps, examine the output at each step, and narrow down what is introducing the bad characters.

Check file encoding (for UTF-16)

Update: I just ran into an example of this myself! What was happening is that the producer was encoding the XML as UTF16 and the consumer was expecting UTF8. Since UTF16 uses 0x00 as the high byte for all ASCII characters and UTF8 doesn't, the consumer was seeing every second byte as a NUL. In my case I could change encoding, but suggested all XML payloads start with a BOM.

于 2012-06-14T18:27:00.513 回答
16

In my case, it took some digging, but found it.

My Context

I'm looking at exception/error logs from the website using Elmah. Elmah returns the state of the server at the of time the exception, in the form of a large XML document. For our reporting engine I pretty-print the XML with XmlWriter.

During a website attack, I noticed that some xmls weren't parsing and was receiving this '.', hexadecimal value 0x00, is an invalid character. exception.

NON-RESOLUTION: I converted the document to a byte[] and sanitized it of 0x00, but it found none.

When I scanned the xml document, I found the following:

...
<form>
...
<item name="SomeField">
   <value
     string="C:\boot.ini&#x0;.htm" />
 </item>
...

There was the nul byte encoded as an html entity &#x0; !!!

RESOLUTION: To fix the encoding, I replaced the &#x0; value before loading it into my XmlDocument, because loading it will create the nul byte and it will be difficult to sanitize it from the object. Here's my entire process:

XmlDocument xml = new XmlDocument();
details.Xml = details.Xml.Replace("&#x0;", "[0x00]");  // in my case I want to see it, otherwise just replace with ""
xml.LoadXml(details.Xml);

string formattedXml = null;

// I have this in a helper function, but for this example I have put it in-line
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings {
    OmitXmlDeclaration = true,
    Indent = true,
    IndentChars = "\t",
    NewLineHandling = NewLineHandling.None,
};
using (XmlWriter writer = XmlWriter.Create(sb, settings)) {
    xml.Save(writer);
    formattedXml = sb.ToString();
}

LESSON LEARNED: sanitize for illegal bytes using the associated html entity, if your incoming data is html encoded on entry.

于 2013-10-24T17:24:28.867 回答
9

To add to Sonz's answer above, following worked for us.

//Instead of 
XmlString.Replace("&#x0;", "[0x00]");
// use this
XmlString.Replace("\x00", "[0x00]");
于 2015-07-17T16:26:27.690 回答
4

I also get the same error in an ASP.NET application when I saved some unicode data (Hindi) in the Web.config file and saved it with "Unicode" encoding.

It fixed the error for me when I saved the Web.config file with "UTF-8" encoding.

于 2013-04-17T05:24:16.543 回答
4

As kind of a late answer:

I've had this problem with SSRS ReportService2005.asmx when uploading a report.

    Public Shared Sub CreateReport(ByVal strFileNameAndPath As String, ByVal strReportName As String, ByVal strReportingPath As String, Optional ByVal bOverwrite As Boolean = True)
        Dim rs As SSRS_2005_Administration_WithFOA = New SSRS_2005_Administration_WithFOA
        rs.Credentials = ReportingServiceInterface.GetMyCredentials(strCredentialsURL)
        rs.Timeout = ReportingServiceInterface.iTimeout
        rs.Url = ReportingServiceInterface.strReportingServiceURL
        rs.UnsafeAuthenticatedConnectionSharing = True

        Dim btBuffer As Byte() = Nothing

        Dim rsWarnings As Warning() = Nothing
        Try
            Dim fstrStream As System.IO.FileStream = System.IO.File.OpenRead(strFileNameAndPath)
            btBuffer = New Byte(fstrStream.Length - 1) {}
            fstrStream.Read(btBuffer, 0, CInt(fstrStream.Length))
            fstrStream.Close()
        Catch ex As System.IO.IOException
            Throw New Exception(ex.Message)
        End Try

        Try
            rsWarnings = rs.CreateReport(strReportName, strReportingPath, bOverwrite, btBuffer, Nothing)

            If Not (rsWarnings Is Nothing) Then
                Dim warning As Warning
                For Each warning In rsWarnings
                    Log(warning.Message)
                Next warning
            Else
                Log("Report: {0} created successfully with no warnings", strReportName)
            End If

        Catch ex As System.Web.Services.Protocols.SoapException
            Log(ex.Detail.InnerXml.ToString())
        Catch ex As Exception
            Log("Error at creating report. Invalid server name/timeout?" + vbCrLf + vbCrLf + "Error Description: " + vbCrLf + ex.Message)
            Console.ReadKey()
            System.Environment.Exit(1)
        End Try
    End Sub ' End Function CreateThisReport

The problem occurs when you allocate a byte array that is at least 1 byte larger than the RDL (XML) file.

Specifically, I used a C# to vb.net converter, that converted

  btBuffer = new byte[fstrStream.Length];

into

  btBuffer = New Byte(fstrStream.Length) {}

But because in C# the number denotes the NUMBER OF ELEMENTS in the array, and in VB.NET, that number denotes the UPPER BOUND of the array, I had an excess byte, causing this error.

So the problem's solution is simply:

  btBuffer = New Byte(fstrStream.Length - 1) {}
于 2013-11-15T08:37:32.390 回答
2

I'm using IronPython here (same as .NET API) and reading the file as UTF-8 in order to properly handle the BOM fixed the problem for me:

xmlFile = Path.Combine(directory_str, 'file.xml')
doc = XPathDocument(XmlTextReader(StreamReader(xmlFile.ToString(), Encoding.UTF8)))

It would work as well with the XmlDocument:

doc = XmlDocument()
doc.Load(XmlTextReader(StreamReader(xmlFile.ToString(), Encoding.UTF8)))
于 2017-05-30T03:50:12.160 回答
0

I haved the same issue, when I tried to save the file, whole code is perfect but in the last procedure, the follow error message appears: "'.', Hexadecimal value 0x00 is a invalid character."

1.Looking in the develop tool, I found in the name assigned to sheets collection {Hoja1}, {Cartera}, {JennyG, {MariaD, ...

2.Then I saw last character '}' in the name of sheets should be lost to the any time in the algorithm process to assign names of the sheet from a DataTable Object.

3.On the Name property the real Name of the sheet is "MariaD\0\0\0\0\0\0\0\0\0\0\0\0\0\0", the hidden character in property name is not supported "\0".

4.Finally, the solution is replace current character with "" empty string in the all sheets name.

于 2021-05-25T18:23:46.360 回答