In my case, it took some digging, but found it.
My Context
I'm looking at exception/error logs from the website using Elmah. Elmah returns the state of the server at the of time the exception, in the form of a large XML document. For our reporting engine I pretty-print the XML with XmlWriter.
During a website attack, I noticed that some xmls weren't parsing and was receiving this '.', hexadecimal value 0x00, is an invalid character.
exception.
NON-RESOLUTION: I converted the document to a byte[]
and sanitized it of 0x00, but it found none.
When I scanned the xml document, I found the following:
...
<form>
...
<item name="SomeField">
<value
string="C:\boot.ini�.htm" />
</item>
...
There was the nul byte encoded as an html entity �
!!!
RESOLUTION: To fix the encoding, I replaced the �
value before loading it into my XmlDocument
, because loading it will create the nul byte and it will be difficult to sanitize it from the object. Here's my entire process:
XmlDocument xml = new XmlDocument();
details.Xml = details.Xml.Replace("�", "[0x00]"); // in my case I want to see it, otherwise just replace with ""
xml.LoadXml(details.Xml);
string formattedXml = null;
// I have this in a helper function, but for this example I have put it in-line
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings {
OmitXmlDeclaration = true,
Indent = true,
IndentChars = "\t",
NewLineHandling = NewLineHandling.None,
};
using (XmlWriter writer = XmlWriter.Create(sb, settings)) {
xml.Save(writer);
formattedXml = sb.ToString();
}
LESSON LEARNED: sanitize for illegal bytes using the associated html entity, if your incoming data is html encoded on entry.