我有一些html。我尝试用库清理它:http: //sourceforge.net/projects/tidynet/
这是我的代码:
//clean up html
Tidy tidy = new Tidy();
tidy.Options.DocType = DocType.Omit;
tidy.Options.DropFontTags = true;
tidy.Options.LogicalEmphasis = true;
tidy.Options.Xhtml = true;
tidy.Options.XmlOut = true;
tidy.Options.MakeClean = true;
tidy.Options.TidyMark = false;
tidy.Options.CharEncoding = CharEncoding.UTF8;
/* Declare the parameters that is needed */
TidyMessageCollection tmc = new TidyMessageCollection();
MemoryStream input = new MemoryStream();
MemoryStream output = new MemoryStream();
byte[] byteArray = Encoding.UTF8.GetBytes(report);
input.Write(byteArray, 0, byteArray.Length);
input.Position = 0;
tidy.Parse(input, output, tmc);
string cleanHtml = Encoding.UTF8.GetString(output.ToArray());
然后我尝试使用 xslt:
try
{
StringBuilder res = new StringBuilder();
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(XmlReader.Create(new StringReader(stylesheet.Content)));
xslt.Transform(StringExtensions.ToXmlReader(cleanHtml), null, new StringWriter(res));
var resultReport = res.ToString();
}
catch (Exception e)
{
}
我得到一个例外:
'=' 字符,十六进制值 0x3D,不能包含在名称中
更新 如何从“=”中自动清除名称?