I have the following dummy sample:

<family>
   <member> dad </member>
   <member> mum </member>
   <member> son </member>
   <member> grandad<> </member>
</family>

I have been given a document to convert into XML, but so far I have been unsuccessful. I have no control over how the source document (HTML) is created, but I need to convert it to XML so that I can transform it with a stylesheet.

TidyManaged and HAP are no good to me at this stage in my workflow. I can explain why if people are interested.

To use HAP successfully, I need the above sample to look like this:

<family>
   <member> dad </member>
   <member> mum </member>
   <member> son </member>
   <member> grandad&lt;&gt; </member>
</family>

My last approach before I give up on this problem would be to read in my source HTML document, treat it as a plain text document, and read it line by line.

I need a regex that will match the inner text of an element, e.g.:

<member> grandad<> </member>

Would give me the string:

"grandad<>"

If I can get this far, I should be able to convert the angle brackets into their HTML entity equivalents. The result should then pass as valid XML, allowing me to load it into an XDocument.

I would then replace the original line with the escaped one:

<member> grandad&lt;&gt; </member>

Once all special characters have been escaped like this, I will be in a position to use HTML Agility Pack (HAP); otherwise I will have to give up.
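A minimal sketch of the escape-and-load step described above, for illustration only. The class and method names (`AngleBracketEscaper`, `Escape`, `IsWellFormed`) are hypothetical and not part of any library; the sketch assumes that only raw `<` and `>` inside the inner text need converting and that `&` should be left alone so existing entities are not double-escaped.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class AngleBracketEscaper
{
    // Hypothetical helper: converts raw angle brackets in an element's
    // inner text to entities. '&' is deliberately left untouched
    // (assumption: the source may already contain entities).
    public static string Escape(string innerText) =>
        innerText.Replace("<", "&lt;").Replace(">", "&gt;");

    // Returns true if the fragment parses as XML.
    // XElement.Parse throws XmlException on malformed input.
    public static bool IsWellFormed(string fragment)
    {
        try
        {
            XElement.Parse(fragment);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With this, `AngleBracketEscaper.Escape("grandad<>")` gives `grandad&lt;&gt;`, and wrapping that result in `<member> ... </member>` produces a fragment that `IsWellFormed` accepts, whereas the unescaped line is rejected.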

Thanks for reading.

2 Answers

The simplest regular expression:

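using System.Text.RegularExpressions;

// The lookbehind captures the tag name and the lookahead requires the matching closing tag, so the match itself is only the inner text.
var reg = new Regex(@"(?<=<(\w+)>)(.*)(?=</\1>)");
var input = "<member> grandad<Regexp is a bad tool because of <strong>this</strong>> </member>";
var output = reg.Match(input).Value;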

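The problem arises if your member tags contain whitespace or attributes, or if more than one member tag appears on a single line. If you can provide the ugliest example you have, I will change the expression to fit your input.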
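One possible way to relax those limits is sketched below; it is a sketch under stated assumptions, not a general HTML fix. The tag name `member` is taken from the sample in the question, while the class name `MemberLineFixer` and method `EscapeLine` are hypothetical. The pattern allows attributes in the opening tag and uses a lazy match so that several member tags on one line are handled separately. It assumes attribute values contain no `<` or `>`, that each element opens and closes on the same line, and that every angle bracket inside the inner text should be escaped, including legitimate nested tags such as `<strong>`.

```csharp
using System.Text.RegularExpressions;

public static class MemberLineFixer
{
    // Group 1: opening tag, optionally with attributes.
    // Group 2: inner text, matched lazily up to the nearest closing tag.
    // Group 3: closing tag.
    private static readonly Regex MemberPattern = new Regex(
        @"(<member\b[^<>]*>)(.*?)(</member>)",
        RegexOptions.IgnoreCase);

    // Escapes raw angle brackets in the inner text of every member
    // element found on the line; the tags themselves are kept as-is.
    public static string EscapeLine(string line) =>
        MemberPattern.Replace(line, m =>
            m.Groups[1].Value
            + m.Groups[2].Value.Replace("<", "&lt;").Replace(">", "&gt;")
            + m.Groups[3].Value);
}
```

Applied line by line, `MemberLineFixer.EscapeLine("<member> grandad<> </member>")` gives `<member> grandad&lt;&gt; </member>`. Lines without a complete member element pass through unchanged.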

answered 2013-09-11T20:27:07.710

If you can process each document manually, you can use Notepad++.

The Reindent XML function (TextFX -> TextFX HTML Tools -> Reindent XML) will automatically insert the entities you want.

answered 2013-09-11T10:34:22.050