I have the following dummy sample:
<family>
<member> dad </member>
<member> mum </member>
<member> son </member>
<member> grandad<> </member>
</family>
I have been given an HTML document to convert into XML, but so far I have been unsuccessful. I have no control over how the document is created, but I need it as XML so that I can transform it with a stylesheet.
TidyManaged and HAP are no good to me at this stage in my workflow. I can explain why if anyone is interested.
To use HAP successfully, I need the sample above to look like this:
<family>
<member> dad </member>
<member> mum </member>
<member> son </member>
<member> grandad&lt;&gt; </member>
</family>
My last approach before I give up on this problem is to read in my source HTML document, treat it as plain text, and process it line by line.
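A minimal sketch of that line-by-line approach, written in Python for brevity. The question targets .NET, whose System.Text.RegularExpressions.Regex should accept the same pattern syntax, but that port is untested here. The sketch assumes every leaf element sits on a single line with no attributes, and that everything between its tags is meant to be literal text; `escape_document` is a hypothetical helper name. ElementTree stands in for loading the result into an XDocument.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumption: a leaf element is "<tag> inner text </tag>" on one line, no attributes.
# Groups: 1 = opening tag plus whitespace, 2 = tag name,
#         3 = inner text, 4 = whitespace plus matching closing tag (via \2).
ONE_LINE_ELEMENT_PARTS = re.compile(r"^(\s*<(\w+)>\s*)(.*?)(\s*</\2>\s*)$")


def escape_document(text: str) -> str:
    """Hypothetical helper: treat the document as plain text and escape the
    inner text of every one-line element. Other lines (such as the <family>
    wrapper tags) pass through unchanged."""
    out_lines = []
    for line in text.splitlines():
        match = ONE_LINE_ELEMENT_PARTS.match(line)
        if match:
            opening, _tag, inner, closing = match.groups()
            # escape() replaces &, < and > with &amp;, &lt; and &gt;.
            line = opening + escape(inner) + closing
        out_lines.append(line)
    return "\n".join(out_lines)


source = """<family>
<member> dad </member>
<member> mum </member>
<member> son </member>
<member> grandad<> </member>
</family>"""

fixed = escape_document(source)
print(fixed)

# If the escaping worked, the result is well-formed and parses as XML.
root = ET.fromstring(fixed)
print([m.text.strip() for m in root.findall("member")])
# ['dad', 'mum', 'son', 'grandad<>']
```

Note that `escape` also turns an existing `&` into `&amp;`, so text that already contains entities would be double-escaped; elements spanning several lines or carrying attributes are left untouched.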
I need a regex that will match the inner text of an element, e.g.:
<member> grandad<> </member>
should give me the string:
"grandad<>"
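One regex that should do this, sketched in Python (the pattern uses only features .NET's Regex also supports, though that is untested here). It assumes the opening and closing tags are on the same line and the opening tag has no attributes; `inner_text` is a hypothetical helper name. The backreference `\1` ties the closing tag to the opening one, and the lazy `.*?` stops at the first match that leaves only whitespace and the closing tag.

```python
import re
from typing import Optional

# Assumption: "<tag> inner text </tag>" on a single line, no attributes.
# Group 1 = tag name, group 2 = inner text with surrounding whitespace trimmed.
ONE_LINE_ELEMENT = re.compile(r"^\s*<(\w+)>\s*(.*?)\s*</\1>\s*$")


def inner_text(line: str) -> Optional[str]:
    """Hypothetical helper: return the inner text of a one-line element,
    or None if the line does not have that shape (e.g. "<family>")."""
    match = ONE_LINE_ELEMENT.match(line)
    return match.group(2) if match else None


print(inner_text("<member> grandad<> </member>"))  # grandad<>
print(inner_text("<family>"))                      # None
```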
If I can get this far, I should be able to convert the angle brackets into their HTML entity equivalents (&lt; and &gt;) and then write the escaped string back into the line, so that it becomes:
<member> grandad&lt;&gt; </member>
The document should then be valid XML, allowing me to load it into an XDocument.
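The escape-and-write-back step for a single line can be sketched in Python as follows, under the same assumptions as above (one-line element, no attributes). `escape_line` is a hypothetical helper name; `xml.sax.saxutils.escape` is the standard-library function that replaces `&`, `<` and `>` with their entities.

```python
import re
from xml.sax.saxutils import escape

# Groups: 1 = opening tag plus whitespace, 2 = tag name,
#         3 = inner text, 4 = whitespace plus matching closing tag.
ONE_LINE_ELEMENT_PARTS = re.compile(r"^(\s*<(\w+)>\s*)(.*?)(\s*</\2>\s*)$")


def escape_line(line: str) -> str:
    """Hypothetical helper: escape &, < and > in the inner text of a one-line
    element and splice it back between the original tags. Lines of any other
    shape are returned unchanged."""
    match = ONE_LINE_ELEMENT_PARTS.match(line)
    if match is None:
        return line
    opening, _tag, inner, closing = match.groups()
    return opening + escape(inner) + closing


print(escape_line("<member> grandad<> </member>"))  # <member> grandad&lt;&gt; </member>
```

Only the text between the tags is escaped, so the surrounding markup stays intact; whitespace around the inner text is preserved as it was.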
Once all special characters have been escaped like this, I will be able to take advantage of the HTML Agility Pack (HAP); otherwise I will have to give up.
Thanks for reading.