
My task is to convert the results of a RESTful web service into an XML document with a new format.

Sample of the HTML/XHTML to be converted:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title>OvidWS Result Set Resource</title>
    </head>
    <body>
        <table id="results">
            <tr>
                <td class="_index">
                  <a class="uri" href="REDACTED">1</a>
                </td>
                <td class="au">
                  <span>GILLESPIE JB</span>
                  <span>KUKES RE</span>
                </td>
                <td class="so">A.M.A. American Journal of Diseases of Children</td>
                <td class="ti">Acetylsalicylic acid poisoning with recovery.</td>
                <td class="ui">20267726</td>
                <td class="yr">1947</td>
            </tr>
            <tr>
                <td class="_index">
                  <a class="uri" href="REDACTED">2</a>
                </td>
                <td class="au">BASS MH</td>
                <td class="so">Journal of the Mount Sinai Hospital, New York</td>
                <td class="ti">Aspirin poisoning in infants.</td>
                <td class="ui">20265054</td>
                <td class="yr">1947</td>
            </tr>
        </table>
    </body>
</html>

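Ideally, all I want to do is take whatever is listed as the class attribute and make it the element name. Where there is no class attribute, I just want to label the element as item.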

This is the transformation I am looking for:

<results>
    <citation>
        <_index>
            <uri href="REDACTED">1</uri>
        </_index>
        <au>
            <item>GILLESPIE JB</item>
            <item>KUKES RE</item>
        </au>
        <so>A.M.A. American Journal of Diseases of Children</so>
        <ti>Acetylsalicylic acid poisoning with recovery.</ti>
        <ui>20267726</ui>
        <yr>1947</yr>
    </citation>
    <citation>
        <_index>
            <uri href="REDACTED">2</uri>
        </_index>
        <au>BASS MH</au>
        <so>Journal of the Mount Sinai Hospital, New York</so>
        <ti>Aspirin poisoning in infants.</ti>
        <ui>20265054</ui>
        <yr>1947</yr>
    </citation>
</results>  

I found a small piece of code here that lets me rename a node:

    Public Shared Function RenameNode(ByVal e As XmlNode, newName As String) As XmlNode
        Dim doc As XmlDocument = e.OwnerDocument
        Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
        While (e.HasChildNodes)
            newNode.AppendChild(e.FirstChild)
        End While
        Dim ac As XmlAttributeCollection = e.Attributes
        While (ac.Count > 0) 
            newNode.Attributes.Append(ac(0))
        End While
        Dim parent As XmlNode = e.ParentNode
        parent.ReplaceChild(newNode, e)
        Return newNode
    End Function

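However, a problem arises when iterating over the XmlAttributeCollection. For some reason, when looking at one of the td nodes, two attributes that do not appear in the source magically show up: rowspan and colspan. These attributes seem to confuse the iteration, because when they are consumed they do not disappear from the attribute list the way the class attribute does. Instead, the attribute's value is consumed (it changes from "1" to ""). This causes an infinite loop.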

I noticed that they are of type XmlUnspecifiedAttribute, but when I modified the loop to detect that:

While (ac.Count > 0) And Not TypeOf (ac(0)) Is System.Xml.XmlUnspecifiedAttribute
    newNode.Attributes.Append(ac(0))
End While

I get the following error:

System.Xml.XmlUnspecifiedAttribute is not accessible in this context because it is 'friend'

Any ideas why this is happening or how to work around it?


1 Answer


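I think the problem you are running into is really your document type declaration. The XHTML DTD declares a default value of "1" for rowspan and colspan on td, and the parser adds those defaults when it processes the DTD.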

Because you are converting the nodes into something entirely different, I would say you don't even need it and can safely ignore it.

I didn't include it in my test at first, and the XmlResolver got confused once I did include it, so I assume you definitely don't need it here.

You can bypass it by setting the resolver to Nothing:

{xml document object}.XmlResolver = Nothing

Then you select the nodes and process them. I did this even with the doc type present in the source file and still had no problems.
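
If the DTD does end up being processed, another option is to filter out the defaulted attributes instead of looping until the collection is empty. The sketch below is a hypothetical variant of RenameNode (the name RenameElementSpecifiedOnly is made up for illustration). It assumes the nodes being renamed are elements, and it relies on the public XmlAttribute.Specified property, which is False for attributes supplied as DTD defaults, so the Friend type XmlUnspecifiedAttribute never has to be named.

```vbnet
Imports System.Xml

Public Module RenameSketch
    ' Hypothetical variant of RenameNode: copies only the attributes that were
    ' actually written in the source and skips DTD-supplied defaults.
    Public Function RenameElementSpecifiedOnly(ByVal e As XmlElement, ByVal newName As String) As XmlElement
        Dim doc As XmlDocument = e.OwnerDocument
        ' The new element is created without a namespace.
        Dim newNode As XmlElement = doc.CreateElement(newName)

        ' Move the children across; AppendChild detaches each one from e.
        While e.HasChildNodes
            newNode.AppendChild(e.FirstChild)
        End While

        ' Walk the attributes once instead of looping until the collection is empty.
        ' e.Attributes is not modified here, so enumerating it is safe.
        For Each attr As XmlAttribute In e.Attributes
            ' Skip defaults that came from the DTD (rowspan/colspan in the question).
            If Not attr.Specified Then Continue For
            ' Skip namespace declarations; the renamed element has no namespace.
            If attr.Name = "xmlns" OrElse attr.Prefix = "xmlns" Then Continue For
            newNode.SetAttribute(attr.Name, attr.Value)
        Next

        e.ParentNode.ReplaceChild(newNode, e)
        Return newNode
    End Function
End Module
```

Because the attributes are copied rather than moved, the loop ends after one pass regardless of how the defaulted attributes behave when appended elsewhere.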

Here is the code I used to test:

Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    Dim USEDoc As New XmlDocument

    Dim theNameManager As System.Xml.XmlNamespaceManager = New System.Xml.XmlNamespaceManager(USEDoc.NameTable)
    theNameManager.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml")

    USEDoc.XmlResolver = Nothing
    USEDoc.Load("RestServ.txt")

    renameNodes(USEDoc.SelectSingleNode("descendant::xhtml:table", theNameManager))

    Dim SaveDoc As New XmlDocument
    SaveDoc.AppendChild(SaveDoc.ImportNode(USEDoc.SelectSingleNode("//results", theNameManager), True))

    SaveDoc.Save("RestServConv.xml")
End Sub

Public Function renameNodes(ByVal TopNode As XmlNode) As Boolean
    ' Stays Nothing when the node is not renamed, so its children are not visited.
    Dim UseNode As XmlNode = Nothing

    If TopNode.Name <> "#text" Then
        If TopNode.Name = "tr" Then
            UseNode = RenameNode(TopNode, "citation")
        ElseIf TopNode.Name = "table" Then
            UseNode = RenameNode(TopNode, "results")
            UseNode.Attributes.RemoveNamedItem("id")
        ElseIf TopNode.Attributes.Count > 0 Then
            For Each oAttribute As XmlAttribute In TopNode.Attributes
                If oAttribute.Name = "class" Then
                    UseNode = RenameNode(TopNode, oAttribute.Value)
                    UseNode.Attributes.RemoveNamedItem("class")
                    Exit For
                End If
            Next oAttribute
        End If

        If UseNode IsNot Nothing Then
            If UseNode.ChildNodes.Count > 0 Then
                Dim x As Integer
                For x = 0 To UseNode.ChildNodes.Count - 1
                    renameNodes(UseNode.ChildNodes(x))
                Next x
            End If
        End If
    End If

    Return True
End Function

Public Shared Function RenameNode(ByVal e As XmlNode, ByVal newName As String) As XmlNode
    Dim doc As XmlDocument = e.OwnerDocument
    Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
    While (e.HasChildNodes)
        newNode.AppendChild(e.FirstChild)
    End While
    Dim ac As XmlAttributeCollection = e.Attributes
    While (ac.Count > 0)
        newNode.Attributes.Append(ac(0))
    End While
    Dim parent As XmlNode = e.ParentNode
    parent.ReplaceChild(newNode, e)
    Return newNode
End Function

I passed in your sample document, and the result I got was:

<results>
  <citation>
    <_index>
      <uri href="REDACTED">1</uri>
    </_index>
    <au>
      <span xmlns="http://www.w3.org/1999/xhtml">GILLESPIE JB</span>
      <span xmlns="http://www.w3.org/1999/xhtml">KUKES RE</span>
    </au>
    <so rowspan="1" colspan="1">A.M.A. American Journal of Diseases of Children</so>
    <ti>Acetylsalicylic acid poisoning with recovery.</ti>
    <ui>20267726</ui>
    <yr>1947</yr>
  </citation>
  <citation>
    <_index>
      <uri href="REDACTED">2</uri>
    </_index>
    <au>BASS MH</au>
    <so>Journal of the Mount Sinai Hospital, New York</so>
    <ti>Aspirin poisoning in infants.</ti>
    <ui>20265054</ui>
    <yr>1947</yr>
  </citation>
</results>
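
This result still differs from the requested output in two places: the span elements keep their name and XHTML namespace, and one so element carries rowspan and colspan. One possible way to close that gap, sketched here under some assumptions, is to build the output tree from scratch instead of renaming nodes in place. The names ConvertSketch, TargetName and ConvertElement are hypothetical, and dropping every id attribute is an assumption based on the requested output (the test code above removes it only from the table).

```vbnet
Imports System.Xml

Public Module ConvertSketch
    ' Hypothetical helper: picks the output name for one XHTML element.
    Private Function TargetName(ByVal source As XmlElement) As String
        Select Case source.LocalName
            Case "table" : Return "results"
            Case "tr" : Return "citation"
        End Select
        ' GetAttribute returns an empty string when the attribute is absent.
        Dim cls As String = source.GetAttribute("class")
        If cls <> "" Then Return cls
        Return "item"
    End Function

    ' Recursively builds a namespace-free copy of source inside the target document.
    Public Function ConvertElement(ByVal source As XmlElement, ByVal target As XmlDocument) As XmlElement
        Dim result As XmlElement = target.CreateElement(TargetName(source))

        For Each attr As XmlAttribute In source.Attributes
            ' Skip DTD defaults such as rowspan and colspan.
            If Not attr.Specified Then Continue For
            ' class became the element name; dropping id is an assumption.
            If attr.Name = "class" OrElse attr.Name = "id" Then Continue For
            ' Skip namespace declarations.
            If attr.Name = "xmlns" OrElse attr.Prefix = "xmlns" Then Continue For
            result.SetAttribute(attr.Name, attr.Value)
        Next

        For Each child As XmlNode In source.ChildNodes
            If TypeOf child Is XmlElement Then
                result.AppendChild(ConvertElement(DirectCast(child, XmlElement), target))
            ElseIf TypeOf child Is XmlText Then
                result.AppendChild(target.CreateTextNode(child.Value))
            End If
            ' Other node types (comments, whitespace) are left out of this sketch.
        Next

        Return result
    End Function
End Module
```

In the test code above it could be called as SaveDoc.AppendChild(ConvertElement(DirectCast(tableNode, XmlElement), SaveDoc)), where tableNode is the node returned by the SelectSingleNode call for xhtml:table. The source document is left unmodified.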
answered 2013-05-08T13:44:14.953