xml - Report all unused elements (+ attributes) of the corresponding schema

Question

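For this topic I would like to borrow more brain cells than I can provide on my own. I want to refactor my XSD (v1.0) based on which elements are used or unused in real XML instances (single namespace only). Let's set up a small scenario:

I only have XML documents that are valid against the corresponding schema: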

<body>
    <h1>Heading 1</h1>
    <p>paragraph</p>
    <p><bold>bold</bold>paragraph<italic>italic</italic></p>
</body>

The validating XSD:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="body">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="h1"/>
        <xs:element ref="h2"/>
        <xs:element ref="p"/>
        <xs:element ref="span"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="h1" type="xs:string"/>
  <xs:element name="h2" type="xs:string"/>
  <xs:element name="p">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="bold"/>
        <xs:element ref="italic"/>
        <xs:element ref="underline"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="span">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="bold"/>
        <xs:element ref="italic"/>
        <xs:element ref="underline"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="bold" type="xs:NCName"/>
  <xs:element name="italic" type="xs:NCName"/>
  <xs:element name="underline" type="xs:NCName"/>
</xs:schema>

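Based on this, I would like to create a report (via XSLT [2.0 and 3.0 are available through Saxon EE 9.6.0.5]) of the elements (tags + attributes) that are not used but that my XSD would allow.

Simplified pseudo to-do list, starting from scratch:

Search my XSD for all //xs:element[@name] (attributes will follow in report v2.0).
Search my XML for all elements (*).
"Compare" the two.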
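The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions, written in Python with the standard library rather than in XSLT: the function names are hypothetical, SCHEMA is the question's schema reduced to its global element declarations, and elements are compared by local name only, which is reasonable here because a single namespace is involved.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The question's schema, reduced to its named element declarations (assumption:
# content models are irrelevant for a plain name comparison).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="body"/>
  <xs:element name="h1" type="xs:string"/>
  <xs:element name="h2" type="xs:string"/>
  <xs:element name="p"/>
  <xs:element name="span"/>
  <xs:element name="bold" type="xs:NCName"/>
  <xs:element name="italic" type="xs:NCName"/>
  <xs:element name="underline" type="xs:NCName"/>
</xs:schema>"""

INSTANCE = """<body>
    <h1>Heading 1</h1>
    <p>paragraph</p>
    <p><bold>bold</bold>paragraph<italic>italic</italic></p>
</body>"""


def declared_elements(xsd_text):
    """Step 1: names of all xs:element declarations that carry @name."""
    root = ET.fromstring(xsd_text)
    return {e.get("name") for e in root.iter(XS + "element") if e.get("name")}


def used_elements(xml_text):
    """Step 2: local names of every element occurring in the instance."""
    root = ET.fromstring(xml_text)
    return {e.tag.rsplit("}", 1)[-1] for e in root.iter()}


def unused_elements(xsd_text, xml_text):
    """Step 3: declared in the schema but absent from the instance."""
    return sorted(declared_elements(xsd_text) - used_elements(xml_text))


print(unused_elements(SCHEMA, INSTANCE))  # ['h2', 'span', 'underline']
```

For several instance documents, the sets returned by used_elements would simply be united before taking the difference.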

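Questions:

Is there already something on this topic in the lovely XSLT community that I have overlooked?

What is a good way to store and compare this?

Use xsl:map via XSLT 3.0? Store paths [/body/h1, /body/p] and compare those paths? (Tricky: getting the correct paths from the schema while handling all the ways of defining things, e.g. xs:group ref="..." or via complexTypes, etc.)

[Add-on: maybe I have to extend this to the context of the ancestor elements in my XML. In the example case, I might want to distinguish between //p/underline and //span/underline.]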
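To make the parent-context idea concrete, here is a minimal sketch that compares parent/child pairs instead of bare names, so that p/underline and span/underline are reported separately. It is Python (standard library only) rather than XSLT, the function names are hypothetical, and it rests on strong assumptions: it only understands global element declarations whose anonymous complex type refers to children with unprefixed xs:element ref="...". It does not resolve xs:group, named complex types, or local declarations, which is exactly the tricky part mentioned above.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="body">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="h1"/>
        <xs:element ref="h2"/>
        <xs:element ref="p"/>
        <xs:element ref="span"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="h1" type="xs:string"/>
  <xs:element name="h2" type="xs:string"/>
  <xs:element name="p">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="bold"/>
        <xs:element ref="italic"/>
        <xs:element ref="underline"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="span">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="bold"/>
        <xs:element ref="italic"/>
        <xs:element ref="underline"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="bold" type="xs:NCName"/>
  <xs:element name="italic" type="xs:NCName"/>
  <xs:element name="underline" type="xs:NCName"/>
</xs:schema>"""

INSTANCE = """<body>
    <h1>Heading 1</h1>
    <p>paragraph</p>
    <p><bold>bold</bold>paragraph<italic>italic</italic></p>
</body>"""


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def allowed_pairs(xsd_text):
    """(parent, child) pairs found via xs:element ref inside global declarations."""
    pairs = set()
    for decl in ET.fromstring(xsd_text).findall(XS + "element"):
        for ref in decl.iter(XS + "element"):
            if ref.get("ref"):
                pairs.add((decl.get("name"), ref.get("ref")))
    return pairs


def used_pairs(xml_text):
    """(parent, child) pairs that actually occur in the instance."""
    root = ET.fromstring(xml_text)
    return {(local(p.tag), local(c.tag)) for p in root.iter() for c in p}


def unused_paths(xsd_text, xml_text):
    """Allowed parent/child combinations that never occur, as 'parent/child'."""
    missing = allowed_pairs(xsd_text) - used_pairs(xml_text)
    return ["/".join(pair) for pair in sorted(missing)]


for path in unused_paths(SCHEMA, INSTANCE):
    print(path)
# body/h2
# body/span
# p/underline
# span/bold
# span/italic
# span/underline
```

Pairs are a deliberate compromise: full paths such as /body/p/underline need recursion through the content models and can be unbounded for recursive schemas, whereas pairs stay finite and already distinguish the two underline contexts.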

<xsl:message>Please share your thoughts with an open mind. I am not asking for fully functional code!</xsl:message>

score 1 · Accepted Answer

I did an exercise similar to this for the XSLT 3.0 test suite. You can find the stylesheet here:

https://dvcs.w3.org/hg/xslt30-test/file/24e8b98b044b/tests/misc/catalog/catalog-007.xsl

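It takes two inputs:

(a) An SCM file generated with com.saxonica.Validate and the -scmout option, applied to schema-for-xslt30. An SCM file is a representation of the compiled schema, and it is easier to analyze from XSLT than the raw source schema.

(b) The set of non-error stylesheets in the test suite, obtained by recursively searching the test metadata catalog.

It extracts the set of element-name/attribute-name pairs allowed by the schema, and then the set of element-name/attribute-name pairs actually present in the stylesheets (filtered in each case, for example to consider only the XSLT namespace). It then compares the two lists and reports any pairs that the schema allows but that do not occur in the test stylesheets, as well as any pairs that occur in the test stylesheets but that the schema does not allow. The test passes only if both lists are empty.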
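The two-way comparison described above can be illustrated as follows. This is not the linked stylesheet, only a sketch of the same idea in Python (standard library only) with hypothetical names. In particular, ALLOWED is a hand-written and deliberately incomplete toy set standing in for what would be extracted from the SCM file (whose format is not shown here), so the "not allowed" result below says nothing about the real XSLT schema.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical stand-in for the pairs extracted from the compiled schema.
ALLOWED = {
    ("stylesheet", "version"),
    ("template", "match"),
    ("template", "name"),
    ("value-of", "select"),
}

STYLESHEET = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:template match="/">
    <out status="ok"><xsl:value-of select="1" separator=","/></out>
  </xsl:template>
</xsl:stylesheet>"""


def pairs_in_use(xml_text, namespace=XSLT_NS):
    """(element, attribute) local-name pairs for elements in `namespace`.

    Elements in other namespaces are filtered out, and only no-namespace
    attributes are considered.
    """
    prefix = "{" + namespace + "}"
    pairs = set()
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            pairs.update((name, a) for a in elem.attrib if not a.startswith("{"))
    return pairs


def compare(allowed, used):
    """Return (allowed but never used, used but not allowed), both sorted."""
    return sorted(allowed - used), sorted(used - allowed)


not_used, not_allowed = compare(ALLOWED, pairs_in_use(STYLESHEET))
print(not_used)     # [('template', 'name')]
print(not_allowed)  # [('value-of', 'separator')]
```

Following the description above, a coverage test built on this would pass only when both returned lists are empty.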

score 1 · Accepted Answer

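See http://saxonica.com/html/documentation/functions/saxon/type.html and http://saxonica.com/html/documentation/functions/saxon/schema.html for ways to obtain schema type information for your nodes in Saxon EE; hopefully that offers a way to compare your instances against the schema. I have never used these functions, so I am not sure how far you can get with them. I am sure that if you add the saxon tag to your question, Michael Kay will give you some better insight in due course.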

score 1 · Accepted Answer

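Your post reminded me of an option on the com.saxonica.Validate command: by specifying -stats:report.xml you should get a report on how the schema components are used in the instance documents. It does not seem to work in 9.7 (and I have raised a bug on that), but in 9.5 you get a report of the following form: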

<schemaCoverage>
   <component kind="element" namespace="" name="PUB-DATE" count="6"/>
   <component kind="complexType" namespace="" name="weightType" count="6"/>
   <component kind="element" namespace="" name="PUBLISHER" count="6"/>
   <component kind="element" namespace="" name="AUTHOR" count="6"/>
   <component kind="element" namespace="" name="DIMENSIONS" count="6"/>
   <component kind="simpleType" namespace="" name="languageType" count="6"/>
   <component kind="element" namespace="" name="QUANTITY" count="6"/>
   <component kind="element" namespace="" name="CATEGORY" count="3"/>
   <component kind="complexType"
              namespace="http://ns.saxonica.com/anonymous-type"
              name="CATEGORIES_anonymous_type_1_at_line_23_of_books.xsd"
              count="1"/>
   <component kind="element" namespace="" name="LANGUAGE" count="6"/>
   <component kind="element" namespace="" name="PAGES" count="6"/>
   <component kind="complexType" namespace="" name="moneyType" count="6"/>
   <component kind="element" namespace="" name="ISBN" count="6"/>
   <component kind="simpleType"
              namespace="http://www.w3.org/2001/XMLSchema"
              name="IDREF"
              count="6"/>
   <component kind="simpleType"
              namespace="http://www.w3.org/2001/XMLSchema"
              name="ID"
              count="3"/>
   <component kind="complexType"
              namespace="http://ns.saxonica.com/anonymous-type"
              name="BOOKS_anonymous_type_1_at_line_14_of_books.xsd"
              count="1"/>
   <component kind="element" namespace="" name="CATEGORIES" count="1"/>
   <component kind="simpleType" namespace="" name="ISBNType" count="6"/>
   <component kind="simpleType"
              namespace="http://www.w3.org/2001/XMLSchema"
              name="string"
              count="22"/>
   <component kind="complexType"
              namespace="http://ns.saxonica.com/anonymous-type"
              name="ITEM_anonymous_type_1_at_line_39_of_books.xsd"
              count="6"/>
   <component kind="simpleType" namespace="" name="weightUnitType" count="6"/>
   <component kind="complexType"
              namespace="http://ns.saxonica.com/anonymous-type"
              name="CATEGORY_anonymous_type_1_at_line_31_of_books.xsd"
              count="3"/>
   <component kind="simpleType"
              namespace="http://www.w3.org/2001/XMLSchema"
              name="date"
              count="6"/>
   <component kind="simpleType"
              namespace="http://www.w3.org/2001/XMLSchema"
              name="integer"
              count="12"/>
   <component kind="element" namespace="" name="TITLE" count="6"/>
   <component kind="element" namespace="" name="PRICE" count="6"/>
   <component kind="element" namespace="" name="WEIGHT" count="6"/>
   <component kind="complexType" namespace="" name="dimensionsType" count="6"/>
   <component kind="element" namespace="" name="ITEM" count="6"/>
   <component kind="simpleType" namespace="" name="lengthUnitType" count="6"/>
   <component kind="element" namespace="" name="BOOKS" count="1"/>
</schemaCoverage>

This seems to be exactly what you are looking for.
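A report of this form lists what was used, so the unused declarations still have to be derived by subtracting it from the set of declared components. The sketch below shows that step in Python (standard library only) on an excerpt of the report above. The function names are hypothetical, DECLARED is an invented set (the EDITION entry in particular is made up for illustration and is not taken from books.xsd), and it assumes that a component missing from the report was never used.

```python
import xml.etree.ElementTree as ET

# Excerpt of the coverage report shown above.
REPORT = """<schemaCoverage>
   <component kind="element" namespace="" name="PUB-DATE" count="6"/>
   <component kind="complexType" namespace="" name="weightType" count="6"/>
   <component kind="element" namespace="" name="CATEGORY" count="3"/>
   <component kind="element" namespace="" name="BOOKS" count="1"/>
</schemaCoverage>"""

# Hypothetical (namespace, name) pairs of the element declarations in the schema.
DECLARED = {("", "BOOKS"), ("", "CATEGORY"), ("", "PUB-DATE"), ("", "EDITION")}


def coverage(report_text, kind="element"):
    """Map (namespace, name) to the usage count for components of one kind."""
    root = ET.fromstring(report_text)
    return {
        (c.get("namespace"), c.get("name")): int(c.get("count"))
        for c in root.iter("component")
        if c.get("kind") == kind
    }


def unused(declared, report_text, kind="element"):
    """Declared components that have no entry in the coverage report."""
    return sorted(declared - set(coverage(report_text, kind)))


print(coverage(REPORT)[("", "CATEGORY")])  # 3
print(unused(DECLARED, REPORT))            # [('', 'EDITION')]
```

The declared set itself could come from the schema source, for instance from its global xs:element declarations.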
