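I'm developing a tool to help users write XHTML-ish documents that are essentially similar to JSP files. The documents are XML and may contain any well-formed tags in the XHTML namespace, and woven in among them are elements from my product's namespace. Among other things, the tool validates the input using XSD.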

Example input:

<?xml version="1.0"?>
<markup>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <c:section>
      <c:paragraph>
        <span>This is a test!</span>
        <a href="http://www.google.com/">click here for more!</a>
      </c:paragraph>
    </c:section>
  </html>
</markup>

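My problem is that XSD validation behaves inconsistently depending on how deeply I nest elements. What I want is for all elements in the https://my_tag_lib.example.com/ namespace to be checked against the schema, while any element in the http://www.w3.org/1999/xhtml namespace is tolerated freely. I don't want to list every permitted HTML element in my XSD; users may want to use obscure elements that are only available in certain browsers, and so on. Instead, I just want to whitelist any element belonging to that namespace, using <xs:any>.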

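I've found that in some circumstances, elements that belong to the my_tag_lib namespace but do not appear in the schema pass validation, while other elements that do appear in the schema can be made to fail by giving them invalid attributes.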

So:

* Valid elements are validated against the XSD schema
* Invalid elements are skipped by the validator?

For example, this passes validation:

<?xml version="1.0"?>
<markup>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <c:section>
      <div>
        <c:my-invalid-element>This is a test</c:my-invalid-element>
      </div>
    </c:section>
  </html>
</markup>

But then this fails validation:

<?xml version="1.0"?>
<markup>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <c:section>
      <div>
        <c:paragraph my-invalid-attr="true">This is a test</c:paragraph>
      </div>
    </c:section>
  </html>
</markup>

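Why are attributes validated against the schema for recognized elements, while unrecognized elements seem not to be checked at all? What is the logic here? I've been using xmllint to do the validation: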

xmllint --schema markup.xsd example.xml

Here are my XSD files:

File: markup.xsd

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xs:import namespace="http://www.w3.org/1999/xhtml" schemaLocation="html.xsd" />
  <xs:element name="markup">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="xhtml:html" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

File: html.xsd

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/1999/xhtml">
  <xs:import namespace="https://my_tag_lib.example.com/" schemaLocation="my_tag_lib.xsd" />
  <xs:element name="html">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:any processContents="lax" namespace="http://www.w3.org/1999/xhtml" />
        <xs:any processContents="strict" namespace="https://my_tag_lib.example.com/" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

File: my_tag_lib.xsd

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="https://my_tag_lib.example.com/">
  <xs:element name="section">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:any processContents="lax" namespace="http://www.w3.org/1999/xhtml" />
        <xs:any processContents="strict" namespace="https://my_tag_lib.example.com/" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="paragraph">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:any processContents="lax" namespace="http://www.w3.org/1999/xhtml" />
        <xs:any processContents="strict" namespace="https://my_tag_lib.example.com/" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

2 Answers

What you're missing is an understanding of the context-determined declaration.

First, have a look at this little experiment.

<?xml version="1.0"?>
<markup>
    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
        <c:section>
            <div>
                <html>
                    <c:my-invalid-element>This is a test</c:my-invalid-element>
                </html>
            </div>
        </c:section>
    </html>
</markup>

This is the same as your valid example, except that now I've changed the context in which c:my-invalid-element is being assessed from "lax" to "strict". This is done by interjecting the html element, which now forces all the elements in your tag namespace to be strict. As you can easily confirm, the above is invalid.

This tells you (without reading the documentation) that in your examples, the determined context must have been "lax" as opposed to your expectation, which is "strict".

Why is the context lax? div is processed laxly (it matches the wildcard, but no declaration exists for it), hence its children are assessed laxly as well. Compare that with what lax means: in the first case, no declaration for c:my-invalid-element was found, so the instruction is "validate if you can, don't worry if you can't" and all is good. In the invalid sample, a declaration for c:paragraph can be found, hence the element must be valid with respect to that declaration, and it is not, because of the unexpected attribute.
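
As a rough illustration of the reasoning above, here is an instance that puts both of your test elements side by side under the same div, with a comment on how each element is assessed. The combined instance is my own construction rather than one of your examples, and as a whole it is invalid because of the attribute on c:paragraph.

```xml
<?xml version="1.0"?>
<markup>
  <!-- html: declared in html.xsd, so it is validated against that declaration -->
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <!-- c:section: matches the strict wildcard and is declared in my_tag_lib.xsd -->
    <c:section>
      <!-- div: matches the lax wildcard; no declaration exists, so its children are assessed laxly -->
      <div>
        <!-- no declaration is found for this element: lax assessment lets it through -->
        <c:my-invalid-element>This is a test</c:my-invalid-element>
        <!-- a declaration is found for c:paragraph: it is validated, and the attribute is rejected -->
        <c:paragraph my-invalid-attr="true">This is a test</c:paragraph>
      </div>
    </c:section>
  </html>
</markup>
```

Removing my-invalid-attr should make this instance pass against your schema, even though c:my-invalid-element is still present.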

Answered 2014-04-03T17:56:14.993

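The div element is not declared, so nothing in the schema prevents it from containing your invalid element, whereas the paragraph element is declared and does not allow my-invalid-attr.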

Perhaps some examples will make this clearer.

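If an element is declared (e.g. html, section, paragraph) and its content comes from the taglib namespace (which you declared with processContents="strict"), that content is treated as strict. This means the attributes and child elements must be declared. This should fail validation: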

<html>
    <c:my-invalid-element>This is a test</c:my-invalid-element>
</html>

So will this:

<c:section>
    <c:my-invalid-element>This is a test</c:my-invalid-element>
</c:section>

And this:

<div>
    <c:paragraph>
         <c:my-invalid-element>This is a test</c:my-invalid-element>
    </c:paragraph>
</div>

And this (because attributes are part of the content):

<c:paragraph my-invalid-attr="true">This is a test</c:paragraph>

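But if an element is not declared (like div), it matches the xs:any declaration. No declaration restricts the content of div, so it allows anything. So this should pass validation: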

<div>
    <c:my-invalid-element>This is a test</c:my-invalid-element>
</div>

And since c:my-invalid-element is not declared either, it allows any content or attributes. This is valid:

<div>
    <c:my-invalid-element invalid-attribute="hi"> <!-- VALID -->
        <c:invalid></c:invalid>
        <html></html>
    </c:my-invalid-element>
</div>

However, if you place an invalid element inside one of the html elements, it will fail:

<div>
    <c:my-invalid-element invalid-attribute="hi">
        <html><c:invalid></c:invalid></html>  <!-- NOT VALID -->
    </c:my-invalid-element>
</div>

If you use an undeclared attribute on a declared element (which will not match xs:any), the same thing happens no matter how deeply it is nested:

<div>
    <c:my-invalid-element invalid-attribute="hi"> <!-- VALID -->
        <c:invalid>
            <b> 
                <c:section bad-attribute="boo"></c:section> <!-- FAILS! -->
 ...
Answered 2014-04-01T17:44:41.733