jsoup - Using Jsoup in Vaadin - can't disable escaping?

Question

I'm trying to modify the bootstrap JavaScript that Vaadin sends to the browser. Here is the Vaadin forum thread about this issue: https://vaadin.com/forum#!/thread/4252604

Vaadin uses Jsoup, so I'm using the Jsoup API to find the right place in the Vaadin payload to modify the JavaScript. When I use the Jsoup API like this:

element.html(newHTML)

anything in newHTML gets escaped. So, for example, if newHTML is:

alert("hi");

then calling the Jsoup API results in:

alert(&quot;hi&quot;);

I thought I could disable this Jsoup escaping by doing something like:

element.ownerDocument().outputSettings().escapeMode(...)

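but ownerDocument() is null, so I don't think that's an option. Does Jsoup have a way around this limitation, so that JavaScript containing double quotes (") or even opening/closing tag brackets (<, >) reaches the output unescaped?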

score 0 · Accepted Answer

My solution was to subclass TextNode and override the method that does the escaping.

package org.jsoup.nodes;

public class UnescapedTextNode extends TextNode
{
    public UnescapedTextNode( final String text, final String baseUri )
    {
        super( text, baseUri );
    }

    @Override
    void outerHtmlHead(
        final StringBuilder accum,
        final int depth,
        final Document.OutputSettings out )
    {
        //String html = Entities.escape( getWholeText(), out ); // Don't escape!
        String html = getWholeText();
        if ( out.prettyPrint() &&
             parent() instanceof Element &&
             !Element.preserveWhitespace( parent() ) )
        {
             html = normaliseWhitespace( html );
        }
        if ( out.prettyPrint() &&
             ( ( siblingIndex() == 0 &&
                 parentNode instanceof Element &&
                 ( (Element)parentNode ).tag().formatAsBlock() &&
                   !isBlank() ) ||
                 ( out.outline() &&
                   siblingNodes().size() > 0 &&
                   !isBlank() ) ) )
        {
            indent( accum, depth, out );
        }
        accum.append( html );
    }
}

This is pretty much a verbatim copy of TextNode.outerHtmlHead() (originally written by Jonathan Hedley). I just commented out the escaping part. This is how I use it:

// ... assuming head is of type Element and refers to the <head> of the document.
final String message = "Hello World!";
final String messageScript = "alert( \"" + message + "\" );";
final Element messageScriptEl = head.appendElement( "script" ).
    attr( "type", "text/javascript" );
final TextNode messageScriptTextNode = new UnescapedTextNode(
    messageScript,
    messageScriptEl.baseUri() );
messageScriptEl.appendChild( messageScriptTextNode );
// ... etc

Calling Document.toString() or Document.outerHtml() then produces output in which the text inside the newly created script tag is unescaped, i.e.:

<script type="text/javascript">alert( "Hello World!" );</script>

instead of:

<script type="text/javascript">alert( &quot;Hello World!&quot; );</script>

as happened before.

I found two "gotchas":

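The UnescapedTextNode class needs to be loaded by the same class loader that loads the original jsoup library. This is because, above, I've overridden a package-private method, and a package-private member is only accessible from within the same runtime package. A runtime package is identified by the package name together with the defining class loader, so sharing the package name org.jsoup.nodes is not enough on its own. (Thanks to Jeff Sinclair's article for pointing this out to me.) The pertinent point is:
A field or method R is accessible to a class or interface D if and only if any of the following conditions is true:
- …
- R is package-private and is declared by a class in the same runtime package as D.
This is in the JVM specification under Access Control (5.4.4).

This is a very risky thing to do, because you are effectively cutting away the safety net that stops you from putting unsanitized data into the document. Make sure that anything coming from application users that you add to this text node does not contain HTML tags, otherwise you are in for a very bad time with XSS, CSRF, and so on.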
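
One way to reduce that risk is to escape user-supplied text before concatenating it into the script source. The following is a minimal sketch, assuming the text only ever needs to appear inside a double-quoted JavaScript string literal. The class and method names (JsStringLiteral, quote) are hypothetical and are not part of jsoup or Vaadin, and this is an illustration rather than a complete XSS defence.

```java
public final class JsStringLiteral {
    private JsStringLiteral() {}

    /**
     * Returns the given text as a double-quoted JavaScript string literal.
     * Quotes and backslashes are backslash-escaped; control characters,
     * the HTML-significant characters (less-than, greater-than, ampersand)
     * and the Unicode line/paragraph separators are written as four-digit
     * hexadecimal JavaScript escapes, so the result cannot close the
     * surrounding script element.
     */
    public static String quote(final String text) {
        final StringBuilder sb = new StringBuilder(text.length() + 2);
        sb.append('"');
        for (int i = 0; i < text.length(); i++) {
            final char c = text.charAt(i);
            if (c == '"' || c == '\\') {
                // Keep the character but prefix it with a backslash.
                sb.append('\\').append(c);
            } else if (c < 0x20 || c == '<' || c == '>' || c == '&'
                    || c == 0x2028 || c == 0x2029) {
                // Replace with a hexadecimal escape understood by JavaScript.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(final String[] args) {
        // Hypothetical hostile input that tries to break out of the script tag.
        final String message = "</script><script>alert(1)";
        System.out.println("alert( " + quote(message) + " );");
    }
}
```

With the usage shown earlier, messageScript would then be built as "alert( " + JsStringLiteral.quote( message ) + " );" instead of wrapping message in quotes by hand.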

score 0 · Accepted Answer

Apparently,

element.childNode(0).attr("data", html);

does the trick if element is a "script" element and html is the JavaScript source.
