flying-saucer - Flying Saucer does not read the stylesheet

Question

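I am using Flying Saucer and iText in an ASP.NET application (through IKVM) to convert HTML to PDF. If I put the styles directly in the HTML it works fine (even when the styles sit between style tags), but when I link a stylesheet it takes no notice of it and produces a PDF without styles.

Why does this happen?

This is the code I am using: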

        Dim renderer As New ITextRenderer
        Dim buf As New StringBuffer
        buf.append(HTML)
        Dim builder As DocumentBuilder = DocumentBuilderFactory.newInstance.newDocumentBuilder()
        Dim doc As Document = builder.parse(New StringBufferInputStream(buf.toString))

        renderer.setDocument(doc, Nothing)
        renderer.layout()

        renderer.createPDF(os)

This is the link to the stylesheet:

<link rel="stylesheet" href="stylemove.css" type="text/css"  />

score 3 · Accepted Answer

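From the FAQ:

My PDF isn't picking up my CSS!

PDF is treated as "print" media; see the section of the CSS 2.1 specification on media types. Make sure you specify the media type when you link or embed your CSS; use type "print" or "all".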
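As a small illustration of the FAQ's advice, the sketch below builds the question's link element with an explicit media attribute. The class and method names are hypothetical; only the media attribute and its values "print" and "all" come from the FAQ text.

```java
public class StylesheetLink {
    // Hypothetical helper: renders a <link> element with an explicit media type.
    static String stylesheetLink(String href, String media) {
        // The FAQ names only "print" and "all" as media types that a PDF picks up.
        if (!media.equals("print") && !media.equals("all")) {
            throw new IllegalArgumentException("media must be \"print\" or \"all\": " + media);
        }
        return "<link rel=\"stylesheet\" href=\"" + href
                + "\" type=\"text/css\" media=\"" + media + "\" />";
    }

    public static void main(String[] args) {
        // prints: <link rel="stylesheet" href="stylemove.css" type="text/css" media="print" />
        System.out.println(stylesheetLink("stylemove.css", "print"));
    }
}
```

The output is the same element as in the question with media="print" added, and it keeps the self-closing form so the page stays well-formed XML.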

score 2 · Accepted Answer

If you are using HTTPS, Flying Saucer will not be able to read the .css file until Java's keystore contains your web server's certificate.
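One way to do this, sketched under assumptions: import the server certificate into a trust store file and point the JVM at it through the standard javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword system properties before any HTTPS connection is opened. The file path and password below are placeholders, and the answer does not confirm whether this works under IKVM.

```java
public class TrustStoreSetup {
    // Points the JVM's default SSL trust store at a custom keystore file.
    // Call this before the first HTTPS connection: the default SSL context
    // is typically initialised once and does not re-read the properties.
    static void useTrustStore(String path, String password) {
        System.setProperty("javax.net.ssl.trustStore", path);
        System.setProperty("javax.net.ssl.trustStorePassword", password);
    }

    public static void main(String[] args) {
        // Placeholder values: use the keystore that holds your server's certificate.
        useTrustStore("/path/to/truststore.jks", "changeit");
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

The same properties can also be passed on the command line with -D, which avoids any ordering problem inside the application.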

I had the same problem... see this discussion:

https://code.google.com/p/jmesa/issues/detail?id=182

If you solved it some other way, please let me know!

Thanks.

score 1 · Accepted Answer

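Simple solution:

If you want to quickly test whether your document works with your styles (without writing a lot of code to integrate it into your application), just copy and paste the CSS you need into your page.

A solution that takes more work:

My solution was to read the CSS and put it into the HTML with a preprocessor. Since this is an older application that may not be fully XHTML-compliant, I use JSoup to load the HTML. The code below does the preprocessing. Treat these as snippets to help you get started. The great thing is that once you have it working, you can convert any page on your server to PDF without any additional code. In my case I set up a filter that looks for a specific parameter. If that parameter is present, I wrap the request in a request wrapper to get access to the final rendered bytes of the HTML page. I then parse those bytes with JSoup and preprocess the result.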

/** This method uses JSoup; Document here is org.jsoup.nodes.Document. */
 @Override
     public void modifyDOM(MyResourceResolver resources, Document normalizedDOM) {

         //move style elements into the head section
         Elements styleTags = normalizedDOM.getElementsByTag("style");

         for (org.jsoup.nodes.Element styleElement : styleTags) {

             //appendChild re-parents the element, so the <style> wrapper is kept;
             //copying only html() would drop the tag and leave bare CSS text in the head
             normalizedDOM.head().appendChild(styleElement);

         }


         //now resolve css
         Elements links = normalizedDOM.getElementsByTag("link");


         for (org.jsoup.nodes.Element linkElement : links) {

             String linkHref = linkElement.attr("href");
             if (linkHref == null) {
                 linkHref = "";
             }


             String mediaSelector = linkElement.attr("media");
             if (mediaSelector == null) {
                 mediaSelector = "";
             }
             mediaSelector = mediaSelector.trim();
             if ("".equalsIgnoreCase(mediaSelector) || ("print".equalsIgnoreCase(mediaSelector))) {

                 byte[] contents = resources.getContentsOfHref(linkHref);

                 if (null != contents) {
                     //we've got the info, let's add it to the document as is
                     //appending raw markup keeps the CSS unescaped, so '>' and '&' in selectors and URLs survive
                     String css = new String(contents);
                     normalizedDOM.head().append("\n<style type='text/css'>" + css + "</style>\n");

                 }
             }


         }


         normalizedDOM.select("link").remove();
         normalizedDOM.select("script").remove();
     }

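Since I am inlining the CSS and Flying Saucer does not support JavaScript, I simply remove those references from the document at the end of preprocessing. The MyResourceResolver class is just a class I wrote that holds a reference to the servlet context. The method that actually reads the CSS bytes from the server looks like this: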

 // Requires java.io.InputStream and java.io.ByteArrayOutputStream.
 public byte[] getContentsOfHref(String href) {
         byte[] retval = null;
         byte[] buf = new byte[8192];
         int nread;

         // strip the application's context path; the servlet context expects context-relative paths
         if (href != null && href.startsWith("/myurlcontext")) {
             href = href.substring("/myurlcontext".length());
         }

         // try-with-resources closes both streams, even when reading fails
         try (InputStream is = context.getResourceAsStream(href);
              ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
             // getResourceAsStream returns null when no resource exists at that path
             if (is != null) {
                 while ((nread = is.read(buf)) >= 0) {
                     bos.write(buf, 0, nread);
                 }
                 retval = bos.toByteArray();
             }
         } catch (Exception ex) {
             //do nothing for now; retval stays null and is reported below
         }
         if (retval == null) {
             System.out.println("Did not find: " + href);
         } else {
             System.out.println("Found: " + href + " " + retval.length + " bytes");
         }
         return retval;
     }

Next question: how to initialise the JSoup DOM. Well, I do this in a request wrapper that reads the content of the rendered JSP page and passes it to my PDF generation code:

 String renderedJSPString = new String(renderedJSP);
 //these escape sequences are nuisance in xhtml.
         renderedJSPString = renderedJSPString.replaceAll("&nbsp;|&copy;|&amp;|&lt;|&gt;", "");
         org.jsoup.nodes.Document parsedHtmlDOM = Jsoup.parse(renderedJSPString);
         org.jsoup.nodes.Document normalizedDOM = parsedHtmlDOM.normalise();
         normalizedDOM.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
         normalizedDOM.outputSettings().prettyPrint(true);
 ...
 preProcessor.modifyDOM(resolver, normalizedDOM);
 ...
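Deleting the entities also deletes the characters they stand for. A possible alternative, which is not what the author does, is to rewrite the HTML-only named entities as numeric character references, which an XML parser accepts without a DTD; &amp;amp;, &amp;lt; and &amp;gt; are predefined in XML and can stay as they are. The class and method names are hypothetical.

```java
public class EntityFixer {
    // Replaces HTML-only named entities with numeric character references.
    static String toNumericEntities(String html) {
        return html
                .replace("&nbsp;", "&#160;")  // non-breaking space, U+00A0
                .replace("&copy;", "&#169;"); // copyright sign, U+00A9
    }

    public static void main(String[] args) {
        // prints: &#169;&#160;2013 A &amp; B
        System.out.println(toNumericEntities("&copy;&nbsp;2013 A &amp; B"));
    }
}
```

Only the two HTML-only entities from the original pattern are covered here; a page that uses other named entities would need more entries.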

score 0 · Accepted Answer

You are not setting the document's base URL in the setDocument call. As I found out, Flying Saucer needs it to resolve CSS and image links. See this answer for more details.
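A minimal sketch of building such a base URL in Java, assuming the stylesheet lives in a local directory. setDocument(doc, baseUrl) is the same two-argument call the question makes with Nothing as the second argument; the helper name and the directory are hypothetical, and the renderer call is left as a comment so the snippet runs without Flying Saucer on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BaseUrlExample {
    // Turns a directory into a base URL string; relative hrefs resolve against it.
    static String baseUrlFor(Path directory) {
        String url = directory.toAbsolutePath().toUri().toString();
        // A trailing slash makes "stylemove.css" resolve inside the directory.
        return url.endsWith("/") ? url : url + "/";
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the folder that really holds the HTML assets.
        Path dir = Files.createTempDirectory("pdf-assets");
        String baseUrl = baseUrlFor(dir);
        System.out.println(baseUrl);
        // With Flying Saucer available: renderer.setDocument(doc, baseUrl);
    }
}
```

For pages served by a web application, the base URL would instead be the http or https address of the page's folder, built the same way with a trailing slash.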
