android - 使用 iText 库将 html 转换为 pdf 时未应用 hr 的内联 CSS

Question

我正在使用适用于 android 的 Itext 库将 html 转换为 pdf，这工作正常，但在某些情况下它无法正确解析。我想创建一个红色的虚线分隔符，但它总是给我一个深灰色的实线分隔符。

我的html标签是

<hr noshade style="border: 0; width:100%;border-bottom-width: 1px; border-bottom-style: dotted; border-bottom-color: red">

我的转换代码

        Document document = new Document(PageSize.A4);

        //this sets the margin to the created pdf
        document.setMargins(35, 35, 150, 100);

        PdfWriter writer = PdfWriter.getInstance(document,
                new FileOutputStream(fileWithinMyDir));
        if (isPrescription) {
            HeaderFooterPageEvent event = new HeaderFooterPageEvent();
            writer.setPageEvent(event);
        } else {
            CertificateFooterPageEvent event = new CertificateFooterPageEvent();
            writer.setPageEvent(event);
        }
        document.open();
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        htmlContext.setImageProvider(new AbstractImageProvider() {
            public String getImageRootPath() {

                Uri uri = Uri.parse("file:///android_asset/");

                return uri.toString();
            }
        });


        CSSResolver cssResolver =
                XMLWorkerHelper.getInstance().getDefaultCssResolver(false);

        // Pipelines
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

        XMLWorker worker = new XMLWorker(css, true);

        XMLParser p = new XMLParser(worker);
        InputStream is = new ByteArrayInputStream(htmlString.getBytes());
        XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
        p.parse(is);

        document.close();

score 6 · Accepted Answer

I'm a .NET developer, so the code is in C#. But you should be able to easily translate the following.

iText is a PDF-first library, and [X]HTML parsing is quite complex so it's not full featured in that regard. Whenever parsing [X]HTML and things aren't going the way you expect for specific tags, the basic steps you should follow are:

Verify XML Worker supports the tag: Tags class.
If the tag is supported, which in this case is true, take a look at the default implementation. Here it's handled by the the HorizontalRule class. However, we see there's no support for your use case, so one way to go is use that code as a blueprint. (follows below) You can also inherit from the specific tag class and override the End() method as done here. Either way, all you're doing is implementing a custom tag processor.
If the tag is not supported, you need to roll your own custom tag processor by inheriting from AbstractTagProcessor.

Anyway, here's a simple example to get you started. First, the custom tag processor:

public class CustomHorizontalRule : AbstractTagProcessor
{
    public override IList<IElement> Start(IWorkerContext ctx, Tag tag)
    {
        IList<IElement> result;
        LineSeparator lineSeparator;
        var cssUtil = CssUtils.GetInstance();

        try
        {
            IList<IElement> list = new List<IElement>();
            HtmlPipelineContext htmlPipelineContext = this.GetHtmlPipelineContext(ctx);

            Paragraph paragraph = new Paragraph();
            IDictionary<string, string> css = tag.CSS;
            float baseValue = 12f;
            if (css.ContainsKey("font-size"))
            {
                baseValue = cssUtil.ParsePxInCmMmPcToPt(css["font-size"]);
            }
            string text;
            css.TryGetValue("margin-top", out text);
            if (text == null) text = "0.5em";

            string text2;
            css.TryGetValue("margin-bottom", out text2);
            if (text2 == null) text2 = "0.5em";

            string border;
            css.TryGetValue(CSS.Property.BORDER_BOTTOM_STYLE, out border);
            lineSeparator = border != null && border == "dotted"
                ? new DottedLineSeparator()
                : new LineSeparator();

            var element = (LineSeparator)this.GetCssAppliers().Apply(
                lineSeparator, tag, htmlPipelineContext
            );

            string color;
            css.TryGetValue(CSS.Property.BORDER_BOTTOM_COLOR, out color);
            if (color != null)
            {
                // WebColors deprecated, but docs don't state replacement
                element.LineColor = WebColors.GetRGBColor(color);
            }

            paragraph.SpacingBefore += cssUtil.ParseValueToPt(text, baseValue);
            paragraph.SpacingAfter += cssUtil.ParseValueToPt(text2, baseValue);
            paragraph.Leading = 0f;
            paragraph.Add(element);
            list.Add(paragraph);
            result = list;
        }
        catch (NoCustomContextException cause)
        {
            throw new RuntimeWorkerException(
                LocaleMessages.GetInstance().GetMessage("customcontext.404"),
                cause
            );
        }
        return result;
    }
}

Most of the code is taken directly from the existing source, with the exception of the checks for CSS.Property.BORDER_BOTTOM_STYLE and CSS.Property.BORDER_BOTTOM_COLOR to set border style and color if they're inlined in the <hr> style attribute.

Then you add the custom tag processor above to the XML Worker TagProcessorFactory:

using (var stream = new FileStream(OUTPUT_FILE, FileMode.Create))
{
    using (var document = new Document())
    {
        var writer = PdfWriter.GetInstance(document, stream);
        document.Open();

        var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
        // custom tag processor above
        tagProcessorFactory.AddProcessor(
            new CustomHorizontalRule(),
            new string[] { HTML.Tag.HR }
        );
        var htmlPipelineContext = new HtmlPipelineContext(null);
        htmlPipelineContext.SetTagFactory(tagProcessorFactory);

        var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
        var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);
        var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
        var cssResolverPipeline = new CssResolverPipeline(
            cssResolver, htmlPipeline
        );

        var worker = new XMLWorker(cssResolverPipeline, true);
        var parser = new XMLParser(worker);
        var xHtml = "<hr style='border:1px dotted red' />";
        using (var stringReader = new StringReader(xHtml))
        {
            parser.Parse(stringReader);
        }
    }
}

One thing to note is that even though we're using the shorthand border inline style, iText's CSS parser appears to set all the styles internally. I.e., you can use any of the four longhand styles to check - I just happened to use CSS.Property.BORDER_BOTTOM_STYLE and CSS.Property.BORDER_BOTTOM_COLOR.

The resulting PDF:

score 0 · Accepted Answer

您可以使用没有任何内容或任何内容的 div 来代替 hr 并为该 div 提供边框样式，我相信它会适用于您的情况。

android - 使用 iText 库将 html 转换为 pdf 时未应用 hr 的内联 CSS

2 回答 2

Related

Reference