I'm a .NET developer, so the code is in C#. But you should be able to easily translate the following.
iText is a PDF-first library, and [X]HTML
parsing is quite complex so it's not full featured in that regard. Whenever parsing [X]HTML
and things aren't going the way you expect for specific tags, the basic steps you should follow are:
- Verify
XML Worker
supports the tag: Tags class.
- If the tag is supported, which in this case is true, take a look at the default implementation. Here it's handled by the the HorizontalRule class. However, we see there's no support for your use case, so one way to go is use that code as a blueprint. (follows below) You can also inherit from the specific tag class and override the End() method as done here. Either way, all you're doing is implementing a custom tag processor.
- If the tag is not supported, you need to roll your own custom tag processor by inheriting from AbstractTagProcessor.
Anyway, here's a simple example to get you started. First, the custom tag processor:
public class CustomHorizontalRule : AbstractTagProcessor
{
public override IList<IElement> Start(IWorkerContext ctx, Tag tag)
{
IList<IElement> result;
LineSeparator lineSeparator;
var cssUtil = CssUtils.GetInstance();
try
{
IList<IElement> list = new List<IElement>();
HtmlPipelineContext htmlPipelineContext = this.GetHtmlPipelineContext(ctx);
Paragraph paragraph = new Paragraph();
IDictionary<string, string> css = tag.CSS;
float baseValue = 12f;
if (css.ContainsKey("font-size"))
{
baseValue = cssUtil.ParsePxInCmMmPcToPt(css["font-size"]);
}
string text;
css.TryGetValue("margin-top", out text);
if (text == null) text = "0.5em";
string text2;
css.TryGetValue("margin-bottom", out text2);
if (text2 == null) text2 = "0.5em";
string border;
css.TryGetValue(CSS.Property.BORDER_BOTTOM_STYLE, out border);
lineSeparator = border != null && border == "dotted"
? new DottedLineSeparator()
: new LineSeparator();
var element = (LineSeparator)this.GetCssAppliers().Apply(
lineSeparator, tag, htmlPipelineContext
);
string color;
css.TryGetValue(CSS.Property.BORDER_BOTTOM_COLOR, out color);
if (color != null)
{
// WebColors deprecated, but docs don't state replacement
element.LineColor = WebColors.GetRGBColor(color);
}
paragraph.SpacingBefore += cssUtil.ParseValueToPt(text, baseValue);
paragraph.SpacingAfter += cssUtil.ParseValueToPt(text2, baseValue);
paragraph.Leading = 0f;
paragraph.Add(element);
list.Add(paragraph);
result = list;
}
catch (NoCustomContextException cause)
{
throw new RuntimeWorkerException(
LocaleMessages.GetInstance().GetMessage("customcontext.404"),
cause
);
}
return result;
}
}
Most of the code is taken directly from the existing source, with the exception of the checks for CSS.Property.BORDER_BOTTOM_STYLE
and CSS.Property.BORDER_BOTTOM_COLOR
to set border style and color if they're inlined in the <hr>
style
attribute.
Then you add the custom tag processor above to the XML Worker TagProcessorFactory
:
using (var stream = new FileStream(OUTPUT_FILE, FileMode.Create))
{
using (var document = new Document())
{
var writer = PdfWriter.GetInstance(document, stream);
document.Open();
var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
// custom tag processor above
tagProcessorFactory.AddProcessor(
new CustomHorizontalRule(),
new string[] { HTML.Tag.HR }
);
var htmlPipelineContext = new HtmlPipelineContext(null);
htmlPipelineContext.SetTagFactory(tagProcessorFactory);
var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);
var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
var cssResolverPipeline = new CssResolverPipeline(
cssResolver, htmlPipeline
);
var worker = new XMLWorker(cssResolverPipeline, true);
var parser = new XMLParser(worker);
var xHtml = "<hr style='border:1px dotted red' />";
using (var stringReader = new StringReader(xHtml))
{
parser.Parse(stringReader);
}
}
}
One thing to note is that even though we're using the shorthand border
inline style, iText's CSS parser appears to set all the styles internally. I.e., you can use any of the four longhand styles to check - I just happened to use CSS.Property.BORDER_BOTTOM_STYLE
and CSS.Property.BORDER_BOTTOM_COLOR
.
The resulting PDF: