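While converting HTML to PDF with XMLWorker, I ran into two problems:
Problem 1: Heading tags are not styled as expected. For example, the font size and weight of the text inside an h1 tag are not affected by the enclosing tag. The same applies to the other heading tags (h1-h6), even though they are recognized in the PDF and added as bookmarks.
Problem 2: Images are not displayed when they are inside a div tag. I am trying to set the alignment of the parsed image. When I do this manually in the ImageProvider, the alignment is not reflected in the PDF document. When I create my own TagProcessor, the image is not displayed if it is inside a div. When I change the parent tag from div to p (paragraph), the image displays perfectly and the alignment works, including text wrap. Here is my code.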
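import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.text.SimpleDateFormat;
import java.util.Calendar;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.css.CssFile;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.HTML;
import com.itextpdf.tool.xml.html.TagProcessorFactory;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;

public class PDFCreator {
    public static void main(String[] args) {
        try {
            PDFCreator.generatePDF();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void generatePDF() throws DocumentException, IOException {
        // Timestamped output file, e.g. V.20240101_120000.pdf
        String timestamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(Calendar.getInstance().getTime());
        OutputStream output = new FileOutputStream("V." + timestamp + ".pdf");
        // step 1: A3 document with left, right, top, bottom margins
        Document document = new Document(PageSize.A3, 30, 30, 60, 100);
        // step 2: tagged PDF writer
        PdfWriter writer = PdfWriter.getInstance(document, output);
        writer.setTagged();
        // step 3
        document.open();
        // step 4: CSS resolver with an empty stylesheet
        CSSResolver cssResolver = new StyleAttrCSSResolver();
        CssFile cssFile = XMLWorkerHelper.getCSS(new ByteArrayInputStream("".getBytes()));
        cssResolver.addCss(cssFile);
        // HTML: replace the default img processor with the custom one
        MyHtmlPipelineContext htmlContext = new MyHtmlPipelineContext();
        //htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        TagProcessorFactory factory = Tags.getHtmlTagProcessorFactory();
        factory.removeProcessor(HTML.Tag.IMG);
        factory.addProcessor(new ImageTagProcessor(), HTML.Tag.IMG);
        htmlContext.setTagFactory(factory);
        htmlContext.setImageProvider(new Base64ImageProvider());
        // Pipelines: CSS -> HTML -> PDF
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        p.parse(new FileInputStream("page02.html"));
        // step 5
        document.close();
    }
}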
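import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public class MyHtmlPipelineContext extends HtmlPipelineContext {
    public MyHtmlPipelineContext() {
        super(null);
    }

    // Re-attach the image provider on every cloned context.
    @Override
    public HtmlPipelineContext clone() {
        HtmlPipelineContext ctx = null;
        try {
            ctx = super.clone();
            ctx.setImageProvider(new Base64ImageProvider());
        } catch (Exception e) {
            // Not handled yet: ctx stays null if cloning fails.
        }
        return ctx;
    }
}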
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.itextpdf.text.BadElementException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.codec.Base64;
import com.itextpdf.tool.xml.Tag;
import com.itextpdf.tool.xml.WorkerContext;

public class ImageTagProcessor extends com.itextpdf.tool.xml.html.Image {
    @Override
    public List<Element> end(final WorkerContext ctx, final Tag tag, final List<Element> currentContent) {
        List<Element> list = new ArrayList<Element>(1);
        Image image = getImageObject(ctx, tag);
        // Avoid adding a null element when the image could not be loaded.
        if (image != null) {
            list.add(image);
        }
        return list;
    }

    public static Image getImageObject(WorkerContext ctx, Tag tag) {
        Map<String, String> tagAttributes = tag.getAttributes();
        Map<String, String> tagCss = tag.getCSS();
        try {
            String src = tagAttributes.get("src");
            if (src == null) {
                return null;
            }
            // Load either an inline base64 data URI or a file/URL reference.
            Image imgObj;
            int pos = src.indexOf("base64,");
            if (src.startsWith("data") && pos > 0) {
                byte[] img = Base64.decode(src.substring(pos + 7));
                imgObj = Image.getInstance(img);
            } else {
                imgObj = Image.getInstance(src);
            }
            // Map the CSS float value to an iText alignment with text wrap.
            String floatValue = tagCss.get("float");
            if (floatValue != null) {
                if (floatValue.equalsIgnoreCase("right")) {
                    imgObj.setAlignment(Image.RIGHT | Image.TEXTWRAP);
                } else if (floatValue.equalsIgnoreCase("left")) {
                    imgObj.setAlignment(Image.LEFT | Image.TEXTWRAP);
                }
            }
            // Only the HTML width/height attributes are read here, not the CSS values.
            int width = parseIntOrZero(tagAttributes.get("width"));
            int height = parseIntOrZero(tagAttributes.get("height"));
            if (width > 0 && height > 0) {
                imgObj.scaleAbsolute((float) width, (float) height);
            }
            return imgObj;
        } catch (BadElementException ex) {
            return null;
        } catch (IOException ex) {
            return null;
        }
    }

    // Returns 0 for a missing, blank, or non-integer attribute value.
    private static int parseIntOrZero(String value) {
        if (value == null || value.trim().length() == 0) {
            return 0;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException ex) {
            return 0;
        }
    }
}
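import java.io.IOException;

import com.itextpdf.text.BadElementException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.codec.Base64;
import com.itextpdf.tool.xml.pipeline.html.AbstractImageProvider;

public class Base64ImageProvider extends AbstractImageProvider {
    @Override
    public Image retrieve(String src) {
        int pos = src.indexOf("base64,");
        try {
            Image imgObj;
            if (src.startsWith("data") && pos > 0) {
                // Inline data URI: decode the base64 payload after "base64,"
                byte[] img = Base64.decode(src.substring(pos + 7));
                imgObj = Image.getInstance(img);
            } else {
                // Otherwise treat src as a file path or URL
                imgObj = Image.getInstance(src);
            }
            // Cache the image under its src
            super.store(src, imgObj);
            return imgObj;
        } catch (BadElementException ex) {
            return null;
        } catch (IOException ex) {
            return null;
        }
    }

    @Override
    public String getImageRootPath() {
        return null;
    }
}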
page02.html
<html>
<body ><h1>hello</h1>
<div style="font-size: medium;">
<img align="right"
src="path"
style="width: 267px; height: 200px; float: left;" /></div>
</body>
</html>
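Is there a configuration I have missed? Why does the image display correctly inside a paragraph tag but not inside a div tag? Where do I need to change the code to make this work? Note: if I use the default tag processor, the image is displayed correctly, but without alignment or text wrapping.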
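A side observation on the code above: getImageObject reads width and height from the tag attributes, while page02.html sets them only in the style attribute ("width: 267px; height: 200px"), so the scaleAbsolute branch would not be reached for this sample. Below is a minimal sketch of how the pixel values could be parsed, assuming the resolved CSS values are available from tag.getCSS() in the same way as float. CssDimensions, parsePx and resolve are hypothetical names for illustration and are not part of iText or XML Worker. The sketch returns the bare number and leaves open whether the pixels need converting to PDF units.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of iText or XML Worker.
final class CssDimensions {
    // Accepts a plain number with an optional "px" unit, e.g. "267px", "200", " 12.5 PX ".
    private static final Pattern PX =
            Pattern.compile("^\\s*(\\d+(?:\\.\\d+)?)\\s*(?:px)?\\s*$", Pattern.CASE_INSENSITIVE);

    private CssDimensions() {
    }

    // Returns the numeric value, or -1 if the input is null or not a plain pixel length.
    static float parsePx(String value) {
        if (value == null) {
            return -1f;
        }
        Matcher m = PX.matcher(value);
        return m.matches() ? Float.parseFloat(m.group(1)) : -1f;
    }

    // Prefers the CSS value and falls back to the HTML attribute.
    static float resolve(String cssValue, String attributeValue) {
        float fromCss = parsePx(cssValue);
        return fromCss > 0 ? fromCss : parsePx(attributeValue);
    }
}
```

In getImageObject this could be used as CssDimensions.resolve(tagCss.get("width"), tagAttributes.get("width")), and likewise for height, before the call to scaleAbsolute. Percentages and other units are deliberately rejected rather than guessed.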