为了不在注释中突出显示,而是在实际页面内容流中突出显示,必须将图形突击队放入页面内容流中,在org.pdfclown.samples.cli.TextHighlightSample
示例的情况下,这些命令被隐式放入正常的注释外观流中。
这可以像这样实现:
org.pdfclown.files.File file = new org.pdfclown.files.File(resource);
Pattern pattern = Pattern.compile("S", Pattern.CASE_INSENSITIVE);
TextExtractor textExtractor = new TextExtractor(true, true);
for (final Page page : file.getDocument().getPages())
{
final List<Quad> highlightQuads = new ArrayList<Quad>();
Map<Rectangle2D, List<ITextString>> textStrings = textExtractor.extract(page);
final Matcher matcher = pattern.matcher(TextExtractor.toString(textStrings));
textExtractor.filter(textStrings, new TextExtractor.IIntervalFilter()
{
@Override
public boolean hasNext()
{
return matcher.find();
}
@Override
public Interval<Integer> next()
{
return new Interval<Integer>(matcher.start(), matcher.end());
}
@Override
public void process(Interval<Integer> interval, ITextString match)
{
{
Rectangle2D textBox = null;
for (TextChar textChar : match.getTextChars())
{
Rectangle2D textCharBox = textChar.getBox();
if (textBox == null)
{
textBox = (Rectangle2D) textCharBox.clone();
}
else
{
if (textCharBox.getY() > textBox.getMaxY())
{
highlightQuads.add(Quad.get(textBox));
textBox = (Rectangle2D) textCharBox.clone();
}
else
{
textBox.add(textCharBox);
}
}
}
highlightQuads.add(Quad.get(textBox));
}
}
@Override
public void remove()
{
throw new UnsupportedOperationException();
}
});
// Highlight the text pattern match!
ExtGState defaultExtGState = new ExtGState(file.getDocument());
defaultExtGState.setAlphaShape(false);
defaultExtGState.setBlendMode(Arrays.asList(BlendModeEnum.Multiply));
PrimitiveComposer composer = new PrimitiveComposer(page);
composer.getScanner().moveEnd();
// TODO: reset graphics state here.
composer.applyState(defaultExtGState);
composer.setFillColor(new DeviceRGBColor(1, 1, 0));
{
for (Quad markupBox : highlightQuads)
{
Point2D[] points = markupBox.getPoints();
double markupBoxHeight = points[3].getY() - points[0].getY();
double markupBoxMargin = markupBoxHeight * .25;
composer.drawCurve(new Point2D.Double(points[3].getX(), points[3].getY()),
new Point2D.Double(points[0].getX(), points[0].getY()),
new Point2D.Double(points[3].getX() - markupBoxMargin, points[3].getY() - markupBoxMargin),
new Point2D.Double(points[0].getX() - markupBoxMargin, points[0].getY() + markupBoxMargin));
composer.drawLine(new Point2D.Double(points[1].getX(), points[1].getY()));
composer.drawCurve(new Point2D.Double(points[2].getX(), points[2].getY()),
new Point2D.Double(points[1].getX() + markupBoxMargin, points[1].getY() + markupBoxMargin),
new Point2D.Double(points[2].getX() + markupBoxMargin, points[2].getY() - markupBoxMargin));
composer.fill();
}
}
composer.flush();
}
file.save(new File(RESULT_FOLDER, "multiPage-highlight-content.pdf"), SerializationModeEnum.Incremental);
(HighlightInContent.java方法 testHighlightInContent)
您将从原始示例中识别出文本提取框架。现在只是在处理整个页面的四边形之前收集它们,并且处理代码(大部分是从 中借用的TextMarkup.refreshAppearance()
)将代表四边形的表格绘制到页面内容中。
请注意,要使其正常工作,必须在插入新指令之前重置图形状态(该位置标有TODO
注释)。这可以通过应用保存/恢复状态或通过实际抵消不需要的更改状态条目来完成。不幸的是,我没有看到如何在 PDF Clown 中做前者,也没有时间做后者。