I've run into a coordinate problem: the PDFTextStripperByArea region seems to be pushed too high.
Consider the following example snippet:
...
PDPage page = (PDPage) allPages.get(0);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
// define region for extraction -- the coordinates and dimensions are x, y, width, height
Rectangle2D.Float region = new Rectangle2D.Float(x, y, width, height);
stripper.addRegion("test region", region);
// overlay the region with a cyan rectangle to check if I got the coordinates and dimensions right
PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
contentStream.setNonStrokingColor( Color.CYAN );
contentStream.fillRect(x, y, width, height );
contentStream.close();
// extract the text from the defined region
stripper.extractRegions(page);
String content = stripper.getTextForRegion("test region");
...
document.save(...); ...
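The cyan rectangle covers the desired area nicely. The stripper, on the other hand, misses a couple of lines at the bottom of the rectangle and includes a couple of lines above it; the region looks shifted "up" (in terms of the y coordinate). What is going on?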