java - Is it possible to retrieve the font color of each character from a PDF using PDFBox?

Question

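Is it possible to retrieve the font color of each character from a PDF using PDFBox? Following the solution to my previous question, "How to extract font color using PDFBOX java?", I have gotten as far as retrieving the font properties. Below is my code snippet: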

    PDDocument doc = PDDocument.load("C:\\Users\\Desktop\\abc.pdf");
    // Stream engine configured with the PageDrawer operator set
    PDFStreamEngine engine = new PDFStreamEngine(ResourceLoader.loadProperties("org/apache/pdfbox/resources/PageDrawer.properties"));
    // Process the content stream of the first page
    PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(0);
    engine.processStream(page, page.findResources(), page.getContents().getStream());
    // Read the graphics state once the whole content stream has been processed
    PDGraphicsState graphicState = engine.getGraphicsState();
    System.out.println(graphicState.getNonStrokingColor().getJavaColor());
    doc.close();

The actual PDF contains only the text MessageHi, where Message is blue and Hi is green. When I run the code above, it prints:

java.awt.Color[r=0,g=255,b=0]  ----> green

Similarly, I tried the code below to retrieve each character together with its font properties, but the color it prints is always java.awt.Color[r=0,g=0,b=0] (black):

    import java.io.IOException;
    import java.util.List;

    // PDFBox 1.x package names
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.common.PDStream;
    import org.apache.pdfbox.util.PDFTextStripper;
    import org.apache.pdfbox.util.TextPosition;

    public class PrintTextLocations extends PDFTextStripper {

        public PrintTextLocations() throws IOException {
            super.setSortByPosition(true);
        }

        public static void main(String[] args) throws Exception {
            PDDocument document = PDDocument.load("C:\\Users\\Desktop\\abc.pdf");
            List<?> allPages = document.getDocumentCatalog().getAllPages();
            PrintTextLocations printer = new PrintTextLocations();
            for (int i = 0; i < allPages.size(); i++) {
                PDPage page = (PDPage) allPages.get(i);
                System.out.println("Processing page: " + i);
                PDStream contents = page.getContents();
                if (contents != null) {
                    printer.processStream(page, page.findResources(), contents.getStream());
                }
            }
            document.close();
        }

        // Called once per character; prints its position, size and the current fill color
        @Override
        protected void processTextPosition(TextPosition text) {
            try {
                System.out.println("String[" + text.getXDirAdj() + ","
                        + text.getYDirAdj() + " fs=" + text.getFontSize() + " xscale="
                        + text.getXScale() + " height=" + text.getHeightDir()
                        + " space=" + text.getWidthOfSpace() + " width="
                        + text.getWidthDirAdj() + "]" + text.getCharacter()
                        + getGraphicsState().getNonStrokingColor().getJavaColor());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

Can anyone help me retrieve the font color of each character?

Thanks
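
To illustrate why the first snippet reports only green, here is a minimal, library-free sketch. It is a toy model, not PDFBox code: the class `GlyphColorSketch`, its two methods, and the simplified prefix operator strings (`"rg r g b"` to set the fill color, `"Tj text"` to show text) are all hypothetical names made up for this example. Real content streams use postfix operators and far more state. The point it tries to show is that the fill color is mutable state: reading it after the whole stream has been processed yields only the last value, whereas reading it at each text-showing step yields one color per character. Whether the always-black output of the second snippet comes from the stripper's operator configuration not handling color operators is an assumption that would need checking against the PDFBox version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GlyphColorSketch {

    // Returns the fill color left over once every operator has been applied.
    // This mirrors reading the graphics state after processing a whole page.
    static String colorAfterStream(List<String> ops) {
        String fill = "0 0 0"; // the initial fill color in PDF is black
        for (String op : ops) {
            if (op.startsWith("rg ")) {
                fill = op.substring(3);
            }
        }
        return fill;
    }

    // Records the fill color in effect at the moment each character is shown.
    static List<String> colorPerGlyph(List<String> ops) {
        List<String> result = new ArrayList<>();
        String fill = "0 0 0";
        for (String op : ops) {
            if (op.startsWith("rg ")) {
                fill = op.substring(3); // color change: update the state
            } else if (op.startsWith("Tj ")) {
                for (char c : op.substring(3).toCharArray()) {
                    result.add(c + " -> " + fill); // snapshot per character
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stream for the "MessageHi" page: blue text, then green text
        List<String> ops = Arrays.asList("rg 0 0 1", "Tj Message", "rg 0 1 0", "Tj Hi");
        System.out.println("after stream: " + colorAfterStream(ops));
        for (String line : colorPerGlyph(ops)) {
            System.out.println(line);
        }
    }
}
```

In PDFBox terms, the per-character variant corresponds to reading the graphics state from inside a per-glyph callback such as `processTextPosition`, rather than once after `processStream` returns.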
