This is an important problem for me, and I would appreciate any help.
I used PDFBox to create a simple PDF document. What I am trying to do is read the existing document and then rewrite the same text into it, at the same positions.
1) First, create a PDF named "Musique.pdf".
2) Read this existing document.
3) Extract the text from the document with PDFTextStripper.
4) Find the position of each character in the document (x, y, width, font size, etc.).
5) Create a table that holds the x and y of each character, for example table1[0]=x1, table1[1]=y1, table1[2]=x2, table1[3]=y2, etc.
6) Then loop, opening a PDPageContentStream for each character, to rewrite it at the correct position.
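A minimal sketch of the flat table layout described in the steps above, kept separate from the actual program. The class name `CoordinateTable`, the helpers `pack` and `toPdfY`, and the sample coordinates are made up for illustration. The value 400 is the page height used in this post, and the sketch assumes the stored y is measured from the top of the page, so it has to be flipped before writing.

```java
import java.util.Arrays;

class CoordinateTable {
    // Page height used to flip the y axis; 400 matches the page created in this post.
    static final float PAGE_HEIGHT = 400f;

    /** Packs (x, y) pairs into one flat array: [x1, y1, x2, y2, ...]. */
    static float[] pack(float[] xs, float[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("xs and ys must have the same length");
        }
        float[] table = new float[xs.length * 2];
        for (int k = 0; k < xs.length; k++) {
            table[2 * k] = xs[k];      // even slot: x of character k
            table[2 * k + 1] = ys[k];  // odd slot: y of character k
        }
        return table;
    }

    /** Converts a y measured from the top of the page into a bottom-up PDF y. */
    static float toPdfY(float yFromTop) {
        return PAGE_HEIGHT - yFromTop;
    }

    public static void main(String[] args) {
        // Hypothetical positions of two characters on the same line.
        float[] table = pack(new float[] {15f, 20.56f}, new float[] {15f, 15f});
        System.out.println(Arrays.toString(table));
        System.out.println(toPdfY(table[1]));
    }
}
```

With this layout, character number k always lives at slots 2k and 2k+1, so the text index and the table index must advance together.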
The problem is:
The first line is written completely, but there is a problem with the second line.
I noticed that if, for example, the text has 3 lines and contains 225 characters, the length of the extracted text is reported as 231. So it seems that 2 extra characters (spaces?) are added at the end of each line. But when the program looks up the position of each character, it does not account for these added characters.
Please run my code below and tell me how to resolve this problem.
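The length arithmetic in the observation above can be reproduced in isolation. The sketch below is not part of the program in this post: it assumes that the two extra characters per line are a line separator such as `"\r\n"` appended after every line, including the last one. That is a guess at the cause, not something confirmed here. The class name `LengthCheck` and its helpers are hypothetical.

```java
class LengthCheck {
    /** Builds `lines` lines of `charsPerLine` visible characters, each followed by `separator`. */
    static String build(int lines, int charsPerLine, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int l = 0; l < lines; l++) {
            for (int c = 0; c < charsPerLine; c++) {
                sb.append('x');
            }
            sb.append(separator);
        }
        return sb.toString();
    }

    /** Counts the characters that are neither carriage return nor line feed. */
    static int visibleCount(String text) {
        int n = 0;
        for (int t = 0; t < text.length(); t++) {
            char c = text.charAt(t);
            if (c != '\r' && c != '\n') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String text = build(3, 75, "\r\n");
        System.out.println(text.length());       // prints 231
        System.out.println(visibleCount(text));  // prints 225
    }
}
```

Under that assumption, 3 lines times 2 separator characters gives exactly the 6 extra characters between 225 and 231.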
My code so far:
package test;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import org.apache.pdfbox.cos.COSInteger;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.util.PDFOperator;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;
public class Test extends PDFTextStripper {
    private static final String src = "...";
    private static int i;
    private static float[] table1;
    private static PDPageContentStream content;
    private static float jjj;

    public Test() throws IOException {
        super.setSortByPosition(true);
    }
    public static void createPdf(String src) throws IOException, COSVisitorException {
        // create a document named "Musique.pdf"
        PDRectangle rec = new PDRectangle(400, 400);
        PDDocument document = null;
        document = new PDDocument();
        PDPage page = new PDPage(rec);
        document.addPage(page);
        PDFont font = PDType1Font.HELVETICA;
        PDPageContentStream canvas1 = new PDPageContentStream(document, page, true, true);
        canvas1.setFont(font, 10);
        canvas1.beginText();
        canvas1.appendRawCommands("15 385 Td");
        canvas1.appendRawCommands("(La musique est très importante dans notre vie moderne. Sans la musique, non)Tj\n");
        canvas1.endText();
        canvas1.close();
        PDPageContentStream canvas2 = new PDPageContentStream(document, page, true, true);
        canvas2.setFont(font, 11);
        canvas2.beginText();
        canvas2.appendRawCommands("15 370 Td");
        canvas2.appendRawCommands("(Donc il est très necessaire de jouer chaque jours la musique.)Tj\n");
        canvas2.endText();
        canvas2.close();
        document.save("Musique.pdf");
        document.close();
    }
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException, COSVisitorException {
        Test tes = new Test();
        tes.createPdf(src);
        // read the existing document
        PDDocument doc;
        doc = PDDocument.load("Musique.pdf");
        List pages = doc.getDocumentCatalog().getAllPages();
        PDPage page = (PDPage) pages.get(0);
        // extract the text that exists in the document
        PDFTextStripper stripper = new PDFTextStripper();
        String texte = stripper.getText(doc);
        PDStream contents = page.getContents();
        if (contents != null) {
            i = 1;
            table1 = new float[texte.length() * 2];
            table1[0] = (float) 15.0;
            // the call below invokes processTextPosition for each character, which stores that character's position in a slot of table1
            tes.processStream(page, page.findResources(), page.getContents().getStream());
            // once processTextPosition has run for every character, execution continues here:
            int iii = 0;
            int kkk = 0;
            // loop, opening one PDPageContentStream per character, to rewrite the whole text into the document
            // when you run this code, you will notice a problem with the second line; how can it be resolved?
            PDFont font = PDType1Font.HELVETICA;
            while (kkk < table1.length) {
                content = new PDPageContentStream(doc, page, true, true);
                content.setFont(font, 10);
                content.beginText();
                jjj = 400 - table1[kkk + 1];
                content.appendRawCommands("" + table1[kkk] + " " + jjj + " Td");
                content.appendRawCommands("(" + texte.charAt(iii) + ")" + " Tj\n");
                content.endText();
                content.close();
                iii = iii + 1;
                kkk = kkk + 2;
            }
        }
        // save the modified document
        doc.save("Modified-musique.pdf");
        doc.close();
    }
    /**
     * @param text The text to be processed
     */
    public void processTextPosition(TextPosition text) {
        System.out.println("String[" + text.getXDirAdj() + ","
                + text.getYDirAdj() + " fs=" + text.getFontSize() + " xscale="
                + text.getXScale() + " height=" + text.getHeightDir() + " space="
                + text.getWidthOfSpace() + " width="
                + text.getWidthDirAdj() + "]" + text.getCharacter());
        if (i > 1) {
            // every character after the first: store x, then y
            table1[i] = text.getXDirAdj();
            System.out.println(table1[i]);
            i = i + 1;
            table1[i] = text.getYDirAdj();
            System.out.println(table1[i]);
            i = i + 1;
        } else {
            // first character: x was preset to 15 in table1[0], so only y is stored
            table1[i] = text.getYDirAdj();
            System.out.println(table1[i]);
            i = i + 1;
        }
    }
}
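One direction that might keep the extracted text and the position table in step is to skip the characters that have no stored position while walking through the text. The sketch below is not a verified fix and does not use PDFBox. It assumes that carriage returns and line feeds are the only characters in the extracted string without an (x, y) entry. The names `AlignSketch`, `Placed` and `align` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class AlignSketch {
    /** Hypothetical holder pairing one character with the position stored for it. */
    static final class Placed {
        final char ch;
        final float x;
        final float y;

        Placed(char ch, float x, float y) {
            this.ch = ch;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Walks the text and the flat [x1, y1, x2, y2, ...] table together.
     * CR and LF consume no table entry (assumption: they have no stored position).
     */
    static List<Placed> align(String text, float[] table) {
        List<Placed> out = new ArrayList<>();
        int k = 0; // index of the next unused x slot in the table
        for (int t = 0; t < text.length() && k + 1 < table.length; t++) {
            char c = text.charAt(t);
            if (c == '\r' || c == '\n') {
                continue; // separator: advance in the text only
            }
            out.add(new Placed(c, table[k], table[k + 1]));
            k += 2;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample: two lines of two characters, each ending in "\r\n".
        float[] table = {15f, 15f, 20f, 15f, 15f, 30f, 20f, 30f};
        for (Placed p : align("ab\r\ncd\r\n", table)) {
            System.out.println(p.ch + " @ " + p.x + "," + p.y);
        }
    }
}
```

In the program above, the equivalent change would be to advance `iii` past such characters without advancing `kkk`, and to stop once the filled part of `table1` is used up.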
Best Regards,
Liszt.