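I am working on an extension to OpenBravoPOS that reads the invoices from the company we order our products from.
These invoices are created as PDF files. I use the iText library to read the individual order lines. The problem is that I can read the page I need, but only as one big string. That string looks like this: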
LEVERINGSBON 30/06/2012 27828/2012/NL/WebShop Distributeur ID nummer: 15099191 Uw distributeur: Klant Naam: FM Point Marcel Snoeck Adres: Zonnedauw 17 5953MS Reuver Telefoon: +31654317017 E-MAIL: yvonneenmarcel@home.nl Opmerking: - Lp. Rekening Totaal FV/39525/2012/NL vd Wal Sandra 72.00 1 3 x 354 - Luxury Collection 50ml NEW! 72.00 FV/39526/2012/NL Slaats Tim 6.00 2 1 x KR01 - Eye Pencil DECADENCE BLACK 6.00 FV/39527/2012/NL Nabben Britt 44.95 3 3 x E013 - Krachtreiniger 1000ml 24.75 4 2 x E016 -Tapijtreiniger 1000ml 9.20 5 1 x 3 Step Mascara PERFECT BLACK 11.00 FV/39528/2012/NL Nabben Lieke 32.00 6 1 x 192 - Luxury Collection 50ml 21.00 7 1 x 3 Step Mascara PERFECT BLACK 11.00 FV/39529/2012/NL Claessens Patrick 12.40 8 1 x P101 - Peeling VERBENA 12.40 FV/39530/2012/NL Smits Yolanda 56.00 9 1 x E006 - Wasmiddel VIVID COLOURS 1000ml 7.00 10 2 x B023 - Body Lotion 200ml NEW 18.40 11 2 x 023 - Classic Collection 30ml 30.60 FV/39531/2012/NL van Pol-Thijssen Silvia 34.70 12 1 x 110 - Classic Collection 50ml 15.30 13 1 x N003 - Nagellak HOT RED 7.00 14 1 x P103 - Peeling CHERRY BLOSSOM 12.40 Aantal: 21 Totaal: 258.05 € 1.17.4564.29482 1/1 "
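What I am trying to do is read each line and determine whether it is an order line; if it is, I need to store it in the database.
An order line looks like this: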
2 1 x KR01 - Eye Pencil DECADENCE BLACK 6.00
You can read it as follows: order line 2, 1 piece of product KR01, description Eye Pencil DECADENCE BLACK, price 6.00.
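To make that structure concrete, here is a minimal sketch of how such order lines could be pulled out of the page text with a regular expression. `OrderLineParser` and `OrderLine` are hypothetical names, not part of iText or OpenBravoPOS. The pattern assumes the layout seen in the sample: a line number, a quantity, the letter `x`, an optional product code followed by a dash, a description, and a price with exactly two decimals. It also assumes a description never contains a two-decimal number of its own, which would be mistaken for the price.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of iText or OpenBravoPOS. */
final class OrderLineParser {

    /** One parsed order line; productCode is null when the line has no "CODE - " prefix. */
    static final class OrderLine {
        final int lineNumber;
        final int quantity;
        final String productCode;
        final String description;
        final BigDecimal price;

        OrderLine(int lineNumber, int quantity, String productCode,
                  String description, BigDecimal price) {
            this.lineNumber = lineNumber;
            this.quantity = quantity;
            this.productCode = productCode;
            this.description = description;
            this.price = price;
        }
    }

    // Assumed layout: <line no> <quantity> x [<code> -] <description> <price>
    // The lookbehind keeps the decimals of a preceding amount (e.g. "72.00")
    // from being read as a line number.
    private static final Pattern ORDER_LINE = Pattern.compile(
            "(?<![\\d.])(\\d+) (\\d+) x (?:(\\S+) -\\s?)?(.+?) (\\d+\\.\\d{2})(?!\\S)");

    /** Returns every order line found in the extracted page text, in order of appearance. */
    static List<OrderLine> parse(String pageText) {
        List<OrderLine> lines = new ArrayList<OrderLine>();
        Matcher m = ORDER_LINE.matcher(pageText);
        while (m.find()) {
            lines.add(new OrderLine(
                    Integer.parseInt(m.group(1)),   // order line number
                    Integer.parseInt(m.group(2)),   // quantity
                    m.group(3),                     // product code, may be null
                    m.group(4).trim(),              // description
                    new BigDecimal(m.group(5))));   // price
        }
        return lines;
    }

    public static void main(String[] args) {
        String sample = "FV/39526/2012/NL Slaats Tim 6.00 "
                + "2 1 x KR01 - Eye Pencil DECADENCE BLACK 6.00";
        for (OrderLine line : parse(sample)) {
            System.out.println(line.lineNumber + " | " + line.quantity + " | "
                    + line.productCode + " | " + line.description + " | " + line.price);
        }
    }
}
```

A line such as `5 1 x 3 Step Mascara PERFECT BLACK 11.00` has no product code before a dash, so in this sketch `productCode` stays `null` and the whole text becomes the description. Whether that is the right treatment depends on how those products should be looked up in the database.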
Is there an easy way to read this long string and split it into the individual order lines?
Thanks for your replies.
My code so far:
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package part4.chapter15;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
public class ExtractPageContent {
    /** The original PDF that will be parsed. */
    public static final String PREFACE = "C:/Users/marcel/Documents/FM/NL/FMPoint /Kassa_voorraad_software/PDF-Itext/PDF_Results_Import_Files/small.pdf";
    /** The resulting text file. */
    public static final String RESULT = "C:/Users/marcel/Documents/FM/NL/FMPoint /Kassa_voorraad_software/PDF-Itext/PDF_Results_Import_Files/sample- result.txt";
    /**
     * Parses a PDF to a plain text file.
     * @param pdf the original PDF
     * @param txt the resulting text
     * @throws IOException
     */
    public void parsePdf(String pdf, String txt) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        PrintWriter out = new PrintWriter(new FileOutputStream(txt));
        try {
            // Header text that only appears on pages containing order lines.
            String findPage = "Lp. Rekening Totaal";
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                TextExtractionStrategy strategy =
                        parser.processContent(i, new SimpleTextExtractionStrategy());
                String str = strategy.getResultantText();
                // Only write pages that contain the order table header.
                if (str.contains(findPage)) {
                    out.println(str);
                }
            }
        } finally {
            // Release the output file and the PDF even if parsing fails.
            out.close();
            reader.close();
        }
    }
    /**
     * Main method.
     * @param args no arguments needed
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        new ExtractPageContent().parsePdf(PREFACE, RESULT);
    }
}