java - 编辑 dotx/docx 文件中的标题

Question

我目前正在尝试从 dotx 格式的现有模板生成新的 docx 文件。我想更改标题中的名字、姓氏等，但由于某种原因我无法访问它们......我的方法如下：

 public void generateDocX(Long id) throws IOException, InvalidFormatException {

    //Get user per id
    EmployeeDTO employeeDTO = employeeService.getEmployee(id);

    //Location where the new docx file will be saved
    FileOutputStream outputStream = new FileOutputStream(new File("/home/user/Documents/project/src/main/files/" + employeeDTO.getId() + "header.docx"));

    //Get the template for generating the new docx file
    File template = new File("/home/user/Documents/project/src/main/files/template.dotx");
    OPCPackage pkg = OPCPackage.open(template);
    XWPFDocument document = new XWPFDocument(pkg);

    for (XWPFHeader header : document.getHeaderList()) {
        List<XWPFParagraph> paragraphs = header.getParagraphs();
        System.out.println("Total paragraphs in header are: " + paragraphs.size());
        System.out.println("Total elements in the header are: " + header.getBodyElements().size());
        for (XWPFParagraph paragraph : paragraphs) {
            System.out.println("Paragraph text is: " + paragraph.getText());
            List<XWPFRun> runs = paragraph.getRuns();
            for (XWPFRun run : runs) {
                String runText = run.getText(run.getTextPosition());
                System.out.println("Run text is: " + runText);
            }
        }
    }

    //Write the changes to the new docx file and close the document
    document.write(outputStream);
    document.close();
}

控制台中的输出是 1、null 或空字符串......我已经尝试了几种方法，从这里、这里和这里，但没有任何运气......

这是 template.dotx 里面的内容

score 3 · Accepted Answer

IBody.getParagraphs和IBody.getBodyElements-仅获取直接在其中的段落或正文元素IBody。但是您的段落并不直接在其中，而是在单独的文本框或文本框架中。这就是为什么他们不能以这种方式得到。

由于*.docx是ZIP包含XML文档、页眉和页脚的 ijg 文件的存档，因此可以IBody通过创建一个XmlCursor选择所有w:r XML元素的文件来获取一个文件的所有文本运行。对于一个XWPFHeader这可能看起来像这样：

 private List<XmlObject> getAllCTRs(XWPFHeader header) {
  CTHdrFtr ctHdrFtr = header._getHdrFtr();
  XmlCursor cursor = ctHdrFtr.newCursor();
  cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
  List<XmlObject> ctrInHdrFtr = new ArrayList<XmlObject>();
  while (cursor.hasNextSelection()) {
   cursor.toNextSelection();
   XmlObject obj = cursor.getObject();
   ctrInHdrFtr.add(obj);
  }
  return ctrInHdrFtr;
 }

现在我们有了XML该标题中所有元素的列表，这些元素是Word.

我们可以有一个更通用getAllCTRs的方法来获取任何类型的所有CTR元素，IBody如下所示：

 private List<XmlObject> getAllCTRs(IBody iBody) {
  XmlCursor cursor = null;
  List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();

  if (iBody instanceof XWPFHeaderFooter) {
   XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
   CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
   cursor = ctHdrFtr.newCursor();
  } else if (iBody instanceof XWPFDocument) {
   XWPFDocument document = (XWPFDocument)iBody;
   CTDocument1 ctDocument1 = document.getDocument();
   cursor = ctDocument1.newCursor();
  } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
   XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
   CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
   cursor = ctFtnEdn.newCursor();
  }

  if (cursor != null) {
   cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
   while (cursor.hasNextSelection()) {
    cursor.toNextSelection();
    XmlObject obj = cursor.getObject();
    ctrInIBody.add(obj);
   }
  }
  return ctrInIBody ;
 }

现在我们有一个列表中的所有XML元素，IBody这些元素是中的文本运行元素Word。

有了它，我们可以像这样从它们中获取文本：

 private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
  List<XmlObject> ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    System.out.println(text);
   }
  }
 }

这可能表明了下一个挑战。因为Word在创建文本运行元素时非常混乱。例如，您的占位符<<Firstname>>可以拆分为 text- runs <<++ 。原因可能是格式不同或拼写检查或其他原因。甚至这是可能的：+ + + + 。甚至是这样：+ + + 。你看，用文本替换占位符几乎是不可能的，因为占位符可能被分成多个 tex-runs。Firstname>><<Lastname>>; <<YearOfBirth>><<Firstname>> <<Lastname>>; <<YearOfBirth>>

为了避免这种情况，template.dotx需要从知道自己在做什么的用户中创建。

首先关闭拼写检查。语法检查也是如此。如果没有，所有发现的可能的拼写错误或语法违规都在单独的文本运行中以相应地标记它们。

其次确保整个占位符的格式相同。不同格式的文本也必须在单独的文本运行中。

我真的怀疑这是否能正常工作。但是你自己试试。

完整示例：

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;

import java.util.List;
import java.util.ArrayList;

public class WordEditAllIBodys {

 private List<XmlObject> getAllCTRs(IBody iBody) {
  XmlCursor cursor = null;
  List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();

  if (iBody instanceof XWPFHeaderFooter) {
   XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
   CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
   cursor = ctHdrFtr.newCursor();
  } else if (iBody instanceof XWPFDocument) {
   XWPFDocument document = (XWPFDocument)iBody;
   CTDocument1 ctDocument1 = document.getDocument();
   cursor = ctDocument1.newCursor();
  } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
   XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
   CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
   cursor = ctFtnEdn.newCursor();
  }

  if (cursor != null) {
   cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
   while (cursor.hasNextSelection()) {
    cursor.toNextSelection();
    XmlObject obj = cursor.getObject();
    ctrInIBody.add(obj);
   }
  }
  return ctrInIBody ;
 }

 private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
  List<XmlObject> ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    System.out.println(text);
   }
  }
 }

 private void replaceTextInTextRunsOfIBody(IBody iBody, String placeHolder, String textValue) throws Exception {
  List<XmlObject> ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    if (text != null && text.contains(placeHolder)) {
     text = text.replace(placeHolder, textValue);
     ctText.setStringValue(text);
     obj.set(ctr);
    }
   }
  }
 }

 public void generateDocX() throws Exception {

  FileOutputStream outputStream = new FileOutputStream(new File("./" + 1234 + "header.docx"));

  //Get the template for generating the new docx file
  File template = new File("./template.dotx");
  XWPFDocument document = new XWPFDocument(new FileInputStream(template));

  //traverse all headers
  for (XWPFHeader header : document.getHeaderList()) {
   printAllTextInTextRunsOfIBody(header);

   replaceTextInTextRunsOfIBody(header, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(header, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(header, "<<ProfessionalTitle>>", "Skeptic");
  }  

  //traverse all footers
  for (XWPFFooter footer : document.getFooterList()) {
   printAllTextInTextRunsOfIBody(footer);

   replaceTextInTextRunsOfIBody(footer, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(footer, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(footer, "<<ProfessionalTitle>>", "Skeptic");
  }  

  //traverse document body; note: tables needs not be traversed separately because they are in document body
  printAllTextInTextRunsOfIBody(document);

  replaceTextInTextRunsOfIBody(document, "<<Firstname>>", "Axel");
  replaceTextInTextRunsOfIBody(document, "<<Lastname>>", "Richter");
  replaceTextInTextRunsOfIBody(document, "<<ProfessionalTitle>>", "Skeptic");

  //traverse all footnotes
  for (XWPFFootnote footnote : document.getFootnotes()) {
   printAllTextInTextRunsOfIBody(footnote);

   replaceTextInTextRunsOfIBody(footnote, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(footnote, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(footnote, "<<ProfessionalTitle>>", "Skeptic");
  }  

  //traverse all endnotes
  for (XWPFEndnote endnote : document.getEndnotes()) {
   printAllTextInTextRunsOfIBody(endnote);

   replaceTextInTextRunsOfIBody(endnote, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(endnote, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(endnote, "<<ProfessionalTitle>>", "Skeptic");
  }  


  //since document was opened from *.dotx the content type needs to be changed
  document.getPackage().replaceContentType(
   "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
   "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");

  //Write the changes to the new docx file and close the document
  document.write(outputStream);
  outputStream.close();
  document.close();
 }

 public static void main(String[] args) throws Exception {
  WordEditAllIBodys app = new WordEditAllIBodys();
  app.generateDocX();
 }
}

顺便说一句：由于您的文档是从*.dotx内容类型打开的，因此需要将其更改wordprocessingml.template为wordprocessingml.document. 否则 Word 将不会打开生成的*.docx文档。请参阅将具有“.dotx”扩展名（模板）的文件转换为“docx”（Word 文件）。

由于我对替换占位符文本方法持怀疑态度，因此我首选的方法是填写表格。请参阅处理 word 文档 java 的问题。当然，这样的表单字段不能用于页眉或页脚。所以页眉或页脚应该从头开始创建。

java - 编辑 dotx/docx 文件中的标题

1 回答 1

Related

Reference