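I have a large Word document (more than 10,000 lines) containing a table of information that I need to convert to Excel using Java. I am using Apache POI to extract the table and save it to Excel. The code below works on a subset of the rows on my iMac. However, running it on the full document throws a heap space exception: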

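import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;

// Package as of POI 3.x; newer releases moved POITextExtractor to org.apache.poi.extractor.
import org.apache.poi.POITextExtractor;
import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

public class WordExtractor {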
  public static void main(String[] args) {
    try {
      File inputFile = new File("table.docx");
      POITextExtractor extractor = ExtractorFactory.createExtractor(inputFile);

      String text = extractor.getText();
      BufferedReader reader = new BufferedReader(new StringReader(text));
      String line = null;
      boolean breakRead = false;
      int rowCount = 0;
      HSSFWorkbook workbook = new HSSFWorkbook();
      HSSFSheet sheet = workbook.createSheet("sheet1");
      while (!breakRead) {
        line = reader.readLine();
        if (line != null) {
          Row row = sheet.createRow(rowCount);
          StringTokenizer st = new StringTokenizer(line, "\t");
          int cellnum = 0;
          while (st.hasMoreTokens()) {
            Cell cell = row.createCell(cellnum++);
            String token = st.nextToken();
            System.out.println(" = " + token);
            cell.setCellValue(token);
          }
        } else {
          breakRead = true;
        }
        rowCount++;
      }

      try {
        FileOutputStream out = new FileOutputStream(new File("new.xls"));
        workbook.write(out);
        out.close();
      } catch (FileNotFoundException e) {
        e.printStackTrace();
      } catch (IOException e) {
        e.printStackTrace();
      }
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }
}
1 Answer

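Thanks to the suggestions in the comments, I was able to solve this by removing the unnecessary String object creation for each row. In any case, I was able to solve the problem by calling System.gc() at the end of the main while loop (every 100 rows in the updated code). I also updated the VM arguments to give the application more runtime memory, using the following settings: -d64 -Xms512m -Xmx4g. Finally, I explicitly closed the extractor and the reader before creating the Excel file.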
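
To confirm that settings such as -Xms512m -Xmx4g actually reached the JVM, a small check like the following may help. This is only a sketch: the class name HeapCheck and the helper toMiB are hypothetical names introduced for illustration; the Runtime calls are standard Java.

```java
public class HeapCheck {
  /** Converts a byte count to whole mebibytes (hypothetical helper). */
  static long toMiB(long bytes) {
    return bytes / (1024L * 1024L);
  }

  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    // maxMemory() is the upper bound the JVM will try to use (governed by -Xmx)
    System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
    // totalMemory() is the heap currently reserved by the JVM
    System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
    // freeMemory() is the unused part of the currently reserved heap
    System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
  }
}
```

Running it with the same VM arguments as the converter (for example java -Xms512m -Xmx4g HeapCheck) should report a maximum heap close to the requested -Xmx value; the exact figure can differ slightly between JVMs.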

Here is the updated code:

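import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;

// Package as of POI 3.x; newer releases moved POITextExtractor to org.apache.poi.extractor.
import org.apache.poi.POITextExtractor;
import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

public class WordExtractor {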
  public static void main(String[] args) {
    try {
      File inputFile = new File("table.docx");
      POITextExtractor extractor = ExtractorFactory.createExtractor(inputFile);
      String text = extractor.getText();
      BufferedReader reader = new BufferedReader(new StringReader(text));
      String line = null;
      boolean breakRead = false;
      int rowCount = 0;
      HSSFWorkbook workbook = new HSSFWorkbook();
      HSSFSheet sheet = workbook.createSheet("sheet1");
      while (!breakRead) {
        line = reader.readLine();
        if (line != null) {
          Row row = sheet.createRow(rowCount);
          StringTokenizer st = new StringTokenizer(line, "\t");
          int cellnum = 0;
          while (st.hasMoreTokens()) {
            Cell cell = row.createCell(cellnum++);
            String token = st.nextToken();
            cell.setCellValue(token);
          }
        } else {
          breakRead = true;
        }
        rowCount++;
        // Request a garbage collection every 100 rows; System.gc() is only a hint to the JVM.
        if (rowCount % 100 == 0) {
          System.gc();
        }
      }
      reader.close();
      extractor.close();
      System.gc();
      // try-with-resources closes the stream even if write() throws.
      try (FileOutputStream out = new FileOutputStream(new File("new.xls"))) {
        workbook.write(out);
        System.out.println("Excel written successfully..");
      } catch (IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException.
        e.printStackTrace();
      }
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }
}
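
One detail of the row-splitting loop that may be worth checking against the real data: StringTokenizer treats consecutive delimiters as one, so an empty cell is skipped and the values after it shift one column to the left. If the table can contain empty cells, and assuming the extracted text uses one tab per cell boundary (as the loop already assumes), String.split with a negative limit keeps them. A minimal sketch; the class RowSplitter, its methods, and the sample row are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class RowSplitter {
  /** Splits the way the loop above does: empty cells are dropped. */
  static List<String> withTokenizer(String line) {
    List<String> cells = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(line, "\t");
    while (st.hasMoreTokens()) {
      cells.add(st.nextToken());
    }
    return cells;
  }

  /** Splits on every tab; the negative limit keeps empty and trailing cells. */
  static List<String> withSplit(String line) {
    return Arrays.asList(line.split("\t", -1));
  }

  public static void main(String[] args) {
    String line = "a\t\tc"; // sample row whose middle cell is empty
    System.out.println(withTokenizer(line)); // prints [a, c]
    System.out.println(withSplit(line));     // prints [a, , c]
  }
}
```

With withSplit, the array index can be used directly as the cell number, so a value stays in its original column even when the cells before it are empty.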
answered 2013-10-28T16:07:04.067