我在 XSSF 分配 1GB 堆的 800,000 个单元和 3M 字符时遇到了同样的问题!
我使用 Pythonopenpyxl
和numpy
读取 xlsx 文件(来自 Java 代码)并首先将其转换为普通文本。然后我在java中加载了文本文件。它可能看起来有很大的开销,但它确实很快。
python脚本看起来像
import openpyxl as px
import numpy as np
# xlsx file is given through command line foo.xlsx
fname = sys.argv[1]
W = px.load_workbook(fname, read_only = True)
p = W.get_sheet_by_name(name = 'Sheet1')
a=[]
# number of rows and columns
m = p.max_row
n = p.max_column
for row in p.iter_rows():
for k in row:
a.append(k.value)
# convert list a to matrix (for example maxRows*maxColumns)
aa= np.resize(a, [m, n])
# output file is also given in the command line foo.txt
oname = sys.argv[2]
print (oname)
file = open(oname,"w")
mm = m-1
for i in range(mm):
for j in range(n):
file.write( "%s " %aa[i,j] )
file.write ("\n")
# to prevent extra newline in the text file
for j in range(n):
file.write("%s " %aa[m-1,j])
file.close()
然后在我的java代码中,我写了
try {
// `pwd`\python_script foo.xlsx foo.txt
String pythonScript = System.getProperty("user.dir") + "\\exread.py ";
String cmdline = "python " + pythonScript +
workingDirectoryPath + "\\" + fullFileName + " " +
workingDirectoryPath + "\\" + shortFileName + ".txt";
Process p = Runtime.getRuntime().exec(cmdline);
int exitCode = p.waitFor();
if (exitCode != 0) {
throw new IOException("Python command exited with " + exitCode);
}
} catch (IOException e) {
System.out.println( e.getMessage() );
} catch (InterruptedException e) {
ReadInfo.append(e.getMessage() );
}
之后,您将获得与 foo.xlsx 类似的 foo.txt,但为文本格式。