java - Problem exporting a large amount of data from a database to .csv with Java

Question

Hi, and thanks for your attention.

I want to export a large amount of data (really large: 6 million rows) to a .csv file using Java. The application is a Swing application with JPA, using TopLink (ojdbc14).

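I have tried using:

BufferedWriter, RandomAccessFile, FileChannel

and so on, but memory consumption stays very high and ends in a Java heap out-of-memory error, even though I set the maximum heap size to 800 MB (-Xmx800m).

The latest version of my source code: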

...(more lines of code)

FileChannel channel = getRandomAccessFile(tempFile).getChannel();
Object[][] data = pag.getRawData(); // database data in a two-dimensional array

for (int j = 0; j < data.length; j++) {
    write(data[j], channel);  // write row data[j] (an array) to the channel
    freeStringLine(data[j]);  // data[j] is an array; this method sets every position to null
    data[j] = null;           // drop the reference to the row
}

channel.force(false); // force buffered writes out to the file system (disk)
channel.close();      // close the channel
pag = null;

...(more lines of code)

private void write(Object[] row, FileChannel channel) throws DatabaseException {
    if (byteBuff == null) {
        // Reused across calls; a single row must fit in this 1 MB buffer.
        byteBuff = ByteBuffer.allocateDirect(1024 * 1024);
    }
    for (int j = 0; j < row.length; j++) {
        if (row[j] != null) {
            // getBytes() uses the platform default charset.
            byteBuff.put(row[j].toString().getBytes());
        }
        if (j < row.length - 1) {
            byteBuff.put(SPLITER_BYTES); // separator between columns, not after the last one
        }
    }
    byteBuff.put("\n".getBytes());
    byteBuff.flip();
    try {
        // A single write() call may not drain the buffer, so loop until it is empty.
        while (byteBuff.hasRemaining()) {
            channel.write(byteBuff);
        }
    } catch (IOException ex) {
        // Pass the IOException itself as the cause; ex.getCause() is usually null.
        throw new DatabaseException("Imposible escribir en archivo temporal de exportación : " + ex.getMessage(), ex);
    }
    byteBuff.clear();
}

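Since there are 6 million rows, I don't want to hold that data in memory while the file is being created. I write many temporary files (5,000 rows each) and, at the end of the process, append all of them into a single file using two FileChannels. However, the out-of-memory exception is thrown before the append step even starts.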
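For reference, the append step described above could look roughly like the sketch below. It is not the code from the application: `TempFileMerger` and `concatenate` are hypothetical names, and it assumes Java 7 or later for `java.nio.file`. It copies each part with `FileChannel.transferTo`, so file contents are not read into Java arrays on the heap.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical helper: concatenates temporary part files into one target file. */
final class TempFileMerger {

    private TempFileMerger() {}

    /** Appends each part to the target in list order; returns the total bytes copied. */
    static long concatenate(List<Path> parts, Path target) throws IOException {
        long total = 0;
        // One output channel for the whole merge; its position advances after each part.
        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path part : parts) {
                try (FileChannel in = FileChannel.open(part, StandardOpenOption.READ)) {
                    long size = in.size();
                    long position = 0;
                    // transferTo may move fewer bytes than requested, so loop until done.
                    while (position < size) {
                        position += in.transferTo(position, size - position, out);
                    }
                    total += size;
                }
            }
        }
        return total;
    }
}
```

Merging this way keeps heap usage flat, which suggests the memory pressure comes from loading the rows, not from joining the files.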

Do you know of another strategy for exporting this much data?

Thanks a lot for any answer. Sorry for my English, I'm improving xD

score 3 · Accepted Answer

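The answer is to use a "streaming" approach: read one row and write one row as you scroll through the data set. You need to obtain the query result as a cursor and iterate over it, rather than fetching the entire result set.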

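With JPA, use code like the following. Note that ScrollableResults and scroll() belong to Hibernate's Session API; other JPA providers, such as the TopLink setup in the question, would need their own cursor or scrolling mechanism: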

ScrollableResults cursor = session.createQuery("from SomeEntity x").scroll();

while (cursor.next()) {
    writeToFile(cursor);
}

This means only one row is held in memory at a time, so it scales to any number of rows with minimal memory use (and it is faster anyway).
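As a rough illustration of the read-one-row, write-one-row loop, here is a minimal sketch that does not depend on any persistence provider. The names `CsvStreamExporter` and `export` are hypothetical, and the `Iterator<Object[]>` is an assumption standing in for whatever cursor the database layer returns. Like the code in the question, it writes values with `toString()` and does no CSV quoting or escaping.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

/** Hypothetical helper: writes rows to CSV one at a time, holding only the current row. */
final class CsvStreamExporter {

    private CsvStreamExporter() {}

    /**
     * Pulls rows from the iterator (a stand-in for a database cursor) and
     * writes each one immediately. Returns the number of rows written.
     */
    static long export(Iterator<Object[]> rows, Writer out, char separator) throws IOException {
        long count = 0;
        while (rows.hasNext()) {
            Object[] row = rows.next();
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    out.write(separator); // separator between columns only
                }
                if (row[i] != null) {
                    out.write(row[i].toString()); // null columns become empty fields
                }
            }
            out.write('\n');
            count++; // the row array is no longer referenced after this iteration
        }
        out.flush();
        return count;
    }
}
```

In practice the `Writer` would typically be a `BufferedWriter` over the target file, and the iterator a thin adapter over the provider's cursor, so memory use stays constant however many rows are exported.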

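Fetching all the rows of a result set at once is a convenience that works for small result sets (which is most of the time), but as usual, convenience comes at a cost, and it does not work in every situation.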
