java - How to make a file sparse?

Question

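If I have a big file containing many zeros, how can I efficiently make it a sparse file?

Is the only possibility to read the whole file (including all the zeros, which might already be stored sparsely) and rewrite it to a new file, using seek to skip over the zero regions?

Or is it possible to do this in an existing file (e.g. something like File.setSparse(long start, long end))?

I'm looking for a solution in Java or some Linux commands; the filesystem will be ext3 or similar.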
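The copy-and-seek approach described in the question can be sketched in plain Java roughly as follows. The class and method names are hypothetical, and the 4096-byte block size is an assumption matching the usual ext3/ext4 block size. Whether skipped regions actually end up as holes depends on the filesystem; on ext3/ext4, seeking past the end of a file before writing normally leaves a hole.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

class SparseCopy {
    // Assumed block size; zero runs shorter than one block are written normally.
    static final int BLOCK = 4096;

    /** Copies src to dst (must be a different file), seeking over all-zero blocks instead of writing them. */
    static void sparseCopy(Path src, Path dst) throws IOException {
        byte[] buf = new byte[BLOCK];
        try (InputStream in = Files.newInputStream(src);
             RandomAccessFile out = new RandomAccessFile(dst.toFile(), "rw")) {
            out.setLength(0); // start from an empty destination
            long pos = 0;
            int n;
            // readNBytes fills the whole buffer unless EOF is reached, and returns 0 at EOF.
            while ((n = in.readNBytes(buf, 0, BLOCK)) > 0) {
                if (!isAllZero(buf, n)) {
                    out.seek(pos);          // jump over any zero blocks skipped so far
                    out.write(buf, 0, n);
                }
                pos += n;
            }
            // Extend to the full length so a trailing zero run is not lost (it becomes a hole too).
            out.setLength(pos);
        }
    }

    static boolean isAllZero(byte[] b, int len) {
        for (int i = 0; i < len; i++) {
            if (b[i] != 0) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        sparseCopy(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Usage would be something like `java SparseCopy.java big.img big-sparse.img`, after which the new file can replace the original. This still reads every byte and needs space for the copy, which is exactly the cost the question hopes to avoid.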

score 13 · Accepted Answer

A lot has changed in 8 years.

`fallocate`

`fallocate -d filename` can be used to punch holes in an existing file. From the fallocate(1) man page:

       -d, --dig-holes
              Detect and dig holes.  This makes the file sparse in-place,
              without using extra disk space.  The minimum size of the hole
              depends on filesystem I/O block size (usually 4096 bytes).
              Also, when using this option, --keep-size is implied.  If no
              range is specified by --offset and --length, then the entire
              file is analyzed for holes.

              You can think of this option as doing a "cp --sparse" and then
              renaming the destination file to the original, without the
              need for extra disk space.

              See --punch-hole for a list of supported filesystems.

(That list:)

              Supported for XFS (since Linux 2.6.38), ext4 (since Linux
              3.0), Btrfs (since Linux 3.7) and tmpfs (since Linux 3.5).

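tmpfs being on that list is the one I find most interesting. The filesystem itself is already efficient enough to consume only as much RAM as it needs to store its contents, but making those contents sparse can potentially increase that efficiency even further.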
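Since Java's standard library has no hole-punching call, one option from Java is simply to run the `fallocate -d` command described above. The sketch below assumes util-linux `fallocate` is on the PATH and that the file lives on one of the supported filesystems; the class and method names are hypothetical. Failure is reported as `false` rather than thrown, and the file contents are unchanged either way, because `-d` only deallocates blocks that are already all zeros.

```java
import java.io.IOException;
import java.nio.file.Path;

class DigHoles {
    /**
     * Runs `fallocate -d FILE` to punch holes over all-zero regions in place.
     * Returns true only if the command started and exited with status 0.
     */
    static boolean digHoles(Path file) throws InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("fallocate", "-d", file.toString());
        pb.inheritIO(); // let fallocate print its own error, e.g. "Operation not supported"
        try {
            return pb.start().waitFor() == 0;
        } catch (IOException e) {
            // Most likely the fallocate binary could not be found.
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (String name : args) {
            System.out.println(name + ": " + (digHoles(Path.of(name)) ? "ok" : "failed"));
        }
    }
}
```

On ext3, which the question mentions, punching holes is not supported, so this call would be expected to return `false`.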

GNU `cp`

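Also, somewhere along the way GNU cp gained an understanding of sparse files. Quoting the cp(1) man page regarding its default mode, --sparse=auto:

By default, sparse SOURCE files are detected by a crude heuristic and the corresponding DEST file is made sparse as well.

But there's also --sparse=always, which activates the file-copy equivalent of what fallocate -d does in place:

Specify --sparse=always to create a sparse DEST file whenever the SOURCE file contains a long enough sequence of zero bytes.

I've finally been able to retire my `tar cpSf - SOURCE | (cd DESTDIR && tar xpSf -)` one-liner, which for 20 years was my greybeard way of copying sparse files with their sparseness preserved.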
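On filesystems where `fallocate -d` is not available (ext3, for example), the fallocate man page's own description, "a `cp --sparse` and then renaming the destination file to the original", can be done from Java. This is a sketch under assumptions: GNU `cp` is on the PATH, there is enough free space for a temporary copy, and the hypothetical helper name `sparsify` is illustrative. Mode and timestamps are preserved via `--preserve=mode,timestamps`, but ownership is not.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SparsifyViaCp {
    /**
     * Re-creates `file` with `cp --sparse=always` into a temporary file in the same
     * directory, then renames the copy over the original. Returns false and leaves
     * the original untouched if cp is missing or fails.
     */
    static boolean sparsify(Path file) throws IOException, InterruptedException {
        // Same directory, so the final move is a plain rename on one filesystem.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "sparse-", ".tmp");
        try {
            Process p = new ProcessBuilder("cp", "--sparse=always", "--preserve=mode,timestamps",
                    file.toString(), tmp.toString()).inheritIO().start();
            if (p.waitFor() != 0) return false;
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            return false; // e.g. cp not found, or the rename failed
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

Unlike `fallocate -d`, this briefly needs space for a second copy of the non-zero data, which is the trade-off the man page points out.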

score 4 · Accepted Answer

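Some filesystems on Linux / UNIX have the ability to "punch holes" into an existing file. See:

LKML posting about the feature
UNIX file truncation FAQ (search for F_FREESP)

It's not very portable, nor is it done the same way everywhere; as of right now, I believe Java's IO libraries do not provide an interface for it.

If hole punching is available, either via fcntl(F_FREESP) or via any other mechanism, it should be significantly faster than a copy/seek loop.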

score 2 · Accepted Answer

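I think you'd be better off preallocating the entire file and maintaining a table/bitset of the occupied pages/sections.

Making the file sparse would cause those sections to become fragmented when they are reused. Saving a few TB of disk space may not be worth the performance hit of a highly fragmented file.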
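The preallocate-plus-bitset idea could look roughly like this. Everything here is a hypothetical sketch: the class name, the 4096-byte page size, and the in-memory `BitSet` (a real design would also need to persist the bitset somewhere). Zeros are written explicitly during preallocation because extending a file with `setLength` alone may just create a hole, which is the opposite of what this answer wants.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.BitSet;

/** Hypothetical fixed-size file of pages with an in-memory occupancy bitset. */
class PagedFile implements AutoCloseable {
    static final int PAGE_SIZE = 4096; // assumption: one filesystem block per page
    private final RandomAccessFile raf;
    private final BitSet used;
    private final int pageCount;

    PagedFile(String path, int pageCount) throws IOException {
        this.raf = new RandomAccessFile(path, "rw");
        this.pageCount = pageCount;
        this.used = new BitSet(pageCount);
        // Preallocate by actually writing zeros so the blocks are allocated up front.
        byte[] zeros = new byte[PAGE_SIZE];
        raf.seek(0);
        for (int i = 0; i < pageCount; i++) raf.write(zeros);
        raf.setLength((long) pageCount * PAGE_SIZE); // drop any leftover tail of an older file
    }

    /** Returns the index of a free page and marks it used, or -1 if every page is taken. */
    int allocate() {
        int p = used.nextClearBit(0);
        if (p >= pageCount) return -1;
        used.set(p);
        return p;
    }

    /** Marks a page free; its disk blocks stay allocated and are simply reused later. */
    void free(int page) {
        used.clear(page);
    }

    void writePage(int page, byte[] data) throws IOException {
        if (!used.get(page) || data.length > PAGE_SIZE) throw new IllegalArgumentException("bad page write");
        raf.seek((long) page * PAGE_SIZE);
        raf.write(data);
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

Because freed pages are never returned to the filesystem, a page keeps the same physical blocks for the life of the file, which is the fragmentation argument made above.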

score 0 · Accepted Answer

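According to this article, there currently seems to be no easy solution other than using the FIEMAP ioctl. However, I don't know how you would turn "non-sparse" zero blocks into "sparse" ones.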

score 0 · Accepted Answer

You can use `$ truncate -s filesize filename` in a Linux terminal to create a sparse file that contains only metadata.

Note: filesize is in bytes.
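The closest plain-Java counterpart to `truncate -s` is probably `RandomAccessFile.setLength`, which on Linux sets the file length without writing any data. The helper below is a hypothetical sketch. Note that the Javadoc leaves the contents of an extended region undefined; on Linux they read back as zeros and, on filesystems that support sparse files, are normally left unallocated.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

class CreateSparse {
    /** Rough Java equivalent of `truncate -s sizeInBytes file`. */
    static void createSparse(Path file, long sizeInBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Extends (or shrinks) the file without writing data to the new range.
            raf.setLength(sizeInBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        createSparse(Path.of(args[0]), Long.parseLong(args[1]));
    }
}
```

Like `truncate`, this only creates a new hole at the end of a file; it does not turn existing zero blocks into holes.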
