bash - How to read a 1TB compressed file in the shortest time

Question

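I am trying to read a compressed file, using the command tar tf abc.tar.xz. Because the file is 1TB in size, this takes a long time. I am not very familiar with bash scripting. I have also tried other commands, such as zcat 3532642.tar.gz | more and tar tf 3532642.tar.xz | grep --regex="folder1/folder2/folder3/folder4/" and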

tar tvf 3532642.tar.xz --to-command \
'grep --label="$TAR_FILENAME" -H folder1/folder2/folder3/folder4/ ; true'

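However, I found little difference between them in the time they take to read through the file's contents.

Does anyone know how I can process such a large amount of data in a compressed file in the shortest possible time? Any help would be appreciated!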

score 1 · Accepted Answer

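As rrauenza mentioned, pigz may not work with the xz format, but there is a similar tool, pixz, a parallel, indexed xz compressor/decompressor.

As its man page makes clear, pigz uses threads for compression/decompression so that it can make use of multiple processors and cores.

Like pigz, pixz also provides an option to specify how many threads may run in parallel across multiple cores, for maximum performance.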

-p --processes n
Allow up to n processes (default is the number of online processors)

Alternatively, you can get the number of cores manually in bash with getconf _NPROCESSORS_ONLN and pass that value to -p.
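As a rough sketch of that step, the core count reported by getconf can be stored in a variable and handed to pixz -p. The archive name abc.tar.xz comes from the question. The fallback to one core and the dry-run branch are assumptions, added so the snippet is safe to run on a machine that has neither pixz nor the archive.

```shell
#!/usr/bin/env bash
# Number of online cores; fall back to 1 if getconf cannot report it (assumption).
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Archive name from the question; override it with the first argument.
archive=${1:-abc.tar.xz}

if command -v pixz >/dev/null 2>&1 && [ -f "$archive" ]; then
    # Decompress with at most $ncores threads and let tar list the stream.
    pixz -d -p "$ncores" < "$archive" | tar -tf -
else
    # Dry run: show the command instead of executing it.
    echo "would run: pixz -d -p $ncores < $archive | tar -tf -"
fi
```

How much this helps probably depends on how the archive was written: an .xz file that consists of one large block leaves little for extra threads to work on.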

More details are available on the pixz GitHub page, which also explains how to download and install it.
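The "indexed" part of pixz is what can avoid a full read of the archive. The sketch below follows the usage shown in the pixz README (compress and index a tarball, list it with -l, extract one member with -x), applied to a tiny throwaway archive. All file and directory names are made up for the demo. As far as I understand, the fast listing only works for archives that pixz itself wrote, so an existing .tar.xz produced by plain xz would need to be recompressed once before it benefits.

```shell
#!/usr/bin/env bash
# Demo on a throwaway archive; every path below is made up for illustration.
pixz_demo() {
    if ! command -v pixz >/dev/null 2>&1; then
        echo "pixz not installed; skipping demo"
        return 0
    fi
    local workdir
    workdir=$(mktemp -d) || return 1
    mkdir -p "$workdir/src/folder1/folder2" "$workdir/out"
    printf 'hello\n' > "$workdir/src/folder1/folder2/a.txt"
    printf 'other\n' > "$workdir/src/folder1/b.txt"

    # Plain tarball first, then compress and index it with pixz.
    tar -cf "$workdir/demo.tar" -C "$workdir/src" folder1
    pixz "$workdir/demo.tar" "$workdir/demo.tpxz"

    # List the members using the index.
    pixz -l "$workdir/demo.tpxz"

    # Extract a single member; pixz emits a small tar stream for tar to unpack.
    pixz -x folder1/folder2/a.txt < "$workdir/demo.tpxz" | tar -xf - -C "$workdir/out"
    cat "$workdir/out/folder1/folder2/a.txt"

    rm -rf "$workdir"
}

pixz_demo
```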

(or)

A tar-only solution, which works only if the file name is known in advance:

tar -zxOf <file-containing-tar> <file-name_inside-tar>

The options are as follows:

   -f, --file=ARCHIVE
          use archive file or device ARCHIVE

   -z, --gzip
          filter the archive through gzip

   -x, --extract, --get
          extract files from an archive

   -O, --to-stdout
          extract files to standard output

It might not be as efficient as pigz, but it still gets the job done.
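To make the tar-only approach concrete, here is a small self-contained sketch that builds a throwaway gzip-compressed archive and prints one known member to standard output. The directory and file names are invented for the demo. For an .xz archive like the one in the question, GNU tar's -J would take the place of -z. Note that tar still has to decompress the stream until it reaches the requested member, so this saves disk writes rather than decompression time.

```shell
#!/usr/bin/env bash
# Build a tiny demo archive in a temporary directory (all names are made up).
workdir=$(mktemp -d)
mkdir -p "$workdir/folder1/folder2"
printf 'hello\n' > "$workdir/folder1/folder2/a.txt"
printf 'other\n' > "$workdir/folder1/b.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" folder1

# -f takes the archive as its argument; the member name follows it.
# -O sends the member's contents to stdout instead of writing a file.
member_text=$(tar -xzOf "$workdir/demo.tar.gz" folder1/folder2/a.txt)
printf '%s\n' "$member_text"

# For an xz-compressed archive with GNU tar, the equivalent would be:
#   tar -xJOf archive.tar.xz path/inside/archive

rm -rf "$workdir"
```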
