
I have around 200,000 bz2 files, of which only one is valid. Each file is smaller than 200 bytes. I need to find the valid one, but the command-line bz2 utility takes too long.

Is there a minimal check on the file's bytes that would let me identify an invalid bz2 file and skip further processing? I want to do this in C/C++, since it should be much faster than shell scripts.


1 Answer


Found a solution. According to the bz2 format, the first 3 characters should be "BZh". This filtered out all but 19 of the files.
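The header check could be sketched in C roughly as follows. This is a minimal sketch, not the code actually used for the answer: the function names `looks_like_bz2` and `file_looks_like_bz2` are made up for illustration. Besides the "BZh" prefix it also tests the block-size digit ('1' to '9') and the 6-byte magic that follows it in the bzip2 format (0x314159265359 for a compressed block, 0x177245385090 for an empty stream), which may narrow the candidates a little further. Passing this check does not prove a file is valid; it only rules out files that cannot be.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer starts like a bzip2 stream,
 * 0 if it cannot be one. Only the first 10 bytes are inspected. */
int looks_like_bz2(const unsigned char *buf, size_t len)
{
    /* Magic that starts a compressed block (BCD digits of pi). */
    static const unsigned char block_magic[6] = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
    /* Magic that starts the stream footer (an empty stream has no blocks). */
    static const unsigned char end_magic[6] = {0x17, 0x72, 0x45, 0x38, 0x50, 0x90};

    if (len < 10)
        return 0;                        /* too short to hold header + magic */
    if (memcmp(buf, "BZh", 3) != 0)
        return 0;                        /* missing "BZh" signature */
    if (buf[3] < '1' || buf[3] > '9')
        return 0;                        /* block size must be '1'..'9' */
    return memcmp(buf + 4, block_magic, 6) == 0 ||
           memcmp(buf + 4, end_magic, 6) == 0;
}

/* Hypothetical helper: reads the first 10 bytes of a file and applies
 * the check above. Returns 0 if the file cannot be opened. */
int file_looks_like_bz2(const char *path)
{
    unsigned char head[10];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return 0;
    n = fread(head, 1, sizeof head, f);
    fclose(f);
    return looks_like_bz2(head, n);
}
```

Files that pass can then be handed to a real decompressor (for example libbz2, or the command-line utility) to confirm which one is actually valid, since only a full decode verifies the CRCs.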

Answered 2018-11-24T16:50:29.930