7

给定数百 GB 不同大小的资产,填充一组蓝光光盘的最佳算法是什么?

我正在尝试整合大量旧 CDROM、DVD 和小型硬盘驱动器,并将所有内容放入由 MD5 签名索引的数据库中。肯定是一项艰巨的任务。

我目前所做的是按降序对资产大小(通常是目录大小)进行排序,开始在填充列表中插入最大的资产,跳过任何不适合的资产,直到我用完资产。它几乎可以立即运行,但如果有必要,我不介意一夜之间运行。

它通常给我 95% 或更多的利用率,但我确信有一种方法可以使用其他组合来提供更高的效率。对于像磁盘映像这样的大型项目,我可以通过这种原始方法获得相当低的利用率。

我的想法是一次获取所有资产组合,1 然后 2,然后 3,... 项目,并保持最高字节数 < 25,025,314,816 字节的运行值指向总和为它的数组。当我一次使用了这么多资产以至于没有一个组合适合时,停止并使用运行的最高计数器指向的数组。

这是最好的算法吗?

有 2 个 Perl 模块似乎可以胜任这项任务,Algorithm-Combinatorics 和 Math-Combinatorics。有什么更快、更稳定、更酷的建议吗?

我的方案是编写一个脚本来计算大量目录的大小,并显示要刻录的几十个磁盘的最佳内容。

而且,我不想只是逐个文件地填写文件,因为我希望整个目录都在同一张光盘上。

4

4 回答 4

5

这是一个被称为装箱的 NP 完全问题。没有已知的多项式时间算法可以最佳地解决它。换句话说,如果不尝试所有的解决方案,就无法找到最佳解决方案。

从好的方面来说,一个非常简单的启发式方法,比如“将最大的剩余文件夹放在第一个有空间的磁盘上”,将保证您使用的磁盘数量少于最佳情况的两倍。(您可以阅读有关该问题的 Wikipedia 文章的更多详细信息)。

于 2012-07-27T01:21:00.240 回答
2

该算法称为一维装箱。该算法非常快但不是最优的。您也可以使用蛮力算法,但搜索空间非常大。这是一个贪心算法的程序:http ://www.phpclasses.org/package/2027-PHP-Pack-files-without-exceeding-a-given-size-limit.html

于 2012-07-27T01:16:01.803 回答
0

The most practical method I have yet found to efficiently fill my Blu-Ray disks.

I make a list of fully qualified paths to all of the available files to burn.

Then (arbitrarily) decide on how many directory levels to consider a bunch or accept a command line option for it. This is to keep directories full of like items all together on a single blu-ray. There is also a STUFF option to insert the largest files first and when a file would cause an overflow, look to the next smaller one until you run out of files or space.

Make a hash with each directory as key and total size of the files it contains as the data. Also keep a parallel hash with the count of files per directory as slack space and directory overhead apparently add up and have to be accounted for.

Pick 22 as the magic number. If you have <= 22 directories, try all combinations to find the one closest to but not over 25.025 GB. If you have more than 22, just use the 22 largest. I use the Perl module Algorithm::Combinatorics to find all of the combinations. Through trial and mostly error, I determined that combinations of 21 items takes just a few seconds. 23 items takes many minutes which is longer than my attention span. 22 takes about 35 seconds.

An output directory is also accepted and checked for existing data. There is an option to move the files (copy, check size and unlink).

Every time I bought a new hard drive, it was usually twice as large as the previous one so I would just copy everything over. With a Nikon D800E (Extreme!), HDR and Panoramas, I finally ran out of space.

My project was to unique, weed and consolidate 15 years worth of [mostly junk] photos, videos, movies, music, etc. I inventoried roughly a dozen storage devices, calculated MD5 signatures and put them all in a database. I picked one drive as master for pics and one for video and nuked everything else. I found 8 copies of some stuff!

I now have about 10 TB of free disk space!!!

Below the function which does all of the real work in case anybody is interested.

=============================================== Oops! Your answer couldn't be submitted because:

Your post appears to contain code that is not properly formatted as code

The stupid web page mangled my pristine code. Sorry :(..

于 2012-10-18T22:40:53.410 回答
-2

使用“背包”优化问题中的算法。

http://en.wikipedia.org/wiki/Knapsack_problem

  1. 将权重设置为等于文件大小
  2. 设置值等于“重量”
  3. 为要打包的每个后续磁盘运行算法

它可能不是最佳选择(它将最大化下一个磁盘的填充因子,而不是最小化所需的总磁盘数量),但它有据可查,并且很容易找到适用于您选择的编程语言的示例和工作代码(甚至电子表格)在网络上。

于 2012-07-27T01:15:24.310 回答