
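So I have this method working fine when the data set is small. However, when it gets a bit larger...

The purpose of this script is to find every possible combination of sets, without duplicates, so that I can store them in a database table.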

set 1: [701,744,410,646,723,434]
set 2: [701,744,410,646,723,435]
set 3: etc..

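I should also note that I need to keep the relationship to the original keys, so an item in type1 cannot move to any other type. Hopefully that makes sense.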
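To make the goal concrete, here is a minimal sketch on a tiny, made-up data set. The `tagged_combinations` helper name and the sample numbers are assumptions for illustration, not part of the original script. Each value is wrapped in a one-entry hash first, so every element of a combination still records which type it came from.

```ruby
# Hypothetical helper: returns every combination that takes exactly one
# piece from each type, with each piece still tagged by its original key.
def tagged_combinations(pieces)
  # Wrap each value as {key => value} without mutating the caller's hash.
  tagged = pieces.map { |key, values| values.map { |v| { key => v } } }
  first, *rest = tagged
  first.product(*rest)
end

sample = { 'type1' => [701, 702], 'type2' => [744], 'type3' => [410, 412] }
combos = tagged_combinations(sample)
puts combos.length   # 2 * 1 * 2 = 4
# Merging the one-entry hashes gives one attribute hash per combination.
combos.each { |combo| p combo.reduce({}, :merge) }
```

Because a combination always takes exactly one element from each list, a type1 item can never end up under another type.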

Collecting pieces...
  pieces[type1] = [701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722]
  pieces[type2] = [744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765]
  pieces[type3] = [410, 412, 413, 414, 415, 419, 422, 424, 426, 427, 429, 372, 374, 376, 378, 380, 382, 385, 395, 397, 399, 401]
  pieces[type4] = [646, 647, 649, 651, 653, 655, 657, 671, 672, 673, 674, 679, 681, 684, 686, 688, 691, 695, 697, 698, 699, 700]
  pieces[type5] = [723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743]
  pieces[type6] = [434, 435, 438, 440, 443, 446, 447, 462, 464, 467, 469, 484, 485, 486, 487, 488, 489, 490, 491, 492, 494, 496]
Took 0.4265 seconds to collect.

Generating possibilities...
/Projects/my_project/lib/tasks/possibilities.rake:109: [BUG] Segmentation fault
ruby 1.9.3p286 (2012-10-12 revision 37165) [x86_64-darwin12.2.0]

Yep, a segmentation fault.

Here is the code I am using to do it.

def permutations!(input)
  permutations_start = Time.now
  puts "Generating possibilities..."
  input.each do |key, possibilities|
    possibilities.map!{|p| {key => p} }
  end

  digits = input.keys.map!{|key| input[key] }

  # This is the line that seems to want to cry.
  result = digits.shift.product(*digits)

  puts "# of generated possibilities: #{result.length}"
  puts "Took #{(Time.now - permutations_start).round(4)} seconds to generate.\n\n"

  return result
end

pieces = {}
pieces['type1'] = [701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722]
pieces['type2'] = [744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765]
pieces['type3'] = [410, 412, 413, 414, 415, 419, 422, 424, 426, 427, 429, 372, 374, 376, 378, 380, 382, 385, 395, 397, 399, 401]
pieces['type4'] = [646, 647, 649, 651, 653, 655, 657, 671, 672, 673, 674, 679, 681, 684, 686, 688, 691, 695, 697, 698, 699, 700]
pieces['type5'] = [723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743]
pieces['type6'] = [434, 435, 438, 440, 443, 446, 447, 462, 464, 467, 469, 484, 485, 486, 487, 488, 489, 490, 491, 492, 494, 496]
possibilities = permutations!(pieces)
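
A quick back-of-the-envelope check suggests why the `product` line struggles: it has to build the entire result in memory at once. The sketch below only counts the combinations. `combination_count` is a name made up for this illustration, and the sizes are taken from the six lists above (five with 22 entries, one with 21).

```ruby
# Hypothetical helper: number of combinations Array#product would build,
# i.e. the product of the sizes of all the lists.
def combination_count(pieces)
  pieces.values.map(&:length).reduce(1, :*)
end

# Sizes only (the actual ids do not matter for the count): type5 has 21
# entries, every other type has 22.
sizes = {
  'type1' => Array.new(22), 'type2' => Array.new(22), 'type3' => Array.new(22),
  'type4' => Array.new(22), 'type5' => Array.new(21), 'type6' => Array.new(22)
}
puts combination_count(sizes)   # 22**5 * 21 = 108_226_272
```

Each of those roughly 108 million results is itself an array of six one-entry hashes, so the fully materialized result is very large. That is a likely contributor to the crash, although the segfault itself may also depend on the interpreter build.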

1 Answer


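Memory-wise, this looks fine. The CPU is pegged just like before, although I expected that.

Most of the time now goes to storing the records in the database. I wish I could use activerecord-import or a bulk insert to speed it up, but I have to run calculations on each group before it is saved, so I set that up as a before_save hook and handle it in the model.

At the current rate, getting all of the data into the database will take roughly a few months.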
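
If the per-group calculation can be run outside the `before_save` hook, one possible way to get back to bulk inserts is to compute the derived values in plain Ruby first and then hand whole batches to a bulk-insert call. The sketch below is only an outline of that idea: `compute_score`, `prepare_row`, `store_in_batches`, the `'score'` column, and the batch sizes are placeholders invented here, and the block stands in for whatever bulk API is actually used (for example activerecord-import).

```ruby
# Placeholder for the calculation currently done in the before_save hook.
# (Hypothetical: the real calculation is not shown in the post.)
def compute_score(attrs)
  attrs.values.reduce(0, :+)
end

# Merge the one-entry hashes of a combination into a single attribute hash
# and attach the precomputed value, so no per-record callback is needed.
def prepare_row(combo)
  attrs = combo.reduce({}, :merge)
  attrs.merge('score' => compute_score(attrs))
end

# Yield prepared rows in fixed-size batches. The caller's block is a
# stand-in for a real bulk insert.
def store_in_batches(combos, batch_size = 1000)
  combos.each_slice(batch_size) do |batch|
    yield batch.map { |combo| prepare_row(combo) }
  end
end

combos = [{ 'type1' => 701 }].product([{ 'type2' => 744 }, { 'type2' => 745 }])
store_in_batches(combos, 1) do |rows|
  puts "would insert #{rows.length} row(s)"   # replace with the bulk insert
end
```

Whether this helps depends on the calculation not needing anything that only exists once the record is built by ActiveRecord.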

def generate(input)
  # Tag each piece with its key so it stays tied to its original type.
  input.each do |key, possibilities|
    possibilities.map! { |p| { key => p } }
  end

  digits = input.values

  # Process one item of the first type at a time, so only that item's slice
  # of the full product is held in memory.
  shifted = digits.shift
  shifted.each.with_index(1) do |item, i|
    puts "Generating groups #{i} of #{shifted.length}..."
    permutations_start = Time.now
    results = [item].product(*digits)
    puts "# of generated groups in the set number - #{i}: #{results.length}"
    puts "Took #{(Time.now - permutations_start).round(4)} seconds to generate.\n\n"

    # Storing the groups
    puts "Storing groups..."
    storing_start = Time.now
    # Merge each combination's one-entry hashes into a single attributes hash.
    results.each { |group| Group.create!(group.reduce({}, :update)) }
    puts "Took #{(Time.now - storing_start).round(4)} seconds to store.\n\n"
  end
end
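
As a possible variation on the loop above, `Array#product` also accepts a block, in which case it yields each combination instead of building the whole array. The sketch below relies on that block form. `each_group` is a made-up helper name, and it yields plain attribute hashes rather than calling `Group.create!`.

```ruby
# Hypothetical helper: yields one merged attribute hash per combination
# without ever holding the full list of combinations in memory.
def each_group(pieces)
  tagged = pieces.map { |key, values| values.map { |v| { key => v } } }
  first, *rest = tagged
  # With a block, Array#product yields each combination and returns the
  # receiver, so no result array is accumulated.
  first.product(*rest) { |combo| yield combo.reduce({}, :merge) }
end

count = 0
each_group({ 'type1' => [701, 702], 'type2' => [744, 745, 746] }) { |_attrs| count += 1 }
puts count   # 2 * 3 = 6
```

This only addresses memory use. The number of records to store is unchanged, so the database remains the bottleneck.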

Sample output:

Collecting pieces...
  possibilities['type1'] = [701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722]
  possibilities['type2'] = [744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765]
  possibilities['type3'] = [410, 412, 413, 414, 415, 419, 422, 424, 426, 427, 429, 372, 374, 376, 378, 380, 382, 385, 395, 397, 399, 401]
  possibilities['type4'] = [646, 647, 649, 651, 653, 655, 657, 671, 672, 673, 674, 679, 681, 684, 686, 688, 691, 695, 697, 698, 699, 700]
  possibilities['type5'] = [723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743]
  possibilities['type6'] = [434, 435, 438, 440, 443, 446, 447, 462, 464, 467, 469, 484, 485, 486, 487, 488, 489, 490, 491, 492, 494, 496]
Took 0.4248 seconds to collect.

Generating groups 1 of 22...
There were 4,919,376 groups in the set number 1.
Took 1.819 seconds to generate.

Storing Groups...
250 items took 11.7158 seconds
250 items took 11.5094 seconds
250 items took 11.6994 seconds
250 items took 11.5678 seconds
250 items took 11.5529 seconds
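
For a rough sense of scale, the timings above can be extrapolated. This is only arithmetic on the figures in the sample output (about 11.6 seconds per 250 records, 4,919,376 groups per set, 22 sets), and it assumes the insert rate stays constant. `estimated_days` is a name chosen for this sketch.

```ruby
# Hypothetical helper: extrapolate total storage time from one timed batch.
def estimated_days(total_records, batch_size, seconds_per_batch)
  seconds = total_records.to_f / batch_size * seconds_per_batch
  seconds / (60 * 60 * 24)
end

total = 4_919_376 * 22   # groups per set * number of sets
puts estimated_days(total, 250, 11.6).round(1)
```

Under those assumptions the result comes out at roughly 58 days of continuous inserting, which is in line with the "months" estimate given earlier.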
Answered 2012-11-13T01:30:57.867