perl - 迭代一个程序擦除原始集合中的数据

Question

我正在尝试研究一种算法，在该算法中，给定一个数字列表，我必须计算一个系数，该系数由数据列表中建立的三角形数量与数字具有的最小邻居数之间的比率给出；例如，给定文件的前两行：

1 2 3 4 5 6 9
2 1 3
...

如果一行的第一个元素出现在其他行中，并且如果后续行的第一个元素出现在考试中的行中，那么我找到了一个链接；
如果“链接”存在，那么我想计算考试行中的其他元素出现在链接所在行中的次数并打印“我找到了 z 个三角形”。

比如本例中，当程序比较第一行和第二行，发现“链接1 2 ”存在，发现有1个由顶点1,2,3组成的三角形。在算法中，我必须将三角形的数量 + 1 除以每行中元素的最小数量 - 2（在这种情况下，最小数量来自第二行，值为 3-2=1）；我正在寻找的系数是 (1+1)/1 = 2;

输出文件将被写为：

1 2 1

在前两列中，我找到了建立链接的元素，在第三列中找到了系数的值；

这是我到目前为止编写的代码：

use strict;
use warnings;
use 5.010;
use List::Util;

my $filename = "data";
open my $fh, '<', $filename or die "Cannot open $filename: $!";

my $output_file = "output_example";
open my $fi, ">", $output_file or die "Error during $output_file opening: $!";

my %vector;
while (<$fh>) {
    my @fields = split;
    my $root = shift @fields;
    $vector{$root} = { map { $_ => 1} @fields };
}

my @roots = sort { $a <=> $b } keys %vector;
for my $i (0 .. $#roots) {
    my $aa = $roots[$i];
    my $n_element_a = scalar (keys %{$vector{$aa}})-1;

    for my $j ($i+1 .. $#roots) {
        my $minimum;
        my $bb = $roots[$j];
        my $n_element_b = scalar (keys %{$vector{$bb}})-1;
        next unless $vector{$aa}{$bb} and $vector{$bb}{$aa};
        if ($n_element_a < $n_element_b){
            $minimum = $n_element_a;
        }else {
            $minimum = $n_element_b;
        }

        my $triangles = 0;
        for my $cc ( keys %{$vector{$aa}} ) {
            next if $cc == $aa or $cc == $bb;
            if ($vector{$bb}{$cc}) {
                $triangles++;
            }
        }

        my $coeff;
        my @minimum_list;           
        if ($minimum == 0){
            $coeff = ($triangles +1)/($minimum+0.00000000001);
        } else {
            $coeff = ($triangles +1)/($minimum);
        }
        say $fi "$aa $bb $coeff";
    }
}
__DATA__
1 2 3 4 5 6 9
2 1 3
3 1 2
4 1 5
5 1 4
6 1 7 8
8 6 7
9 1 10 11
10 9 11 12 14
11 9 10 12 13
12 10 13 14
13 11 12
14 10 12 15
15 14

我把整个数据集放在最后。输出文件给出：

__OUTPUT__
1 2 2
1 3 2
1 4 2
1 5 2
1 6 0.5
1 9 0.5
2 3 2
4 5 2
6 8 2
9 10 1
9 11 1
10 11 1
10 12 1
10 14 1
11 13 2
12 13 1
12 14 1
14 15 100000000000

现在我想找到系数的最小值，识别呈现这个较低值的链接，擦除原始数据集中的这些元素并在“新”数据集上重复相同的程序。

例如，在这种情况下，呈现最小值的链接是1 6和1 9，系数为0.5。所以现在程序应该在文件数据中删除以“1”开头的行中的元素“6”，反之亦然，与 9 相同。所以现在“新”数据集将是：

1 2 3 4 5
2 1 3
3 1 2
4 1 5
5 1 4
6 7 8
8 6 7
9 10 11
10 9 11 12 14
11 9 10 12 13
12 10 13 14
13 11 12
14 10 12 15
15 14

我正在寻找的是：

如何从data文件中包含的数据集中删除呈现最小系数值的元素？
如何迭代过程 N 次？

为了从输出文件中找到最小值，我想在程序末尾添加以下几行：

my $file1 = "output_example";
open my $fg,  "<", $file1 or die "Error during $file1 opening: $!";

my @minimum_vector;
while (<$fg>) {
    push @minimum_vector, [ split ];
}

my $minima=$minimum_vector[0][2];
for my $i (0 .. $#minimum_vector){
    if($minima >= $minimum_vector[$i][2] ){
        $minima=$minimum_vector[$i][2];
    }
}
say $minima;
close $file1;

但它给了我一个错误，$minima因为我认为我无法从我刚刚创建的同一个文件中读取（在本例中为 output_example 文件）。如果我在不同的程序中编译，它就会运行。

score 0 · Accepted Answer

迭代的最佳方法可能是将代码分解为子例程。这也将有助于澄清代码并准确跟踪可能出现问题的位置。

use strict;
use warnings;
use 5.010;
use List::Utils qw/min/;

sub load_initial_data {
    # open and read file, load it into an arrayref and return it.
}

sub find_coefficients {
    my $data = shift;
    my @results;
    foreach my $row (@$data) {
        # do stuff to calculate $aa, $bb, $coeff
        push @results, [$aa, $bb, $coeff];
    }
    return \@results;
}

sub filter_data {
    my $data = shift;
    my $coefficients = shift;
    my $min = min map { $_->[2] } @$coefficients;
    my @deletions = grep { $min == $_->[2] } @$coefficients;
    foreach my $del (@deletions) {
        delete( $data->{$del->[0]}{$del->[1]} );
    }
}

# doing the actual work:
my $data = load_initial_data(); 
my $coeffs;
foreach my $pass (0 .. $N) {
    $coeffs = find_coefficients( $data );
    $data = filter_data( $data, $coeffs );
    # You could write $data and/or $coeffs out to a file here
    # if you need to keep the intermediate stages
}

perl - 迭代一个程序擦除原始集合中的数据

1 回答 1

Related

Reference