perl - How to print the range values with a specific reference number?

Question

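I have a data file as shown below. I want to get the interpolated final value (final, P) by referring to two sets of number ranges (scoreA and scoreB). Take "Eric": his scoreA is 35 (between 30.00 and 40.00) and his scoreB is 48 (between 45.00 and 50.00). He gets two sets of final value ranges, (22.88,40.90) and (26.99,38.99). I want to get the final values for "Eric" and "George" from the data file. "George" has scoreA = 38 and scoreB = 26.

After applying a formula, I want to get the exact final value for his scoreA=35 & scoreB=45. Assume the formula is P=X+Y (P is the final value). So far I have been trying the code shown below, but it does not pick up the correct lines.

How can I get the exact final value range by referring to the given data?

Data file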

Student_name ("Eric")   
/* This is a junk line */   
scoreA ("10.00, 20.00, 30.00, 40.00")  
scoreB ("15.00, 30.00, 45.00, 50.00, 55.00")     
final (  
"12.23,19.00,37.88,45.98,60.00",\  
"07.00,20.11,24.56,45.66,57.88",\  
"05.00,15.78,22.88,40.90,57.99",\  
"10.00,16.87,26.99,38.99,40.66"\)  

Student_name ("Liy") 
/* This is a junk line */   
scoreA ("5.00, 10.00, 20.00, 60.00")  
scoreB ("25.00, 30.00, 40.00, 55.00, 60.00")     
final (  
"02.23,15.00,37.88,45.98,70.00",\  
"10.00,28.11,34.56,45.66,57.88",\  
"08.00,19.78,32.88,40.90,57.66",\  
"10.00,27.87,39.99,59.99,78.66"\)

Student_name ("Frank") 
/* This is a junk line */   
scoreA ("2.00, 15.00, 25.00, 40.00")  
scoreB ("15.00, 24.00, 38.00, 45.00, 80.00")     
final (  
"02.23,15.00,37.88,45.98,70.00",\  
"10.00,28.11,34.56,45.66,57.88",\  
"08.00,19.78,32.88,40.90,57.66",\  
"10.00,27.87,39.99,59.99,78.66"\)

Student_name ("George") 
/* This is a junk line */   
scoreA ("10.00, 15.00, 20.00, 40.00")  
scoreB ("25.00, 33.00, 46.00, 55.00, 60.00")     
final (  
"10.23,25.00,37.88,45.98,68.00",\  
"09.00,28.11,34.56,45.66,60.88",\  
"18.00,19.78,32.88,40.90,79.66",\  
"17.00,27.87,40.99,59.99,66.66"\)

Code

data();      
sub data() {   
    my $cnt = 0;
    while (my @array = <FILE>) {
        foreach $line(@array) {    
            if ($line =~ /Student_name/) {
                $a = $line;

                if ($a =~ /Eric/ or $cnt > 0 ) {
                    $cnt++;
                }
                if ( $cnt > 1 and $cnt <= 3 ) {
                    print $a;
                }
                if ( $cnt > 2 and $cnt <= 4 ) {
                    print $a;
                }
                if ( $cnt == 5 ) {
                    $cnt  =  0;  
                }
            }
        }
    }
}

Result

Eric    final=42.66  
George  final=24.30

score 1 · Accepted Answer

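In my comment, I said that parsing is fairly easy. Here is how it could be done. As the question lacks a proper specification of the file format, I will assume the following:

The file consists of properties, which have values:

document ::= property*
property ::= word "(" value ("," value)* ")"

Values are double-quoted strings containing either numbers separated by commas, or a single word:

value ::= '"' ( word | number ("," number)* ) '"'

Whitespace, backslashes, and comments are irrelevant.

Here is a possible implementation; I will not go into the details of how to write a simple parser.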

package Parser;
use strict; use warnings;

sub parse {
  my ($data) = @_;

  # perform tokenization

  pos($data) = 0;
  my $length = length $data;
  my @tokens;
  while(pos($data) < $length) {
    next if $data =~ m{\G\s+}gc
         or $data =~ m{\G\\}gc
         or $data =~ m{\G/[*].*?[*]/}gc;
    if ($data =~ m/\G([",()])/gc) {
      push @tokens, [symbol => $1];
    } elsif ($data =~ m/\G([0-9]+[.][0-9]+)/gc) {
      push @tokens, [number => 0+$1];
    } elsif ($data =~ m/\G(\w+)/gc) {
      push @tokens, [word => $1];
    } else {
      die "unrecognized token at:\n", substr $data, pos($data), 10;
    }
  }

  return parse_document(\@tokens);
}

sub token_error {
  my ($token, $expected) = @_;
  return "Wrong token [@$token] when expecting [@$expected]";
}

sub parse_document {
  my ($tokens) = @_;
  my @properties;
  push @properties, parse_property($tokens) while @$tokens;
  return @properties;
}

sub parse_property {
  my ($tokens) = @_;
  $tokens->[0][0] eq "word"
    or die token_error $tokens->[0], ["word"];
  my $name = (shift @$tokens)->[1];
  $tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '('
    or die token_error $tokens->[0], [symbol => '('];
  shift @$tokens;
  my @vals;
  VAL: {
    push @vals, parse_value($tokens);
    if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
      shift @$tokens;
      redo VAL;
    }
  }
  $tokens->[0][0] eq "symbol" and $tokens->[0][1] eq ')'
    or die token_error $tokens->[0], [symbol => ')'];
  shift @$tokens;
  return [ $name => @vals ];
}

sub parse_value {
  my ($tokens) = @_;
  $tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
    or die token_error $tokens->[0], [symbol => '"'];
  shift @$tokens;

  my $value;

  if ($tokens->[0][0] eq "word") {
    $value = (shift @$tokens)->[1];
  } else {
    my @nums;
    NUM: {
      $tokens->[0][0] eq 'number'
        or die token_error $tokens->[0], ['number'];
      push @nums, (shift @$tokens)->[1];
      if ($tokens->[0][0] eq 'symbol' and $tokens->[0][1] eq ',') {
        shift @$tokens;
        redo NUM;
      }
    }
    $value = \@nums;
  }

  $tokens->[0][0] eq "symbol" and $tokens->[0][1] eq '"'
    or die token_error $tokens->[0], [symbol => '"'];
  shift @$tokens;

  return $value;
}
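
The tokenization step in `parse` hinges on `\G`-anchored matches with the `/gc` flags: `\G` anchors each match at the current `pos`, `/g` advances `pos` on success, and `/c` keeps `pos` where it is when a match fails. A reduced sketch of just that loop follows. The `tokenize` sub and its string-based token format are illustrative simplifications and not part of the parser above; skipping of comments and backslashes is left out.

```perl
use strict; use warnings;

# Illustrative only: tokenize one property line with \G-anchored /gc matches.
sub tokenize {
    my ($data) = @_;
    my @tokens;
    pos($data) = 0;
    while (pos($data) < length $data) {
        next if $data =~ m{\G\s+}gc;                      # skip whitespace
        if    ($data =~ m{\G([",()])}gc)          { push @tokens, "symbol:$1" }
        elsif ($data =~ m{\G([0-9]+[.][0-9]+)}gc) { push @tokens, "number:" . (0 + $1) }
        elsif ($data =~ m{\G(\w+)}gc)             { push @tokens, "word:$1" }
        else  { die "unrecognized token at: ", substr($data, pos($data), 10) }
    }
    return @tokens;
}

my @tokens = tokenize('scoreA ("10.00, 20.00")');
print join(" ", @tokens), "\n";
# prints: word:scoreA symbol:( symbol:" number:10 symbol:, number:20 symbol:" symbol:)
```

Adding `0` to the matched text turns `10.00` into the number `10`, which is why the parsed data structure shows the values without trailing zeros.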

Now, we get the following data structure as output from `Parser::parse`:

(
  ["Student_name", "Eric"],
  ["scoreA", [10, 20, 30, 40]],
  ["scoreB", [15, 30, 45, 50, 55]],
  [
    "final",
    [12.23, 19, 37.88, 45.98, 60],
    [7, 20.11, 24.56, 45.66, 57.88],
    [5, 15.78, 22.88, 40.9, 57.99],
    [10, 16.87, 26.99, 38.99, 40.66],
  ],
  ["Student_name", "Liy"],
  ["scoreA", [5, 10, 20, 60]],
  ["scoreB", [25, 30, 40, 55, 60]],
  [
    "final",
    [2.23, 15, 37.88, 45.98, 70],
    [10, 28.11, 34.56, 45.66, 57.88],
    [8, 19.78, 32.88, 40.9, 57.66],
    [10, 27.87, 39.99, 59.99, 78.66],
  ],
  ...,
)

As a next step, we want to transform it into a nested hash, so that we have the structure

{
  Eric => {
    scoreA => [...],
    scoreB => [...],
    final  => [[...], ...],
  },
  Liy => {...},
  ...,
}

So we simply run it through this small sub:

sub properties_to_hash {
  my %hash;
  while(my $name_prop = shift @_) {
    $name_prop->[0] eq 'Student_name' or die "Expected Student_name property";
    my $name = $name_prop->[1];
    while( @_ and $_[0][0] ne 'Student_name') {
      my ($prop, @vals) = @{ shift @_ };
      if (@vals > 1) {
        $hash{$name}{$prop} = \@vals;
      } else {
        $hash{$name}{$prop} = $vals[0];
      }
    }
  }
  return \%hash;
}

So we have the main code

my $data = properties_to_hash(Parser::parse( $file_contents ));
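
The line above uses `$file_contents` without showing where it comes from. One way to fill it, as a minimal sketch: read the whole file into a single string. The `slurp` helper is hypothetical and not part of the original answer.

```perl
use strict; use warnings;

# Hypothetical helper (not part of the original answer): read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;               # no input record separator: <$fh> returns everything at once
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

# open() also accepts a reference to a string, which makes the helper easy to try out:
my $sample = qq{Student_name ("Eric")\nscoreA ("10.00, 20.00")\n};
my $file_contents = slurp(\$sample);
print length($file_contents), " characters read\n";
```

With a real data file, the argument would be the file's path instead of a string reference.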

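Now we can proceed to part 2 of the problem: calculating your scores. That is, once you make clear what you need done.

Edit: Bilinear interpolation

Let f be the function that returns the value at a certain coordinate. If we have a value at those coordinates, we can return it. Otherwise, we perform bilinear interpolation with the next known values.

The formula for bilinear interpolation [1] is: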

f(x, y) = 1/( (x_2 - x_1) · (y_2 - y_1) ) · (
              f(x_1, y_1) · (x_2 - x) · (y_2 - y)
            + f(x_2, y_1) · (x - x_1) · (y_2 - y)
            + f(x_1, y_2) · (x_2 - x) · (y - y_1)
            + f(x_2, y_2) · (x - x_1) · (y - y_1)
          )

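Now, `scoreA` denotes the positions of the data points in the `final` table on the first axis, and `scoreB` denotes the positions on the second axis. We have to do the following:

1. assert that the requested values x, y are inside the bounds
2. fetch the next smaller and next larger positions
3. perform interpolation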
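
As a worked example of these steps, take Eric with x = 35 and y = 48, assuming that rows of `final` follow `scoreA` and columns follow `scoreB`. Both values are inside the bounds. The neighbours are x_1 = 30, x_2 = 40, y_1 = 45 and y_2 = 50, and the four table entries are 22.88, 26.99, 40.90 and 38.99 (the two pairs quoted in the question). Plugging these into the formula gives:

```latex
\begin{aligned}
f(35, 48) &= \frac{1}{(40 - 30)(50 - 45)} \bigl(
      22.88 \cdot 5 \cdot 2 + 26.99 \cdot 5 \cdot 2
    + 40.90 \cdot 5 \cdot 3 + 38.99 \cdot 5 \cdot 3 \bigr) \\
  &= \frac{228.8 + 269.9 + 613.5 + 584.85}{50}
   = \frac{1697.05}{50} = 33.941
\end{aligned}
```

This is the plain bilinear value only. The question does not specify its own formula, so the number need not match the result quoted there.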

sub f {
   my ($data, $x, $y) = @_;

   # do bounds check:
   my ($x_min, $x_max, $y_min, $y_max) = (@{$data->{scoreA}}[0, -1], @{$data->{scoreB}}[0, -1]);
   die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
      unless $x_min <= $x && $x <= $x_max
          && $y_min <= $y && $y <= $y_max;

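To fetch the enclosing positions x_1, x_2, y_1, y_2, we need to iterate through all possible scores. We'll also remember the physical indices x_i1, x_i2, y_i1, y_i2 of the underlying arrays.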

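   my ($x_i1, $x_i2, $y_i1, $y_i2);
   # each entry: the axis, the requested position on it, and where to store the indices
   for ([$data->{scoreA}, $x, \$x_i1, \$x_i2], [$data->{scoreB}, $y, \$y_i1, \$y_i2]) {
      my ($scores, $pos, $a_i1, $a_i2) = @$_;
      # scan from the top so that we stop at the largest score that is <= $pos
      for my $i (reverse 0 .. $#$scores) {
         if ($scores->[$i] <= $pos) {
            # at the last index there is no upper neighbour, so step back by one
            ($$a_i1, $$a_i2) = $i == $#$scores ? ($i-1, $i) : ($i, $i+1);
            last;
         }
      }
   }
   my ($x_1, $x_2) = @{$data->{scoreA}}[$x_i1, $x_i2];
   my ($y_1, $y_2) = @{$data->{scoreB}}[$y_i1, $y_i2];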

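Now the interpolation can be done according to the formula above, but each access at a known position can be turned into an access via the physical index, so f(x_1, y_2) would become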

$final->[$x_i1][$y_i2]
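
The answer shows `sub f` only in fragments and leaves out the final step. Below is a sketch of how the pieces might fit together, under these assumptions: both score arrays are sorted in ascending order without duplicates, rows of `final` follow `scoreA`, columns follow `scoreB`, and the neighbour search scans each axis from the top. The name `bilinear_interpolation` and the test data layout are chosen for illustration.

```perl
use strict; use warnings;

# A sketch of the complete sub; assumes ascending, duplicate-free score arrays.
sub bilinear_interpolation {
    my ($data, $x, $y) = @_;

    # bounds check, as in the answer
    my ($x_min, $x_max, $y_min, $y_max) = (@{$data->{scoreA}}[0, -1], @{$data->{scoreB}}[0, -1]);
    die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
        unless $x_min <= $x && $x <= $x_max
            && $y_min <= $y && $y <= $y_max;

    # find the physical indices of the neighbours on both axes
    my ($x_i1, $x_i2, $y_i1, $y_i2);
    for ([$data->{scoreA}, $x, \$x_i1, \$x_i2], [$data->{scoreB}, $y, \$y_i1, \$y_i2]) {
        my ($scores, $pos, $i1, $i2) = @$_;
        for my $i (reverse 0 .. $#$scores) {        # scan from the largest score down
            next if $scores->[$i] > $pos;
            ($$i1, $$i2) = $i == $#$scores ? ($i - 1, $i) : ($i, $i + 1);
            last;
        }
    }
    my ($x_1, $x_2) = @{$data->{scoreA}}[$x_i1, $x_i2];
    my ($y_1, $y_2) = @{$data->{scoreB}}[$y_i1, $y_i2];
    my $final = $data->{final};

    # the formula, with each f(x_i, y_j) replaced by a table lookup
    return 1 / (($x_2 - $x_1) * ($y_2 - $y_1)) * (
          $final->[$x_i1][$y_i1] * ($x_2 - $x) * ($y_2 - $y)
        + $final->[$x_i2][$y_i1] * ($x - $x_1) * ($y_2 - $y)
        + $final->[$x_i1][$y_i2] * ($x_2 - $x) * ($y - $y_1)
        + $final->[$x_i2][$y_i2] * ($x - $x_1) * ($y - $y_1)
    );
}

# Eric's record from the data file
my %eric = (
    scoreA => [10, 20, 30, 40],
    scoreB => [15, 30, 45, 50, 55],
    final  => [
        [12.23, 19.00, 37.88, 45.98, 60.00],
        [ 7.00, 20.11, 24.56, 45.66, 57.88],
        [ 5.00, 15.78, 22.88, 40.90, 57.99],
        [10.00, 16.87, 26.99, 38.99, 40.66],
    ],
);
printf "%.3f\n", bilinear_interpolation(\%eric, 35, 48);   # prints 33.941
```

When the requested position falls exactly on a grid point, three of the four weights are zero and the sub returns the table entry itself, so no separate early return is needed.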

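Detailed explanation of `sub f`

`sub f { ... }` declares a sub with the name `f`, although that is probably a bad name. `bilinear_interpolation` might be a better one.
`my ($data, $x, $y) = @_` states that our sub takes three parameters:
1. `$data`, a hash reference containing the fields `scoreA`, `scoreB` and `final`, which are array references.
2. `$x`, the position along the `scoreA` axis where interpolation is required.
3. `$y`, the position along the `scoreB` axis where interpolation is required.

Next, we want to assert that the positions `$x` and `$y` are valid, i.e. inside the bounds. The first value in `$data->{scoreA}` is the minimal value; the maximal value is at the last position (index `-1`). To get both at once, we use an array slice. A slice accesses multiple values at once and returns a list, like `@array[1, 2]`. Because we use complex data structures built from references, we have to dereference the array in `$data->{scoreA}`. This makes the slice look like `@{$data->{scoreA}}[0, -1]`.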
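
The slice through a reference can be tried in isolation. A minimal sketch using Eric's `scoreA` values from the data file:

```perl
use strict; use warnings;

my $data = { scoreA => [10, 20, 30, 40] };           # Eric's scoreA axis
my ($x_min, $x_max) = @{ $data->{scoreA} }[0, -1];   # first and last element in one go
print "$x_min .. $x_max\n";                          # prints "10 .. 40"
```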

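Now that we have the `$x_min` and `$x_max` values, we throw an error unless the requested value `$x` is inside the range defined by the min/max values. This is true when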
```
$x_min <= $x && $x <= $x_max
```
Should either `$x` or `$y` be out of bounds, we throw an error that shows the actual bounds. So the code
```
die "indices ($x, $y) out of range ([$x_min, $x_max], [$y_min, $y_max])"
```
could, for example, throw an error like
```
indices (10, 500) out of range ([20, 30], [25, 57]) at script.pl line 42
```
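Here we can see that the value for `$x` is too small, while `$y` is too large.

The next problem is to find the neighbouring values. Assuming `scoreA` holds `[1, 2, 3, 4, 5]` and `$x` is `3.7`, we want to select the values `3` and `4`. But because we can pull some nifty tricks a bit later, we would rather remember the positions of the neighbouring values, not the values themselves. So this would give `2` and `3` in the above example (remember that arrays are zero-based).

We can do this by looping over the indices of our array, starting at the last one and moving down. When we find a value that is ≤ `$x`, we remember the index. E.g. `3` is the first value we meet that is ≤ `$x`, so we remember the index `2`. For the next higher value, we have to be a bit careful: obviously, we can just take the next index, so `2 + 1 = 3`. But now assume that `$x` is `5`. This passes the bounds check. The first value that is ≤ `$x` would be the value `5`, so we could remember position `4`. However, there is no entry at position `5`, so we could use the current index itself. Because this would lead to a division by zero later on, we are better off remembering the positions `3` and `4` (values `4` and `5`).

Expressed as code, that is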
```
my ($x_i1, $x_i2);
my @scores = @{ $data->{scoreA} };   # shortcut to the scoreA entry
for my $i (reverse 0 .. $#scores) {  # iterate over all indices from the top: `$#arr` is the last idx of @arr
   if ($scores[$i] <= $x) {          # do this if the current value is ≤ $x
      if ($i != $#scores) {          # if this isn't the last index
         ($x_i1, $x_i2) = ($i, $i+1);
      } else {                       # so this is the last index
         ($x_i1, $x_i2) = ($i-1, $i);
      }
      last;                          # break out of the loop
   }
}
```
In my original code, I chose a more complex solution to avoid copy-pasting the same code to find the indices for `$y`.
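
Another way to avoid the duplication would be a small named helper that is called once per axis. A sketch, assuming the scores are sorted in ascending order and the position has already passed the bounds check; `neighbour_indices` is a hypothetical name and not part of the original code:

```perl
use strict; use warnings;

# Hypothetical helper: return the indices of the two grid values that enclose $pos.
# Assumes @$scores is sorted ascending and has at least two entries.
sub neighbour_indices {
    my ($scores, $pos) = @_;
    for my $i (reverse 0 .. $#$scores) {            # scan from the largest score down
        next if $scores->[$i] > $pos;
        return $i == $#$scores ? ($i - 1, $i) : ($i, $i + 1);
    }
    die "position $pos lies below the first score";
}

my @scores = (1, 2, 3, 4, 5);
print join(",", neighbour_indices(\@scores, 3.7)), "\n";   # prints 2,3
print join(",", neighbour_indices(\@scores, 5)),   "\n";   # prints 3,4
```

Inside `sub f` this would be used as `my ($x_i1, $x_i2) = neighbour_indices($data->{scoreA}, $x);` and likewise for `scoreB` and `$y`.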

Because we also need the values, we obtain them via a slice with the indices:
```
my ($x_1, $x_2) = @{$data->{scoreA}}[$x_i1, $x_i2];
```
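Now we have all surrounding values `$x_1`, `$x_2`, `$y_1`, `$y_2`, which define the rectangle in which we want to perform the bilinear interpolation. The mathematical formula is easy to translate to Perl: just choose the correct operators (`*`, not `·`, for multiplication), and put dollar signs in front of the variables.

The formula I used is recursive: the definition of f refers to itself. This would imply an infinite loop, unless we do some thinking and break the recursion. f denotes the value at a certain position. In most cases, this means interpolating. However, if `$x` and `$y` are equal to values in `scoreA` and `scoreB` respectively, we don't need bilinear interpolation and can return the `final` entry directly.

This can be done by checking whether both `$x` and `$y` are members of their arrays, and doing an early return. Or we can use the fact that `$x_1`, ..., `$y_2` are all members of the arrays. Instead of recursing with values we know don't need interpolating, we just do an array access. This is what we saved the indices `$x_i1`, ..., `$y_i2` for. So wherever the original formula says f(x_1, y_1) or similar, we write the equivalent array access, here `$data->{final}[$x_i1][$y_i1]`.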
