perl - Split a column on "," and use the values in calculations

Question

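I am writing a script that reads a text file in which one column can hold two letters (A, B, C or D) separated by a ",". The column can also contain just one of these letters. I have to use both letters for further calculations in the rest of the script. This is a simplified example of my input file (here $variants):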

C1    C2    C3   C4   C5  C6 ... C9 
text   2    A    D    values and text in the other columns 
text   4    B    C    values and text in the other columns
text   5    A    B,D  values and text in the other columns

So row 3 has both a B and a D in C4. There are many more columns after C4; I cannot change them because I need them in other parts of the script.

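I have a second input file from which values are extracted depending on the letters present in C3 and C4. This is what the second input file looks like (here $frequency):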

C1    C2    A  a   B   b   C   c   D   d
text   1    0  1   0   0   0   0   0   0
text   2    1  0   5   4   0   0   0   0
text   3    0  0   0   0   10  11  3   6
text   4    1  0   9   4   0   2   0   0
text   5    5  3   0   0   6   7   4   0

This is what my output should look like:

C1    C2    C3    C4    C5   C6   C7   C8  C9  C10
text  2     A     D     1    0    0    0   empty  
text  4     B     C     9    4    0    2   empty
text  5     A     B,D   5    3    0    0    4   0

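So for row 1, C3 contains A, so the script extracts the values for A and a from $frequency and puts them in C5 and C6. The values for the letter in C4 then go into C7 and C8 of the output file. In row 3, C4 contains B,D. What the script needs to do there is put the values for B and b in C7 and C8, and the values for D and d in C9 and C10.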

The only thing that still does not work in my script is splitting C4 when it contains a ",". The rest works.

This is what the problematic part of my script looks like:

while(<$variants>){
    next if /^\s*#/;
    next if /^\s*"/;
    chomp;
    my ($chr, $pos, $refall, @altall) = split /\t/; # How should I specify here the C4, as an array? So that I don't know
    my @ref_data = @{$frequency_data[$pos]}{$refall, lc($refall)};
    my @alt_data = @{$frequency_data[$pos]}{$altall, lc($altall)}; # this works for C3 ($refall), but not for C4 when there are two letters
    $pos = $#genes if $circular and $pos > $#genes; # adding annotation # this can be ignored here, since this line isn't part of my question
    print join("\t","$_ ", $genes[$pos] // q(), @ref_data, @alt_data), "\n"; # printing annotation
}

So could someone help me split C4 on "," and still use that information to extract the values from $variants?

score 1 · Accepted Answer

I think the easiest approach is to treat columns 3 and 4 as lists from the start:

while(<$variants>){
    next if /^\s*#/;
    next if /^\s*"/;
    chomp;
    my ($chr, $pos, $refall_string, $altall_string, @other) = split /\t/;
    my @refall = split(",", $refall_string);
    my @altall = split(",", $altall_string);

    my @ref_data_all = (); # Treat C3 as array just in case... 
    foreach my $refall (@refall) {
        push @ref_data_all, @{$frequency_data[$pos]}{ $refall, lc($refall) };
    }
    my @alt_data_all = ();
    foreach my $altall (@altall) {
        push @alt_data_all, @{$frequency_data[$pos]}{ $altall, lc($altall) };
    }

    $pos = $#genes if $circular and $pos > $#genes; 
    print join("\t","$_ ", $genes[$pos] // q(),
               @ref_data_all, @alt_data_all), "\n";
}

I have not tested this, but even if there are small errors, the approach should be clear.
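For anyone who wants to try the idea without the real files, here is a minimal self-contained sketch that runs the same per-letter lookup on the sample rows from the question. The question does not show how @frequency_data is built, so build_frequency_data is only an assumption (an array indexed by C2 holding one hash ref per row, keyed by the header letters), and lookup_alleles is a hypothetical helper name.

```perl
use strict;
use warnings;

# ASSUMPTION: @frequency_data is an array indexed by position (C2) whose
# elements are hash refs keyed by the header letters (A, a, B, b, ...).
sub build_frequency_data {
    my (@lines) = @_;
    my (undef, undef, @letters) = split ' ', shift @lines;   # header row
    my @frequency_data;
    for my $line (@lines) {
        my (undef, $pos, @values) = split ' ', $line;
        my %row;
        @row{@letters} = @values;          # pair each letter with its value
        $frequency_data[$pos] = \%row;
    }
    return @frequency_data;
}

# Hypothetical helper: for a field such as "B,D", return the values for
# B, b, D, d in that order.
sub lookup_alleles {
    my ($entry, $field) = @_;
    my @data;
    for my $allele (split /,/, $field) {
        push @data, @{$entry}{ $allele, lc $allele };   # hash slice
    }
    return @data;
}

my @frequency_data = build_frequency_data(
    'C1 C2 A a B b C c D d',
    'text 1 0 1 0 0 0 0 0 0',
    'text 2 1 0 5 4 0 0 0 0',
    'text 3 0 0 0 0 10 11 3 6',
    'text 4 1 0 9 4 0 2 0 0',
    'text 5 5 3 0 0 6 7 4 0',
);

# The first four columns of the sample $variants rows, tab-separated.
for my $line ("text\t2\tA\tD", "text\t4\tB\tC", "text\t5\tA\tB,D") {
    my (undef, $pos, $refall_string, $altall_string) = split /\t/, $line;
    my $entry = $frequency_data[$pos];
    print join("\t", $line,
        lookup_alleles($entry, $refall_string),
        lookup_alleles($entry, $altall_string)), "\n";
}
```

With the sample data, the appended values are 1 0 0 0 for position 2, 9 4 0 2 for position 4 and 5 3 0 0 4 0 for position 5, which matches the desired output in the question.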

score 0 · Accepted Answer

You just need a couple of calls to map.

If you write

map { $_, lc } split /,/, $refall

then you have split the field on any commas and duplicated each letter in upper and lower case.
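To see what that expression produces on its own, here is a small sketch. The wrapper name expand_alleles is hypothetical and only there to make the expression easy to call.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the expression above: split the field on
# commas, then emit each letter followed by its lower-case form.
sub expand_alleles {
    my ($field) = @_;
    return map { $_, lc } split /,/, $field;
}

print join(' ', expand_alleles('B,D')), "\n";   # prints: B b D d
print join(' ', expand_alleles('A')),   "\n";   # prints: A a
```

A field without a comma goes through the same code path, because split simply returns a one-element list.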

Here is the complete loop (tested).

while (<$variants>) {
    next if /^\s*#/;
    next if /^\s*"/;
    chomp;

    my ($chr, $pos, $refall, $altall) = split /\t/;
    my $entry = $frequency_data[$pos];
    my @ref_data = map { $entry->{$_} } map { $_, lc } split /,/, $refall;
    my @alt_data = map { $entry->{$_} } map { $_, lc } split /,/, $altall;
    $pos = $#genes if $circular and $pos > $#genes;

    print join("\t","$_ ", $genes[$pos] // q(), @ref_data, @alt_data), "\n";
}
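As a possible variation, the outer map can be replaced by a hash slice, so the expanded key list is fed straight into the lookup. This is only a sketch: $row_5 is a hand-written stand-in for $frequency_data[5] from the question's sample data, and allele_values is a hypothetical name.

```perl
use strict;
use warnings;

# ASSUMPTION: stand-in for $frequency_data[5], copied from the sample table.
my $row_5 = { A => 5, a => 3, B => 0, b => 0, C => 6, c => 7, D => 4, d => 0 };

# Return the values for every letter in a comma-separated field,
# upper case first, using a single hash slice.
sub allele_values {
    my ($entry, $field) = @_;
    return @{$entry}{ map { $_, lc } split /,/, $field };
}

# Reference column "A" followed by alternative column "B,D".
print join("\t", allele_values($row_5, 'A'), allele_values($row_5, 'B,D')), "\n";
```

Both forms return the values in the same order; which one reads better is mostly a matter of taste.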
