perl - Extracting the required columns from a dataset with Perl

Question

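I have two files. file1 (sample.txt) is a list of sample IDs (about 1000). These sample IDs are column names in file2 (sampleValue.txt), which is a 30000 x 1500 data matrix. I am interested in the values of all rows for 1000 of the 1500 columns, for example columns 1, 2, 5, 6, 70, 71, 75, 100, 112, 114 and so on. There is no pattern to the columns. Here is what I am doing, and I would like to know how to improve it. This is my code: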

## Opening first file
open my $IN, '<', "sample.txt" or die $!;
my $header = <$IN>;

my %sampleID;
while(<$IN>){
chomp $_;
my @line = split(/\t/, $_);
$sampleID{$line[0]} = 1; ## Sample ID
}
close($IN);
print "Total number of sample ID: ", scalar(keys %sampleID),"\n"; ## 1000 columns

## Sample Value Data
open $IN, '<', "sampleValue.txt" or die $!;

## Columns are sample names from file1
$header = <$IN>;
chomp $header; ## otherwise the last column name keeps its newline and never matches
my @samples = split(/\t/, $header);
print "Total samples: ",scalar(@samples),"\n"; ## 1500

## loop over all the sample IDs, i.e. the columns I am interested in
my %sampleValue;
for(my $i = 1; $i <= $#samples; $i++){ ## start at 1 because index 0 is the header of the first column
my $sample = $samples[$i];
$sampleValue{$sample} = $i if (exists $sampleID{$sample});
}

my $col = "";  
foreach my $key (keys %sampleValue){
$col = $sampleValue{$key}.",".$col;
}
chop($col);
print $col,"\n"; ## string of all the columns I am interested in

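The reason I run the loop above is that I do not want to look up the columns of interest in the hash while reading the file line by line.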

## Reading the sample Value file row by row
while(<$IN>){
chomp $_;
print $_,"\n";
my @line = split("\t", $_);
@line = @line[$col]; ## error since it is string type
print @line,"\n";
}

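I get an error on the line @line = @line[$col]; because $col is a string rather than a list of numbers. However, writing @line[1,2,5,6,70,71,75,100,112,114] does work. So my question is: is there a simple way to convert the comma-separated string $col into a list of column numbers, or is there a better way to get the required columns?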
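One possible approach, sketched under assumptions: an array slice needs a list of indices, and @line[$col] passes a single scalar, which Perl numifies to its leading number, so only one element comes back. Keeping the indices in an array avoids the string entirely, and split /,/ can turn an existing comma-separated string back into a list. The helper names read_sample_ids and wanted_columns, and the small in-memory data standing in for sample.txt and sampleValue.txt, are hypothetical and only there so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for sample.txt and sampleValue.txt (assumed layout:
# tab-separated, one header line, row label in the first column).
my $sample_txt = "SampleID\nS2\nS4\n";
my $value_txt  = join "",
    "Gene\tS1\tS2\tS3\tS4\n",
    "g1\t10\t20\t30\t40\n",
    "g2\t11\t21\t31\t41\n";

# Read the wanted sample IDs into a hash, skipping the header line.
sub read_sample_ids {
    my ($fh) = @_;
    my %id;
    my $header = <$fh>;
    while (my $line = <$fh>) {
        chomp $line;
        my ($sid) = split /\t/, $line;
        $id{$sid} = 1 if defined $sid && length $sid;
    }
    return \%id;
}

# Return the positions of the wanted columns, in file order.
# Index 0 is skipped because it holds the row-label header.
sub wanted_columns {
    my ($header_line, $wanted) = @_;
    chomp $header_line;
    my @names = split /\t/, $header_line;
    return grep { exists $wanted->{ $names[$_] } } 1 .. $#names;
}

# Opening a reference to a scalar reads from the in-memory string.
open my $ids_fh, '<', \$sample_txt or die "Cannot open sample list: $!";
my $wanted = read_sample_ids($ids_fh);
close $ids_fh;

open my $val_fh, '<', \$value_txt or die "Cannot open value matrix: $!";
my @cols = wanted_columns(scalar <$val_fh>, $wanted);

my @rows;
while (my $line = <$val_fh>) {
    chomp $line;
    my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    push @rows, join "\t", @fields[@cols]; # slice with a list of indices
}
close $val_fh;
print "$_\n" for @rows;

# If the indices already live in a comma-separated string like $col,
# split recovers a list that works in a slice.
my $col = join ",", @cols;
my @from_string = split /,/, $col;
```

Because the indices come from a grep over 1 .. $#names, the selected columns stay in file order, whereas iterating over keys of a hash returns them in no particular order. With the real files, the two in-memory opens would be replaced by opens on sample.txt and sampleValue.txt.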
