该perl
脚本将构建一个您应该能够使用的哈希。为方便起见,我用于List::MoreUtils
和uniq
转储Data::Printer
数据结构:
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use DDP;
my %paper ;
my @categories;
while (<DATA>){
chomp;
my @record = split /\t/ ;
$paper{$record[0]} = { map { $_ => 1 } @record[1..$#record] } ;
push @categories , @record[1..$#record] ;
}
@categories = uniq @categories;
foreach (keys %paper) {
foreach my $category(@categories) {
$paper{$_}{$category} //= 0 ;
}
};
p %paper ;
__DATA__
19801464 Animals Biodiversity Computational Biology/methods DNA
19696045 Environmental Microbiology Computational Biology/methods Software
输出
{
19696045 {
'Animals Biodiversity' 0,
'Computational Biology/methods' 1,
DNA 0,
'Environmental Microbiology' 1,
Software 1
},
19801464 {
'Animals Biodiversity' 1,
'Computational Biology/methods' 1,
DNA 1,
'Environmental Microbiology' 0,
Software 0
}
}
从那里到产生您想要的输出可能需要printf
正确格式化行。以下内容可能足以满足您的目的:
print "\t", (join " ", @categories);
for (keys %paper) {
print "\n", $_, "\t\t" ;
for my $category(@categories) {
print $paper{$_}{$category}," "x17 ;
}
}
编辑
格式化输出的一些替代方法......(我们使用x
将格式部分乘以@categories
数组中的长度或元素数量,以便它们匹配):
使用format
my $format_line = 'format STDOUT =' ."\n"
. '@# 'x ~~@categories . "\n"
. 'values %{ $paper{$num} }' . "\n"
. '.'."\n";
for $num (keys %paper) {
print $num ;
no warnings 'redefine';
eval $format_line;
write;
}
使用printf
:
print (" "x9, join " ", @categories, "\n");
for $num (keys %paper) {
print $num ;
map{ printf "%19d", $_ } values %{ $paper{$num} } ;
print "\n";
}
使用form
:
use Perl6::Form;
for $num (keys %paper) {
print form
"{<<<<<<<<}" . "{>}" x ~~@categories ,
$num , values %{ $paper{$num} }
}
根据您计划对数据执行的操作,您可能能够在 perl 中完成其余的分析,因此在您的工作流程的后期阶段,打印的精确格式化可能不是优先事项。有关想法,请参阅BioPerl。