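I'm a complete amateur at Perl, and I'm hoping for help with a find-and-replace I'm trying to use to change the reference names in SAM files so that I can run them through FindPeaks. The files are very large (5 to 17 GB), so I can't open them in a text editor and do the replacement there without a programming language.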
Basically, I want Perl to match an entire string such as "gi|149288852|ref|NC_000067.5|NC_000067" and replace the whole thing with just "chr1".
So far, though, I only seem to get either "chr1|chr1|chr1|chr1|chr1" or "gi|chr1|ref|NC000067.g|NC_000067".
Can anyone help?
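Edit:
I've tried a few different things, but what I'd like to do is modify a program my supervisor got from someone so that it does this correctly. I'm posting it below: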
#!/usr/bin/perl
use strict;
use warnings;
my %Chr = (
    "gi|149288852|ref|NC_000067.5|NC_000067" => "chr1",
    "gi|149288869|ref|NC_000076.5|NC_000076" => "chr10",
    "gi|149288871|ref|NC_000077.5|NC_000077" => "chr11",
    "gi|149292731|ref|NC_000078.5|NC_000078" => "chr12",
    "gi|149292733|ref|NC_000079.5|NC_000079" => "chr13",
    "gi|149292735|ref|NC_000080.5|NC_000080" => "chr14",
    "gi|149301884|ref|NC_000081.5|NC_000081" => "chr15",
    "gi|149304713|ref|NC_000082.5|NC_000082" => "chr16",
    "gi|149313536|ref|NC_000083.5|NC_000083" => "chr17",
    "gi|149321426|ref|NC_000084.5|NC_000084" => "chr18",
    "gi|149323268|ref|NC_000085.5|NC_000085" => "chr19",
    "gi|149338249|ref|NC_000068.6|NC_000068" => "chr2",
    "gi|149352351|ref|NC_000069.5|NC_000069" => "chr3",
    "gi|149354223|ref|NC_000070.5|NC_000070" => "chr4",
    "gi|149354224|ref|NC_000071.5|NC_000071" => "chr5",
    "gi|149361431|ref|NC_000072.5|NC_000072" => "chr6",
    "gi|149361432|ref|NC_000073.5|NC_000073" => "chr7",
    "gi|149361523|ref|NC_000074.5|NC_000074" => "chr8",
    "gi|149361524|ref|NC_000075.5|NC_000075" => "chr9",
    "gi|149361525|ref|NC_000086.6|NC_000086" => "chrX",
    "gi|149361526|ref|NC_000087.6|NC_000087" => "chrY",
);
my $usage = "\n\n\tUsage: convert.pl <SAM file>\n\nThis script converts NCBI ref#s to chr #s\n\n";
die $usage unless ( @ARGV == 1);
my $file = $ARGV[0];
open (IN, "$file") or die "Can't open file: $file\n";
while (<IN>){
    if (/\S+\s+\d+\s+(gi\S+)/){
        my $tag = $1;
        if (exists $Chr{$tag}){
            my $line = $_;
            $line =~ s/'$tag'/$Chr{$tag}/;
            print $line;
        }
        else {
            die "\n\n\nHash value doesn't exist for $tag $_\n\n";
        }
    }
    else {
        print $_;
    }
}
The result is: "gi|chr1|ref|NC000067.g|NC_000067"
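That output looks like what you would get if the `|` characters inside `$tag` are being read as regex alternation, with the literal single quotes in the pattern stopping the first alternative from matching, so only `149288852` is replaced. A minimal sketch of one possible fix, assuming that is the cause: wrap the interpolated tag in `\Q...\E` so it is matched literally. The `rename_ref` helper, the two-entry table, and the sample record are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Two-entry excerpt of the %Chr table, for illustration only.
my %Chr = (
    "gi|149288852|ref|NC_000067.5|NC_000067" => "chr1",
    "gi|149338249|ref|NC_000068.6|NC_000068" => "chr2",
);

# rename_ref is a hypothetical helper, not part of the original script.
sub rename_ref {
    my ($line) = @_;
    if ($line =~ /\S+\s+\d+\s+(gi\S+)/) {
        my $tag = $1;
        die "Hash value doesn't exist for $tag\n" unless exists $Chr{$tag};
        # \Q...\E escapes regex metacharacters such as | and . in $tag,
        # so the whole reference name is treated as literal text.
        $line =~ s/\Q$tag\E/$Chr{$tag}/;
    }
    return $line;
}

# Made-up SAM-like record, tab-separated.
my $sample = "read1\t0\tgi|149288852|ref|NC_000067.5|NC_000067\t100\t255\t36M\n";
print rename_ref($sample);
```

With this change the whole name is matched as one literal string instead of five alternatives.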
I also tried this:
perl -pi -w -e 's/gi|149288852|ref|NC_000067.5|NC_000067/chr1/g;' *.sam
to see whether I could do them one at a time, but the result was "chr1|chr1|chr1|chr1|chr1".
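The one-liner's result is consistent with the same reading: unescaped, the pattern means `gi` OR `149288852` OR `ref` and so on, and `/g` replaces every alternative it finds. One way to sidestep pattern quoting altogether is to not use a pattern for the name at all. The sketch below assumes the files follow the usual tab-delimited SAM layout, with the reference name in the third column and header lines starting with `@`. The `convert_line` helper and the sample record are hypothetical, and the table is a two-entry excerpt.

```perl
use strict;
use warnings;

# Two-entry excerpt of the lookup table, for illustration only.
my %Chr = (
    "gi|149288852|ref|NC_000067.5|NC_000067" => "chr1",
    "gi|149338249|ref|NC_000068.6|NC_000068" => "chr2",
);

# convert_line is a hypothetical helper: it swaps the third tab-separated
# field for its hash value when one exists, using no regex on the name.
sub convert_line {
    my ($line) = @_;
    # Header lines start with '@'; this sketch passes them through unchanged.
    return $line if $line =~ /^@/;
    # Remember whether the line ended in a newline so it can be restored.
    my $eol = chomp(my $copy = $line) ? "\n" : "";
    # Limit -1 keeps trailing empty fields intact.
    my @fields = split /\t/, $copy, -1;
    if (@fields > 2 && exists $Chr{ $fields[2] }) {
        $fields[2] = $Chr{ $fields[2] };
    }
    return join("\t", @fields) . $eol;
}

# Made-up SAM-like record, tab-separated.
print convert_line("read1\t0\tgi|149288852|ref|NC_000067.5|NC_000067\t100\t255\t36M\n");
```

Run over a whole file, this could be driven by a loop such as `while (<>) { print convert_line($_) }`, which reads one line at a time and so should cope with multi-gigabyte input. Note that the sketch leaves reference names in `@SQ` header lines and in the mate-reference column untouched; those would need the same treatment if FindPeaks reads them.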