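I combined 2 sequence files, so I now have 1 file containing 2 sequences. I have split each sequence into a @chars array, because I later need to compare them character by character. However, one of the sequences is spread over two lines. I would like to use the join function to combine the 2 lines, but I don't know how.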

Example:

Sequence 1

ACGTATATATTATATCTGGCGCTATCGATGCTATCGAT
CGATGCGCG

Sequence 2

AGTGAGCGTAGCTAGCGGCGCGATCTAGCTA

My code so far:

#!/usr/bin/perl
use strict;
use warnings;

# open file 1
open (my $seq1, "<", "file1.fa") or die $!;
# open file 2
open (my $seq2, "<", "file2.fa") or die $!;
# open combined file
open (my $combined, ">", "combined.txt") or die $!;

# read file 1, skip header line, write to combined file
while (my $line = <$seq1>) {
    if ($line =~ />/) {
        next;
    }
    else {
        print $combined "$line\n";
    }
}
# read file 2, skip header line, write to combined file on new line
while (my $line2 = <$seq2>) {
    if ($line2 =~ />/) {
        next;
    }
    else {
        print $combined "$line2\n";
    }
}
# need to open combined file for reading
open (my $combined2, "<", "combined.txt") or die $!;
# read through combined file line by line
while (my $seqs = <$combined2>) {
    chomp($seqs);
    # split sequences into characters
    my @chars = split(//, $seqs);
    # the sequence from file1 is on 2 separate lines. Need to join these
    # lines together
}
2 Answers

Consider using Bio::SeqIO to read your FASTA files, since it handles sequences that span multiple lines:

use strict;
use warnings;
use Bio::SeqIO;

my $in = Bio::SeqIO->new( -file => "file1.fa", '-format' => 'Fasta' );

while ( my $seq = $in->next_seq ) {
    my $sequence = $seq->seq;
    print $sequence, "\n";
}

Contents of file1.fa:

>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq2
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>seq3
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK

Output:

FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
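
If installing BioPerl is not an option, the same line joining can be approximated in core Perl. The following is only a minimal sketch: read_fasta_sequences is a hypothetical helper (it is not part of BioPerl), the >seq1 and >seq2 header names are made up because the question does not show its headers, and the input is read from an in-memory string so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read FASTA records from an
# open filehandle and return one string per record, with wrapped
# sequence lines joined together.
sub read_fasta_sequences {
    my ($fh) = @_;
    my @sequences;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;             # tolerate Windows line endings
        next if $line eq '';          # ignore blank lines
        if ($line =~ /^>/) {
            push @sequences, '';      # header: start a new, empty record
        }
        elsif (@sequences) {
            $sequences[-1] .= $line;  # append to the current record
        }
        # lines that appear before the first header are ignored
    }
    return @sequences;
}

# Demo on an in-memory file holding the two sequences from the question.
# The header names are assumptions made for this example.
my $fasta = ">seq1\nACGTATATATTATATCTGGCGCTATCGATGCTATCGAT\nCGATGCGCG\n"
          . ">seq2\nAGTGAGCGTAGCTAGCGGCGCGATCTAGCTA\n";
open(my $fh, '<', \$fasta) or die $!;
my @seqs = read_fasta_sequences($fh);
close $fh;

# Each record is now a single string, whatever its original line wrapping.
print "$_\n" for @seqs;
```

To use it on real files, open file1.fa and file2.fa as usual and pass each filehandle to the helper. Bio::SeqIO remains the more robust choice, since it also gives access to the record IDs and descriptions.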
answered 2013-02-19T07:01:25.287
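I assume your sequences are separated by the ">" symbol, which is why you use if ($line =~ />/) to skip those lines. If that is not the case, leave a comment and I will change the code. Try the following: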

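use strict;
use warnings;

# open file 1
open (my $fil1, "<", "file1.fa") or die $!;
# open file 2
open (my $fil2, "<", "file2.fa") or die $!;
# open combined file
open (my $combined, ">", "combined.txt") or die $!;

# read file 1: a header line starts a new output line; a sequence line
# is written without its newline, so wrapped lines end up joined
# (the first header leaves an empty first line in combined.txt)
while (my $line = <$fil1>) {
    chomp $line;
    if ($line =~ />/) {
        print $combined "\n";
    }
    else {
        print $combined $line;
    }
}
# read file 2 the same way, so its sequence starts on a new line
while (my $line2 = <$fil2>) {
    chomp $line2;
    if ($line2 =~ />/) {
        print $combined "\n";
    }
    else {
        print $combined $line2;
    }
}
# terminate the last sequence and flush the file before reading it back
print $combined "\n";
close $combined or die $!;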

Then check combined.txt to confirm that each sequence is on its own line.
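
Once each sequence is a single string, the character-by-character comparison mentioned in the question could look roughly like this. It is only a sketch: count_mismatches is a hypothetical helper, and comparing just the positions that both sequences share is an assumption, since the question does not say how a length difference should be handled. The example also shows join combining the two wrapped lines of sequence 1.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two sequences position by position and
# return the number of mismatches over the length they share.
sub count_mismatches {
    my ($seq_a, $seq_b) = @_;
    my @chars_a = split //, $seq_a;
    my @chars_b = split //, $seq_b;

    # Assumption: only positions present in both sequences are compared.
    my $shared = scalar(@chars_a) < scalar(@chars_b)
               ? scalar(@chars_a)
               : scalar(@chars_b);

    my $mismatches = 0;
    for my $i (0 .. $shared - 1) {
        $mismatches++ if $chars_a[$i] ne $chars_b[$i];
    }
    return $mismatches;
}

# Sequence 1 from the question arrives as two lines; join glues them
# together with an empty separator.
my @wrapped = ('ACGTATATATTATATCTGGCGCTATCGATGCTATCGAT', 'CGATGCGCG');
my $seq1    = join '', @wrapped;
my $seq2    = 'AGTGAGCGTAGCTAGCGGCGCGATCTAGCTA';

printf "%d mismatches over the shared length\n",
    count_mismatches($seq1, $seq2);
```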

answered 2013-09-18T11:41:58.917