
Please give me some advice on removing the newline characters that come before letters, while ignoring the lines starting with '>'. For example:

>gi|16802049|ref|NP_463534.1| chromosomal replication initiation protein [Listeria monocytogenes EGD-e]
MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIISAPNNFVRDWLEKSYTQFIANILQEIT
GRLFDVRFIDGEQEENFEYTVIKPNPALDEDGIEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPA
KAYNPLFIYGGVGLGKTHLMHAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVL
LIDDIQFLAGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPDLETR
IAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITAGLAAEALKDIIPSSKS
QVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYLSRELTDASLPKIGDEFGGRDHTTVIHAH
EKISQLLKTDQVLKNDLAEIEKNLRKAQNMF

>gi|16802050|ref|NP_463535.1| DNA polymerase III subunit beta [Listeria monocytogenes EGD-e]
MKFVIERDRLVQAVNEVTRAISARTTIPILTGIKIVVNDEGVTLTGSDSDISIEAFIPLIENDEVIVEVE
SFGGIVLQSKYFGDIVRRLPEENVEIEVTSNYQTNISSGQASFTLNGLDPMEYPKLPEVTDGKTIKIPIN
VLKNIVRQTVFAVSAIEVRPVLTGVNWIIKENKLSAVATDSHRLALREIPLETDIDEEYNIVIPGKSLSE
LNKLLDDASESIEMTLANNQILFKLKDLLFYSRLLEGSYPDTSRLIPTDTKSELVINSKAFLQAIDRASL
LARENRNNVIKLMTLENGQVEVSSNSPEVGNVSENVFSQSFTGEEIKISFNGKYMMDALRAFEGDDIQIS
FSGTMRPFVLRPKDAANPNEILQLITPVRTY

The wrapped sequence lines should be joined into one straight line, while the newline before lines starting with '>' should not be removed. I tried

\n^[a-z]

but it also removes the first letter of each line. Is it possible to do the same without removing the first letter of each line, while ignoring lines starting with '>'? Thanks in advance. I am looking for something that works in TextPad.
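Why the first letter disappears: the letter matched by [a-z] is part of the match, so replacing the whole match with nothing deletes the letter together with the newline. The minimal Perl sketch below reproduces the effect. It is Perl rather than TextPad, `/m` is needed so that `^` can match right after the newline, and `/i` is an assumption about how the search was run (the sample is upper case but the class is [a-z]).

```perl
use strict;
use warnings;

# Two wrapped sequence lines, shaped like the question's input.
my $text = "MQSI\nGRLF\n";

# The attempted pattern: a newline, then a letter at the start of the next line.
# /m lets ^ match after the newline; /i is assumed (see the note above).
(my $attempt = $text) =~ s/\n^[a-z]//gim;

print $attempt;    # prints "MQSIRLF": the G of "GRLF" was consumed by the match
```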


2 Answers


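You can use this regex. The lookahead (?=[a-zA-Z]) checks that a letter follows without including that letter in the match, so the letter is not removed: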

 [\r\n]+(?=[a-zA-Z])

and replace it with an empty string.

Alternatively, capture the letter and put it back:

[\r\n]+([a-zA-Z])

and replace it with \1 or $1, whichever backreference syntax your editor supports.
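A minimal Perl sketch of both patterns, run on a shortened sample shaped like the question's input (the headers here are made up for illustration). It shows that both keep every letter and give the same result. Line breaks before '>' survive because '>' is not a letter, and this includes the blank line between records. The newline right after each header is followed by a letter too, so the sequence is pulled up onto its header line. That matches the literal request, which keeps only the newlines before '>'.

```perl
use strict;
use warnings;

# Shortened sample shaped like the question's input (hypothetical headers).
my $sample = ">gi|1| protein A [Species X]\nMQSI\nGRLF\n\n"
           . ">gi|2| protein B [Species X]\nMKFV\nSFGG\n";

# Pattern 1: the lookahead checks for a letter without consuming it,
# so replacing the match with '' removes only the line breaks.
(my $lookahead = $sample) =~ s/[\r\n]+(?=[a-zA-Z])//g;

# Pattern 2: capture the letter in group 1 and write it back.
# In a Perl replacement the backreference is written $1 (\1 triggers a warning).
(my $captured = $sample) =~ s/[\r\n]+([a-zA-Z])/$1/g;

print $lookahead;
print "both patterns agree\n" if $lookahead eq $captured;
```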

answered 2013-06-27T04:42:51.507

I solved this using regular expressions in Perl. For anyone who needs something like this in the future:

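use strict;
use warnings;

print "Please enter the name of the file\n";
chomp(my $n = <STDIN>);      # drop the newline typed after the file name

print "Please enter the name of the output file\n";
chomp(my $n1 = <STDIN>);

open(my $info, '<', $n) or die "cannot open $n: $!";
my @a = <$info>;
close($info);

foreach (@a) {
    s/\r?\n\z//;             # remove the line ending (Unix or Windows)
    s/^>/\n>/;               # start a new line before each header
}

my $text = join('', @a);
$text =~ s/^\n//;            # no blank line before the first header

open(my $myfile, '>', $n1) or die "cannot open $n1: $!";
print $myfile "$text\n";     # end the file with a newline
close($myfile);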

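It's extremely simple: it removes every newline and then inserts one before each '>', so each header and its sequence end up together on a single line.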
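The script above reads the whole file into memory. For very large files, a line-by-line variant like the sketch below should produce the same one-record-per-line output while holding only one line at a time. The helper name `unwrap_records` is hypothetical, and the demo uses Perl's in-memory filehandles in place of real files.

```perl
use strict;
use warnings;

# Hypothetical helper: copy records from $in to $out, putting each header
# and its sequence on one line, reading one line at a time.
sub unwrap_records {
    my ($in, $out) = @_;
    my $started = 0;                         # has anything been printed yet?
    while (my $line = <$in>) {
        $line =~ s/[\r\n]+\z//;              # strip the line ending (Unix or Windows)
        next if $line eq '';                 # skip blank separator lines
        if ($line =~ /^>/) {
            print {$out} "\n" if $started;   # each new header starts a new line
        }
        print {$out} $line;
        $started = 1;
    }
    print {$out} "\n" if $started;           # terminate the last record
    return;
}

# Demo with in-memory filehandles instead of real files (sample is made up).
my $input  = ">gi|1| protein A [Species X]\nMQSI\nGRLF\n\n"
           . ">gi|2| protein B [Species X]\nMKFV\nSFGG\n";
my $output = '';
open(my $in,  '<', \$input)  or die "cannot open input: $!";
open(my $out, '>', \$output) or die "cannot open output: $!";
unwrap_records($in, $out);
close($out);
close($in);
print $output;
```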

answered 2013-06-28T08:09:52.853