I want to extract, for every methionine (M) residue in a protein sequence, the subsequence that runs from that residue to the end of the sequence.
Take the following amino acid sequence:
MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
Original nucleotide sequence:
atgtttgaaatcgaagaacatatgaaggattcacaggtggaatacataattggccttcataatatcccattattgaatgcaactatttcagtgaagtgcacaggatttcaaagaactatgaatatgcaaggttgtgctaataaatttatgcaaagacattatgagaatcccctgacgggg
From the amino acid sequence I want every subsequence that starts at an M residue and runs to the end, which should give the following:
- MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
- MKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG
- MNMQGCANKFMQRHYENPLTG
- MQGCANKFMQRHYENPLTG
- MQRHYENPLTG
In the data I am working with, some sequences contain many more M residues than this.
The script I currently have is below. This script translates the genomic data first and then works with the amino acid sequences. This does the first two extractions but nothing further.
I have tried repeating the same scan on the result of the second scan (see the commented-out part of the script below), but this just gives me an error:
private method `scan' called for #<Array:0x7f80884c84b0> (NoMethodError)
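A note on that error, as a minimal sketch on a toy string rather than the real data: when the pattern given to String#scan contains a capture group, each match is yielded as an Array of captures instead of a String. That would explain why calling scan on orf2 fails. The names protein, plain and grouped are illustrative only.

```ruby
# Toy protein string, used only for illustration.
protein = "AAMKDMQG"

# No capture group: scan returns each match as a String.
plain = protein.scan(/M\w*/)
p plain    # => ["MKDMQG"]

# One capture group: scan returns an Array of captures per match,
# so a block variable would receive ["MKDMQG"], not "MKDMQG".
grouped = protein.scan(/.(M\w*)/)
p grouped  # => [["MKDMQG"]]

# The String is the first element of each inner Array.
p grouped.first.first.class  # => String
```

Note also that the greedy \w* consumes the second M, so a plain scan only ever reports the first M of each stretch.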
I understand that I need some kind of loop and have tried to write one, but in vain. I have also tried using match, without success. I think you cannot match overlapping substrings with a single match call, but then again I'm only a beginner...
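One possible way around the overlap problem, as a hedged sketch rather than a tested fix for the full script: wrapping the pattern in a zero-width lookahead means each match consumes no characters, so scan tries again at the next position and overlapping hits are all reported. The helper name met_suffixes is hypothetical, and the pattern keeps the \w* from the script, so a suffix would stop at any non-word character in the translated sequence.

```ruby
# Hypothetical helper: return every subsequence that starts at an M
# and runs to the end of the run of word characters.
def met_suffixes(protein)
  # (?=...) is a lookahead, so the match itself is empty and scan
  # moves on one character at a time; the inner group captures the text.
  protein.scan(/(?=(M\w*))/i).flatten
end

seq = "MFEIEEHMKDSQVEYIIGLHNIPLLNATISVKCTGFQRTMNMQGCANKFMQRHYENPLTG"

# Prints one subsequence per M residue, longest first.
met_suffixes(seq).each { |suffix| puts suffix }
```

For the example sequence this yields five subsequences, one per M. The same call could presumably replace the nested scans, with a single counter for the FASTA headers.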
So here is the script I'm using:
#!/usr/bin/env ruby
require "bio"

def extract_open_reading_frames(input)
  file_output = File.new("./output.aa", "w")
  input.each_entry do |entry|
    i = 1
    entry.naseq.translate(1).scan(/M\w*/i) do |orf1|
      file_output.puts ">#{entry.definition.to_s} 5\'3\' frame 1:#{i}\n#{orf1}"
      i = i + 1
      orf1.scan(/.(M\w*)/i) do |orf2|
        file_output.puts ">#{entry.definition.to_s} 5\'3\' frame 1:#{i}\n#{orf2}"
        i = i + 1
        # orf2.scan(/.(M\w*)/i) do |orf3|
        #   file_output.puts ">#{entry.definition.to_s} 5\'3\' frame 1:#{i}\n#{orf3}"
        #   i = i + 1
        # end
      end
    end
  end
  file_output.close
end

biofastafile = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
extract_open_reading_frames(biofastafile)
The solution has to be in Ruby, since this is part of a much longer Ruby script.