Edit: I know feature.type will give gene/CDS, and feature.qualifiers will then give "db_xref"/"locus_tag"/"inference" etc. Is there a feature attribute that will let me access the location (e.g. [5240:7267](+)) directly?
This URL gives a bit more info, though I can't figure out how to use it for my purpose: http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html#location_operator
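For reference, here is a minimal sketch of what the location object exposes. It assumes a hand-built feature standing in for the gyrB gene rather than one parsed from the actual file, and it relies on Biopython storing the start as a 0-based offset (so GenBank's 5240 is held as 5239).

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Hand-built stand-in (an assumption, not parsed from the real file) for the
# gene feature written as 5240..7267 on the forward strand in GenBank.
feature = SeqFeature(FeatureLocation(5239, 7267, strand=1), type="gene")

# The location object carries the coordinates and the strand.
start = int(feature.location.start)  # 0-based, inclusive
end = int(feature.location.end)      # 0-based, exclusive (equals the 1-based GenBank end)
strand = feature.location.strand     # 1 for forward, -1 for reverse

print(start, end, strand)
```

So feature.location appears to be the attribute in question; its start, end and strand can be read directly.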
Original Post:
I am trying to modify the location of features within a GenBank file. Essentially, I want to modify the following bit of a GenBank file:
     gene            5240..7267
                     /db_xref="GeneID:887081"
                     /locus_tag="Rv0005"
                     /gene="gyrB"
     CDS             5240..7267
                     /locus_tag="Rv0005"
                     /inference="protein motif:PROSITE:PS00177"
...........................
to
     gene            5357..7267
                     /db_xref="GeneID:887081"
                     /locus_tag="Rv0005"
                     /gene="gyrB"
     CDS             5357..7267
                     /locus_tag="Rv0005"
                     /inference="protein motif:PROSITE:PS00177"
.............................
Note the change from 5240 to 5357.
So far, from scouring the internet and Stack Overflow, I have:
from Bio import SeqIO

gb_file = "mtbtomod.gb"
rvnumber = 'Rv0005'
newstart = 5357

# SeqIO.parse accepts a filename directly and yields one record at a time
for record in SeqIO.parse(gb_file, "genbank"):
    final_features = []  # reset for each record
    for feature in record.features:
        if feature.type == "gene":
            if feature.qualifiers["locus_tag"][0] == rvnumber:
                if feature.location.strand == 1:
                    # only adds a qualifier; the location itself is unchanged
                    feature.qualifiers["amend_position"] = "%s:%s" % (newstart, feature.location.end+1)
                else:
                    # do the reverse for the complementary strand
                    pass
        final_features.append(feature)
    record.features = final_features
    with open("testest.gb", "w") as testest:
        SeqIO.write(record, testest, "genbank")
This basically creates a new qualifier called "amend_position". However, what I would like to do is modify the location directly (with or without creating a new file).
Rv0005 is just an example of a locus_tag I need to update. I have about 600 more locations to update, which explains the need for a script. Help!
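To make the goal concrete, here is a minimal sketch of replacing the location itself rather than adding a qualifier. It is only an illustration under several assumptions: the features are hand-built stand-ins, locations are simple (not joins), new_starts is a hypothetical mapping from locus_tag to the new 1-based start, and set_new_start is a hypothetical helper, not part of Biopython. For the complementary strand it assumes that "the reverse" means moving the right-hand coordinate.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation

def set_new_start(feature, new_start_1based):
    """Hypothetical helper: move the biological start of a simple feature."""
    loc = feature.location
    if loc.strand == -1:
        # Reverse strand: the gene starts at the right-hand coordinate, and
        # the 0-based exclusive end equals the 1-based inclusive position.
        feature.location = FeatureLocation(int(loc.start), new_start_1based, strand=-1)
    else:
        # Forward strand: convert the 1-based start to a 0-based offset.
        feature.location = FeatureLocation(new_start_1based - 1, int(loc.end), strand=loc.strand)

# Hypothetical mapping; the real one would hold all ~600 locus tags.
new_starts = {"Rv0005": 5357}

# Hand-built stand-ins for the gene and CDS features of Rv0005 (5240..7267).
features = [
    SeqFeature(FeatureLocation(5239, 7267, strand=1), type="gene",
               qualifiers={"locus_tag": ["Rv0005"]}),
    SeqFeature(FeatureLocation(5239, 7267, strand=1), type="CDS",
               qualifiers={"locus_tag": ["Rv0005"]}),
]

for feature in features:
    tag = feature.qualifiers.get("locus_tag", [None])[0]
    if tag in new_starts:
        set_new_start(feature, new_starts[tag])
```

With a parsed record, the same loop would run over record.features, and SeqIO.write(record, "testest.gb", "genbank") would then write the updated coordinates to a new file.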