So far I have been able to merge two files and get the following dataframe (df1):
ID someLength someLongerSeq someSeq someMOD someValue
A 16 XCVBNMHGFDSTHJGF NMH T3(P) 7
A 16 XCVBNMHGFDSTHJGF NmH M3(O); S4(P); S6(P) 1
B 24 HDFGKJSDHFGKJSDFHGKLSJDF HFGKJSDFH S9(P) 5
C 22 QIOWEURQOIWERERQWEFFFF RQoIWERER Q16(D); S19(P) 7
D 19 HSEKDFGSFDKELJGFZZX KELJ S7(P); C9(C); S10(P) 1
I am looking for a way to do a per-row regex match that searches for the "someSeq" value as a substring of the "someLongerSeq" column, gets the start location of the match, and then adds that location to the whole numbers attached to the characters in "someMOD", such as the 3 in T3(P).
Example:
For the second row ("ID": A, "someSeq": NmH), the match starts at location 4 of someLongerSeq (counting from 0, after converting NmH to upper case). So I want to add 4 to the someMOD values M3(O); S4(P); S6(P) to get M7(O); S8(P); S10(P), and then overwrite the someMOD column with the new value.
I need to do this for every row; the match is done on a per-row basis. Any help is really appreciated. Thanks.
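To make the intended transformation concrete, here is a minimal sketch of the per-row step in plain Python, using only the standard-library `re` module. The function name `shift_mods` and the `rows` list are hypothetical, made up for illustration. It assumes the match location is 0-based, that the comparison ignores case, and that a row with no match should be left unchanged.

```python
import re

def shift_mods(long_seq, seq, mods):
    """Add the 0-based start of `seq` in `long_seq` to each number in `mods`."""
    # Escape the sequence so it is matched literally; ignore case (NmH vs NMH).
    match = re.search(re.escape(seq), long_seq, flags=re.IGNORECASE)
    if match is None:
        return mods  # assumption: leave the value untouched when nothing matches
    offset = match.start()
    # Shift only the digits that sit between a letter and "(", e.g. the 3 in T3(P).
    return re.sub(
        r"(?<=[A-Za-z])\d+(?=\()",
        lambda m: str(int(m.group()) + offset),
        mods,
    )

# (someLongerSeq, someSeq, someMOD) taken from the rows of df1 above
rows = [
    ("XCVBNMHGFDSTHJGF", "NMH", "T3(P)"),
    ("XCVBNMHGFDSTHJGF", "NmH", "M3(O); S4(P); S6(P)"),
    ("HDFGKJSDHFGKJSDFHGKLSJDF", "HFGKJSDFH", "S9(P)"),
    ("QIOWEURQOIWERERQWEFFFF", "RQoIWERER", "Q16(D); S19(P)"),
    ("HSEKDFGSFDKELJGFZZX", "KELJ", "S7(P); C9(C); S10(P)"),
]

for long_seq, seq, mods in rows:
    print(shift_mods(long_seq, seq, mods))
```

If df1 is a pandas DataFrame (an assumption, since the library is not named above), the same function could be applied with something like `df1["someMOD"] = df1.apply(lambda r: shift_mods(r["someLongerSeq"], r["someSeq"], r["someMOD"]), axis=1)`.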