I have a DataFrame as follows:
import pandas as pd

data = {'CHROM': ['chr1', 'chr2', 'chr1', 'chr3', 'chr1'],
        'POS': [939570, 3411794, 1043223, 22511093, 24454031],
        'REF': ['T', 'T', 'CCT', 'CTT', 'CT'],
        'ALT': ['TCCCTGGAGGACC', 'C', 'C', 'CT', 'CTT'],
        'Len_REF': [1, 1, 3, 3, 2],
        'Len_ALT': [13, 1, 1, 2, 3]}
df1 = pd.DataFrame(data)
Printing df1 gives:
CHROM POS REF ALT Len_REF Len_ALT
0 chr1 939570 T TCCCTGGAGGACC 1 13
1 chr2 3411794 T C 1 1
2 chr1 1043223 CCT C 3 1
3 chr3 22511093 CTT CT 3 2
4 chr1 24454031 CT CTT 2 3
I want to add new columns to the DataFrame, derived from these column values, so that it looks like this:
Positions Allele Combined
1:939570-939570 CCCTGGAGGACC 1:939570-939570:CCCTGGAGGACC
2:3411794-3411794 C 2:3411794-3411794:C
1:1043223-1043225 - 1:1043223-1043225:-
3:22511093-22511095 - 3:22511093-22511095:-
1:24454031-24454032 T 1:24454031-24454032:T
df1['Positions'] is built from the values in CHROM and POS, with the range depending on how ALT changes relative to REF.
df1['Allele'] is built from REF and ALT.
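A minimal pandas sketch that reproduces the expected output above. The rules are inferred from the five example rows, not stated in the question, so treat them as assumptions: Positions is the chromosome without its `chr` prefix, followed by `POS` to `POS + Len_REF - 1`; Allele is the bases ALT adds after the REF prefix for insertions (assuming VCF-style anchoring, where REF is a prefix of ALT), `-` for deletions, and ALT unchanged when the lengths match. The helper name `allele` is hypothetical.

```python
import pandas as pd

data = {'CHROM': ['chr1', 'chr2', 'chr1', 'chr3', 'chr1'],
        'POS': [939570, 3411794, 1043223, 22511093, 24454031],
        'REF': ['T', 'T', 'CCT', 'CTT', 'CT'],
        'ALT': ['TCCCTGGAGGACC', 'C', 'C', 'CT', 'CTT'],
        'Len_REF': [1, 1, 3, 3, 2],
        'Len_ALT': [13, 1, 1, 2, 3]}
df1 = pd.DataFrame(data)

# Positions (assumed rule): chromosome number without the leading "chr",
# then a range from POS to the last reference base covered by REF.
end = df1['POS'] + df1['Len_REF'] - 1
df1['Positions'] = (df1['CHROM'].str.replace(r'^chr', '', regex=True)
                    + ':' + df1['POS'].astype(str)
                    + '-' + end.astype(str))

def allele(row):
    """Hypothetical helper: derive the allele string for one row."""
    # Insertion: keep only the bases appended after the REF prefix
    # (assumes REF is a prefix of ALT, as in VCF-style indels).
    if row['Len_ALT'] > row['Len_REF']:
        return row['ALT'][row['Len_REF']:]
    # Deletion: no inserted bases, so mark it with "-".
    if row['Len_ALT'] < row['Len_REF']:
        return '-'
    # Same length (e.g. a single-base substitution): use ALT as-is.
    return row['ALT']

df1['Allele'] = df1.apply(allele, axis=1)
df1['Combined'] = df1['Positions'] + ':' + df1['Allele']

print(df1[['Positions', 'Allele', 'Combined']])
```

The Positions and Combined columns are built with vectorized string concatenation; only Allele uses a row-wise `apply`, because its rule branches on two columns at once. For large tables, `numpy.select` over the length comparisons would avoid the per-row Python call.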