This is the input I get from reading a csv file:
Sample Info D3S1358 1 D3S1358 2 TH01 1 TH01 2 D21S11 1 D21S11 2 D21S11 3
TEST_646 17 17 9 9.3 28 28 nan
TEST_647 18 18 7 7 29 30 30.2
TEST_648 16 16 9 9 31.2 31.2 nan
I want to convert it into this form:
Sample_name Marker mrk value
TEST_646 D3S1358 1 17
TEST_646 D3S1358 2 17
TEST_646 TH01 1 9
TEST_646 TH01 2 9.3
TEST_646 D21S11 1 28.0
TEST_646 D21S11 2 28.0
TEST_646 D21S11 3 nan
PS. For convenience, here are the same values in comma-separated form:
Sample Info, D3S1358 1, D3S1358 2, TH01 1, TH01 2, D21S11 1, D21S11 2, D21S11 3
TEST_646, 17, 17, 9, 9.3, 28, 28, nan
TEST_647, 18, 18, 7, 7, 29, 30, 30.2
TEST_648, 16, 16, 9, 9, 31.2, 31.2, nan
My solution so far is:
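import pandas as pd

# xls and sheet are defined elsewhere (not shown here)
samples = xls.parse(sheet).set_index('Sample Info')
# marker names: each column name minus its trailing " <n>" suffix
cols = list(set(filter(None, [i[:-2] if i != "Sample Info" else None for i in samples.columns])))
# empty marker-by-index frame, stacked into a long frame with a single 'value' column
sample_df_d = {'1': pd.Series(len(cols) * [''], index=cols),
               '2': pd.Series(len(cols) * [''], index=cols),
               '3': pd.Series(len(cols) * [''], index=cols)}
sample_df_ = pd.DataFrame(sample_df_d)
sample_ser = sample_df_.stack()
sample_df = pd.DataFrame(sample_ser, columns=['value'])
# print(sample_df)
for i, j in samples.iterrows():
    for i2, j2 in j.items():
        # i is the sample name; j[0] is the first value in the row
        print(j[0], i2[:-2], "\t", i2[-2:], "\t", j2)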
This produces something like this:
17 D3S1358 1 17
17 D3S1358 2 17
17 TH01 1 9
17 TH01 2 9.3
17 D21S11 1 28.0
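For reference, here is a minimal sketch of one way the target layout might be reached with `DataFrame.melt`, using the comma-separated values from the PS. It assumes every value column is named `<marker> <n>` with a single space before the index; the names `CSV_TEXT`, `long_df`, `row` and `column` are only illustrative. Because the value columns mix integers and floats, all values come out as floats (17.0 rather than 17).

```python
import io

import pandas as pd

# The comma-separated values from the question, inlined so the sketch is self-contained.
CSV_TEXT = """Sample Info, D3S1358 1, D3S1358 2, TH01 1, TH01 2, D21S11 1, D21S11 2, D21S11 3
TEST_646, 17, 17, 9, 9.3, 28, 28, nan
TEST_647, 18, 18, 7, 7, 29, 30, 30.2
TEST_648, 16, 16, 9, 9, 31.2, 31.2, nan
"""

# skipinitialspace drops the blank after each comma, in the header as well.
samples = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# Keep the original row position so the sample order can be restored after melting.
samples = samples.reset_index().rename(columns={"index": "row"})

# Wide -> long: one output row per (sample, original column) pair.
long_df = samples.melt(
    id_vars=["row", "Sample Info"], var_name="column", value_name="value"
)

# Split "D3S1358 1" into the marker name and its index, splitting from the right.
parts = long_df["column"].str.rsplit(pat=" ", n=1, expand=True)
long_df["Marker"] = parts[0]
long_df["mrk"] = parts[1].astype(int)

# A stable sort on the row position groups rows by sample and keeps column order.
long_df = (
    long_df.sort_values("row", kind="mergesort")
    .rename(columns={"Sample Info": "Sample_name"})
    [["Sample_name", "Marker", "mrk", "value"]]
    .reset_index(drop=True)
)

print(long_df.to_string(index=False))
```

Missing entries such as `D21S11 3` for TEST_646 are kept as NaN rows; if they are not wanted, `long_df.dropna(subset=["value"])` would remove them.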