I have a data frame called mydf (roughly 446664 x 234 in dimension; dput provided below). The data frame has the columns REF and ALT. REF holds a single letter per row, but ALT can hold one, two, or even three letters separated by commas (","). The remaining columns (the sample columns) are the ones I need to do all the work on.

Treating the REF letter as 0, the first ALT letter as 1, the second as 2, and the third as 3, I need to write a function so that:

I can replace the numbers in all sample columns (i.e. every column except REF and ALT) with the corresponding letters; where a cell is "./.", fill it with NA/NA; and collapse the "/" so that each cell holds a pair of letters.

Finally, I need to transpose all the sample columns into rows, as shown in result. Thanks!

mydf <- structure(list(
  REF = structure(c(1L, 4L, 3L, 2L, 3L), .Label = c("A", "C", "G", "T"), class = "factor"),
  ALT = structure(c(6L, 6L, 1L, 9L, 1L), .Label = c("A", "A,C", "A,G", "A,T", "C", "C,G", "C,T", "G", "G,T", "T"), class = "factor"),
  X860 = structure(c(1L, 3L, 2L, 1L, 1L), .Label = c("./.", "0/0", "0/1", "0/2", "1/1"), class = "factor"),
  X861 = structure(c(1L, 6L, 2L, 1L, 1L), .Label = c("./.", "0/0", "0/1", "0/2", "1/1", "1/2"), class = "factor"),
  X862 = structure(c(6L, 3L, 1L, 2L, 1L), .Label = c("./.", "0/0", "0/1", "0/2", "1/1", "2/2"), class = "factor")),
  .Names = c("REF", "ALT", "X860", "X861", "X862"), row.names = c(NA, -5L), class = "data.frame")
Expected output:
X860 NANA TC GG NANA NANA
X861 NANA CG GG NANA NANA
X862 GG TC NANA CC NANA
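
To make the mapping concrete, here is a minimal base R sketch of the transformation described above. The function name decode_genotypes and its arguments ref_col and alt_col are hypothetical names chosen for illustration, not part of the original data or of any package. It assumes every sample cell is either "./." or two allele indices separated by "/", and it is written for clarity rather than tuned for a data frame of the full size.

```r
# Hypothetical helper (illustrative only): decode genotype calls such as "0/1"
# into allele letters, where index 0 is REF and 1, 2, 3 are the ALT alleles.
decode_genotypes <- function(df, ref_col = "REF", alt_col = "ALT") {
  ref <- as.character(df[[ref_col]])
  alt <- strsplit(as.character(df[[alt_col]]), ",", fixed = TRUE)
  # One character vector of alleles per row: REF first, then the ALT alleles.
  alleles <- Map(c, ref, alt)
  sample_cols <- setdiff(names(df), c(ref_col, alt_col))

  decoded <- sapply(sample_cols, function(col) {
    calls <- strsplit(as.character(df[[col]]), "/", fixed = TRUE)
    mapply(function(call, al) {
      # "." is not a number, so it becomes NA and selects an NA allele.
      idx <- suppressWarnings(as.integer(call)) + 1L
      # paste() turns NA into the string "NA", so "./." ends up as "NANA".
      paste(al[idx], collapse = "")
    }, calls, alleles, USE.NAMES = FALSE)
  })
  # sapply() gives a rows-by-samples matrix; transpose so each sample is a row.
  t(decoded)
}
```

Calling decode_genotypes(mydf) on the example data returns a character matrix with one row per sample column (X860, X861, X862) and one column per original row. Converting factors with as.character() first matters here, because indexing with the underlying factor codes would pick the wrong alleles.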