r - Using gather to tidy a dataset in R - attributes are not identical

Question

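My ultimate goal is to compute the ratio between two values (T/D) in my dataset, but it seems the best way to get there is to tidy the dataset first with something like tidyr. I have been trying to use gather and separate, but I keep running into problems. The data looks like this: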

head(df9)
>  GeneID     D1     T1      D2     T2     D3     T3     D4     T4      D5      T5     D6     T6     D7     T7     D8     T8
>1    A2M 8876.5 8857.9 10246.8 9453.9 6279.6 3846.5 8735.3 6609.9 7732.95  2452.4 8705.2   6679 7510.5 4318.3 8957.7 4092.4
>2   ABL1 2120.8 1664.9    2525 1546.4   1993 1713.7 1849.7 1761.9  2297.7  2462.5 2698.2 1975.8 2480.3 1694.6   2471 1784.1
>3   ACP1 1266.6 1347.1  910.95  725.1 1327.6 1589.5   1175 1086.9  1187.3 1065.15   1080 1048.2 1213.8 1337.9  831.5  814.1

But I want it to look like this:

> GeneID  pt.num  type value 
>ASM      1        D    8876 
>ASM      1       T    8857

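I tried the following, but I keep getting an error. Warning message: attributes are not identical across measure variables; they will be dropped.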

gather(df9, pt.num.type, value, 2:17, -GeneID)
separate(pt.num.type, c("pt.num","type", 1))

Once the data is tidy, I want to use the following to get the T/D ratio.

df10 <- ddply(df9, .(type), transform, Ratio=T/D)

Any advice on cleaning up my data and running that function would be greatly appreciated. Thanks!

score 2 · Accepted Answer

I think you are close; you just misplaced the sep argument:

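library(tidyr)

# sep = 1 splits after the first character: "D1" becomes type "D", pt.num "1"
df9 %>%
  gather(pt.num.type, value, 2:17) %>%
  separate(pt.num.type, c("type", "pt.num"), sep = 1)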
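The warning itself usually means that some of the gathered columns are factors with differing levels, which can happen when numeric columns are imported as text. A minimal sketch of one way to handle that, assuming this is the cause here; the toy data frame `df` below is hypothetical and is not the asker's `df9`:

```r
library(tidyr)

# Hypothetical toy data: measure columns stored as factors (an assumption
# about how the original data was imported, not confirmed by the question)
df <- data.frame(
  GeneID = c("A2M", "ABL1"),
  D1 = factor(c("8876.5", "2120.8")),
  T1 = factor(c("8857.9", "1664.9")),
  stringsAsFactors = FALSE
)

# Convert each measure column to numeric by going through character,
# so the factor labels are used rather than the internal integer codes
df[-1] <- lapply(df[-1], function(x) as.numeric(as.character(x)))

# With plain numeric columns, gather has no conflicting attributes to drop
long <- df %>%
  gather(pt.num.type, value, -GeneID) %>%
  separate(pt.num.type, c("type", "pt.num"), sep = 1)
```

Running `str(df9)` first shows whether any of the columns are in fact factors.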

With dplyr you can do the following:

df9 %>% 
  gather(pt.num.type, value, 2:5) %>%
  separate(pt.num.type, c("type", "pt.num"), sep=1) %>%
  group_by(GeneID, type) %>%
  summarise(sum = sum(value))

#   GeneID type  sum
# 1    A2M    D  989
# 2    A2M    T 1033
# 3   ABL1    D  464
# 4   ABL1    T  170
# 5   ACP1    D 1036
# 6   ACP1    T  738

Then, if you want the ratio (depending on how you separated the column), you can do the following:

df9 %>% 
  gather(pt.num.type, value, 2:5) %>%
  separate(pt.num.type, c("type", "pt.num"), sep=1) %>%
  spread(type, value) %>%
  mutate(Ratio = D/T)

#   GeneID pt.num   D   T      Ratio
# 1    A2M      1 887  88 10.0795455
# 2    A2M      2 102 945  0.1079365
# 3   ABL1      1 212  16 13.2500000
# 4   ABL1      2 252 154  1.6363636
# 5   ACP1      1 126  13  9.6923077
# 6   ACP1      2 910 725  1.2551724
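For reference, here is a self-contained sketch of the same pipeline using the first four measure columns from the question's `head(df9)` output, with the ratio written as T/D as the question asks. The name `ratios` and the use of `convert = TRUE` (to make `pt.num` an integer) are choices made for this illustration only:

```r
library(tidyr)
library(dplyr)

# First three genes and first two patients, copied from the question's data
df9 <- data.frame(
  GeneID = c("A2M", "ABL1", "ACP1"),
  D1 = c(8876.5, 2120.8, 1266.6),
  T1 = c(8857.9, 1664.9, 1347.1),
  D2 = c(10246.8, 2525, 910.95),
  T2 = c(9453.9, 1546.4, 725.1)
)

ratios <- df9 %>%
  # wide to long: one row per gene and measurement column
  gather(pt.num.type, value, -GeneID) %>%
  # split "D1" into type "D" and patient number 1
  separate(pt.num.type, c("type", "pt.num"), sep = 1, convert = TRUE) %>%
  # one column per type, so D and T sit side by side per gene and patient
  spread(type, value) %>%
  # inside mutate, T refers to the column, not the TRUE shorthand
  mutate(Ratio = T / D)
```

Selecting the measure columns with `-GeneID` rather than by position avoids having to update a range such as `2:17` when the number of patients changes. In current tidyr releases `gather` and `spread` are superseded by `pivot_longer` and `pivot_wider`, although they still work.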
