r - How to remove redundant rows in a data.frame (by columns [1, 2] and vice versa)?

Question

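I obtained a distance-class table in which the samples are compared with each other to compute an index. As a result, every value is duplicated and self-comparisons occur. See the example table below: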

	sample1	sample2	sample3
sample1	0	0.5	1
sample2	0.5	0	0.8
sample3	1	0.8	0

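I have already removed the self-comparisons (sample1 vs. sample1, etc.), but I don't know how to remove the redundant values (i.e., the upper half of the table). The desired output is a table like the first one below, which I can then melt into a data.frame (second table) to build a plot. The samples also belong to specific types that I want to use in the plot.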

	sample1	sample2
sample1
sample2	0.5
sample3	1	0.8

Var1	Var2	Type1	Type2	value
sample1	sample2	a	b	0.5
sample1	sample3	a	a	1
sample2	sample3	b	a	0.8
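
A minimal base-R sketch of the reshaping described above, assuming the distances are held in a plain numeric matrix and the sample types in a named character vector. The names `m`, `types`, and `long` are hypothetical and only serve the illustration:

```r
# Toy distance matrix from the example above (`m` and `types` are assumed names)
samples <- c("sample1", "sample2", "sample3")
m <- matrix(c(0,   0.5, 1,
              0.5, 0,   0.8,
              1,   0.8, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(samples, samples))
types <- c(sample1 = "a", sample2 = "b", sample3 = "a")

# Positions strictly below the diagonal: this drops the self-comparisons
# and the duplicated upper half in one step
idx <- which(lower.tri(m), arr.ind = TRUE)

# One row per remaining pair, in long format
long <- data.frame(
  Var1  = colnames(m)[idx[, "col"]],
  Var2  = rownames(m)[idx[, "row"]],
  value = m[idx],
  stringsAsFactors = FALSE
)

# Attach the type of each sample by name
long$Type1 <- unname(types[long$Var1])
long$Type2 <- unname(types[long$Var2])
long <- long[, c("Var1", "Var2", "Type1", "Type2", "value")]
print(long)
```

If the distances are already stored as a "dist" object, `as.matrix()` would give the square matrix this sketch starts from.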

score 0 · Accepted Answer

Thank you very much. With usedist::dist_make() I was able to produce the expected solution.

After generating the matrix of class "dist" by calling phyloseq::distance(), I extracted the grouping variable from the phyloseq object:

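library(phyloseq)

# `group` is assumed to hold the name of the grouping column in sample_data(physeq)
group_list <- get_variable(sample_data(physeq), group)
group2samp <- list()
for (groups in levels(group_list)) {  # loop over the group levels (group_list must be a factor)
    target_group <- which(group_list == groups)  # indices of the samples in this group
    group2samp[[groups]] <- sample_names(physeq)[target_group]  # their sample names
}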
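
The loop builds a named list that maps each group level to its sample names. As a sketch of the same idea without phyloseq, `split()` from base R produces an equivalent list. Here `sample_ids` and the toy `group_list` are hypothetical stand-ins for `sample_names(physeq)` and the extracted grouping factor:

```r
# Toy stand-ins for sample_names(physeq) and the grouping factor (assumed values)
sample_ids <- c("sample1", "sample2", "sample3")
group_list <- factor(c("a", "b", "a"))

# split() partitions the sample names by group level,
# giving one list element per level, named after that level
group2samp <- split(sample_ids, group_list)
print(group2samp)
```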

I then melted the resulting "group2samp" list and reordered its rows by the first column so that they match my distance matrix:

library(reshape2)
library(dplyr)

# melt() turns the list into a data frame: `value` = sample name, `L1` = group name
item_groups = melt(group2samp)
# sort the rows by sample name
item_groups = arrange(item_groups, value)
# needed to reverse the row order to match my distance matrix
item_groups = item_groups[rev(seq_len(nrow(item_groups))), ]
# extract only the grouping variable
item_groups = item_groups$L1

library(usedist)
distances = dist_groups(distance_matrix, item_groups)

distances
     Item1    Item2      Group1      Group2                          Label   Distance
1    sample9  sample8       Patch      Plaque       Between Patch and Plaque 0.94344640
2    sample9 sample70       Patch nonlesional  Between nonlesional and Patch 0.60253312
3    sample9 sample69       Patch       Patch                   Within Patch 0.62086228
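
The manual sort-and-reverse step above only works when the sorted sample names happen to mirror the label order of the distance matrix. A possibly more robust sketch, assuming the melted table has the columns `value` (sample name) and `L1` (group), aligns the groups by name with `match()` instead. All values below are toy data, and `pos` and `groups_in_order` are hypothetical names:

```r
# Toy distance object and melted sample-to-group table (assumed values)
ids <- c("sample1", "sample2", "sample3")
m <- matrix(c(0, 0.5, 1, 0.5, 0, 0.8, 1, 0.8, 0), nrow = 3,
            dimnames = list(ids, ids))
distance_matrix <- as.dist(m)
item_groups <- data.frame(value = c("sample3", "sample1", "sample2"),
                          L1 = c("a", "a", "b"),
                          stringsAsFactors = FALSE)

# Find each label of the dist object in the melted table ...
pos <- match(labels(distance_matrix), item_groups$value)
stopifnot(!anyNA(pos))  # every sample needs a group entry

# ... and pull the groups out in exactly that order
groups_in_order <- item_groups$L1[pos]
print(groups_in_order)
```

A vector ordered this way could then be passed as the second argument of `dist_groups()`, in place of the manually reordered `item_groups`.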
