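I have a data frame of features found across 4 patient/cell-type groups. There are many distinct features, but the shared ones (present in more than 1 group) are only a small fraction of them.
I would like to make a circos plot that shows the few connections formed by features shared between patient/cell-type groups, while still giving a sense of how many unshared features each group has.
As I picture it, the plot should have 4 sectors (one per patient/cell-type group) with a few links between them. Each sector's size should reflect the total number of features in that group, and most of each sector should be empty rather than linked to other groups.
This is what I have so far, but I don't want a sector for every feature, only one for each patient/cell-type group.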
MWE:
library(circlize)
patients <- c(rep("patient1",20), rep("patient2",10))
cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
dat
dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
dat
chordDiagram(as.data.frame(dat), transparency = 0.5)
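For reference, here is a rough sketch of the layout I have in mind, built from the same toy data: one sector per patient/cell-type group, sized by its number of distinct features, with one link per pair of groups that share features. The names `grp_dat`, `group_size`, `shared` and `links` are made up for illustration, and drawing each link as a single line whose width scales with the shared count is just one possible choice, not necessarily the right one.

```r
# Same toy data as in the MWE, with patient and cell type merged into one group label
patients   <- c(rep("patient1", 20), rep("patient2", 10))
cell.types <- c(rep("cell1", 12), rep("cell2", 8), rep("cell1", 6), rep("cell2", 4))
features   <- c(paste("feature", 1:12, sep = "_"),
                paste("feature", 9:16, sep = "_"),
                paste("feature", c(1, 2, 9, 10, 17, 18), sep = "_"),
                paste("feature", c(1, 18, 19, 20), sep = "_"))
grp_dat <- data.frame(group = paste(patients, cell.types, sep = "|"),
                      feature = features, stringsAsFactors = FALSE)

# Sector size: number of distinct features per group (shared or not)
group_size <- sapply(split(grp_dat$feature, grp_dat$group),
                     function(x) length(unique(x)))

# Group x feature incidence matrix (1 = feature present in group)
inc <- 1 * (unclass(table(grp_dat$group, grp_dat$feature)) > 0)
# shared[i, j] = number of features present in both group i and group j
shared <- inc %*% t(inc)

# One row per pair of groups sharing at least one feature
idx <- which(upper.tri(shared) & shared > 0, arr.ind = TRUE)
links <- data.frame(from = rownames(shared)[idx[, 1]],
                    to   = colnames(shared)[idx[, 2]],
                    n    = shared[idx],
                    stringsAsFactors = FALSE)

# Plotting part: only runs if circlize is installed
if (requireNamespace("circlize", quietly = TRUE)) {
  library(circlize)
  sectors <- names(group_size)
  circos.clear()
  # Sector widths are proportional to the xlim ranges, i.e. to group_size
  circos.initialize(sectors, xlim = cbind(0, as.numeric(group_size)))
  circos.track(ylim = c(0, 1), track.height = 0.1, panel.fun = function(x, y) {
    circos.text(CELL_META$xcenter, 0.5, CELL_META$sector.index,
                facing = "bending.inside", niceFacing = TRUE, cex = 0.6)
  })
  # Spread the link endpoints evenly inside each sector
  n_links <- sapply(sectors, function(s) sum(links$from == s | links$to == s))
  seen <- setNames(numeric(length(sectors)), sectors)
  for (i in seq_len(nrow(links))) {
    f <- links$from[i]
    g <- links$to[i]
    seen[c(f, g)] <- seen[c(f, g)] + 1
    circos.link(f, group_size[[f]] * seen[[f]] / (n_links[[f]] + 1),
                g, group_size[[g]] * seen[[g]] / (n_links[[g]] + 1),
                lwd = 2 * links$n[i], col = "#1F78B480")
  }
  circos.clear()
}
```

Because a feature present in three groups contributes to several pairwise links, the link counts in a sector can add up to more than the sector's size. That is why this sketch encodes the count in line width instead of laying the links side by side along the sector.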
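EDIT:
What @m-dz shows in his answer is actually the format I am looking for: 4 sectors for the 4 patient/cell-type combinations, showing only the connections, while the unconnected features, although not drawn, still count toward the sector sizes.
However, I realized that my case is more complex than the MWE above.
A feature is considered present in 2 patient/cell-type groups not only when it is identical in both, but also when it is similar (sequence identity above a threshold). This gives me redundancies.
Feature A in patient1-cell1 can be linked to feature A in patient2-cell1, and also to feature B. For patient1-cell1, feature A should be counted only once (unique count) and fan out to 2 different features in patient2-cell1.
Please see the example below, which mirrors my actual data more closely, and see whether the final circos plot can be produced from it. Thanks!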
##MWE
#NON OVERLAPPING SETS!
#1: non-shared features
nonshared <- data.frame(patient=c(rep("pat1",20), rep("pat2",10)), cell.type=c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4)), feature=paste("a",1:30,sep=''))
nonshared
#2: features shared between cell types within same patient
sharedcells <- data.frame(patient=c(rep("pat1",3), rep("pat2",4)), cell.types=c(rep("cell1||cell2",3),rep("cell1||cell2",4)), features=c("b1||b1","b1||b1","b1||b1","b2||b2","b3||b3","b4||b4","b4||b5"))
sharedcells
#3: features shared between patients within same cell types
sharedpats <- data.frame(patients=c(rep("pat1||pat2",2), rep("pat1||pat2",6)), cell.type=c(rep("cell1",2),rep("cell2",6)), features=c("c1||c1","c2||c1","c3||c3","c3||c4","c3||c5","c6||c5","c7||c7","c8||c8"))
sharedpats
#4: features shared between patients and cell types
#4.1: shared across pat1-cell1, pat1-cell2, pat2-cell1, pat2-cell2
sharedall1 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1||pat2-cell2",4)), features=c("d1||d1||d1||d1","d2||d2||d2||d3","d4||d4||d3||d3","d5||d5||d5||d5"))
#4.2: shared across pat1-cell1, pat1-cell2, pat2-cell1
sharedall2 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1",2)), features=c("d6||d6||d6","d7||d7||d7"))
#4.3: shared across pat1-cell1, pat1-cell2, pat2-cell2
sharedall3 <- data.frame(both="pat1-cell1||pat1-cell2||pat2-cell2", features="d8||d8||d9")
#4.4: shared across pat1-cell1, pat2-cell1, pat2-cell2
sharedall4 <- data.frame(both="pat1-cell1||pat2-cell1||pat2-cell2", features="d10||d10||d9")
#4.5: shared across pat1-cell2, pat2-cell1, pat2-cell2
sharedall5 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1||pat2-cell2",3)), features=c("d11||d11||d11","d12||d13||d13","d12||d14||d14"))
#4.6: shared across pat1-cell1, pat2-cell2
sharedall6 <- data.frame()
#4.7: shared across pat1-cell2, pat2-cell1
sharedall7 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1",2)), features=c("d15||d16","d17||d17"))
sharedall <- rbind(sharedall1, sharedall2, sharedall3, sharedall4, sharedall5, sharedall6, sharedall7)
sharedall
#you see there might be overlaps between the different subsets of sharedall, but not between sharedall, sharedpats, sharedcells, and nonshared
#I NEED A CIRCOS PLOT THAT SHOWS ALL THE CONNECTIONS. THE NON-CONNECTED (nonshared) FEATURES SHOULD NOT BE SHOWN, BUT THEY SHOULD COUNT TOWARD THE SIZE OF THE SECTOR (CORRESPONDING TO A PATIENT-CELL COMBINATION)
#THE FEATURES SHOULD BE COUNTED UNIQUELY, SO IF THERE ARE ENTRIES LIKE:
#3 pat1||pat2 cell2 c3||c3
#4 pat1||pat2 cell2 c3||c4
#5 pat1||pat2 cell2 c3||c5
#THE FEATURE c3 SHOULD BE COUNTED ONCE FOR pat1, AND EXPAND TO 3 DIFFERENT FEATURES IN pat2
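
To make the unique-count rule concrete, here is a minimal sketch using only the three `sharedpats` rows quoted above. It splits the `||`-separated fields, counts distinct features per patient/cell-type group, and keeps the distinct feature-to-feature links. The helper names (`rows`, `long`, `unique_counts`, `feature_links`) are hypothetical, and the `pat-cell` group label simply follows the format used in `sharedall`.

```r
# Rows 3-5 of sharedpats from the example above
rows <- data.frame(patients  = rep("pat1||pat2", 3),
                   cell.type = rep("cell2", 3),
                   features  = c("c3||c3", "c3||c4", "c3||c5"),
                   stringsAsFactors = FALSE)

# Split the "||"-separated fields; fixed = TRUE treats "||" as a literal string
pats  <- strsplit(rows$patients, "||", fixed = TRUE)
feats <- strsplit(rows$features, "||", fixed = TRUE)

# Long format: one row per (group, feature) occurrence
long <- data.frame(group   = paste(unlist(pats),
                                   rep(rows$cell.type, lengths(pats)),
                                   sep = "-"),
                   feature = unlist(feats),
                   stringsAsFactors = FALSE)

# Distinct features per group: c3 is counted once for pat1-cell2,
# while pat2-cell2 gets c3, c4 and c5
unique_counts <- sapply(split(long$feature, long$group),
                        function(x) length(unique(x)))

# Distinct feature-to-feature links between the two groups
feature_links <- unique(data.frame(from = sapply(feats, `[`, 1),
                                   to   = sapply(feats, `[`, 2),
                                   stringsAsFactors = FALSE))
```

With these rows, `pat1-cell2` ends up with 1 unique feature that fans out through 3 links to 3 unique features in `pat2-cell2`, which is the behaviour I am after.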