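Hi all,

I want to visualize my data set, but I'm struggling even to name the type of visualization I need!

I want to see the overlapping sets between a reference standard and three new tests.

The reference standard has a binary outcome (R and S).

Each of the three new tests can have more than two outcomes (R, S, failed, indeterminate).

So part of my data looks like this (as an R data frame):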

Subject <- c("11-0001","11-0002","11-0003","11-0004","11-0005","11-0007","11-0008","11-0010","11-0011","11-0012","11-0013","11-0014","11-0015","11-0016","11-0017","11-0018","11-0019","11-0020","11-0021","11-0022","11-0023","11-0025","11-0027","11-0029","11-0030","11-0035","11-0036","11-0037","11-0038","11-0039","11-0040","11-0041","11-0043","11-0044","11-0045","11-0046","11-0047","11-0048","11-0050","11-0052","11-0053","11-0054","11-0055","11-0056","11-0058","11-0059","11-0061","11-0062","11-0063","11-0064","11-0065","11-0066","11-0068","11-0069","11-0070","11-0071","11-0072","11-0074","11-0075")
ReferenceStandard <- c("R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","S","R","R","R","R","S","R","R","R","R","S","R","S","R","S")
TestA<- c("R","R","R","R","R","R","S","I","R","R","R","I","R","R","R","R","I","R","R","R","R","R","R","R","R","R","S","S","R","R","R","R","R","R","R","R","R","R","R","R","R","S","I","R","I","R","R","I","R","S","R","R","R","R","S","I","S","R","S")
TestB <- c("R","R","R","R","R","R","S","I","R","R","R","I","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","I","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","S","R","R","R","R","I","R","R","R","R","S","I","S","R","S")
TestC <-c("R","R","R","R","R","R","R","R","R","R","R","ND","R","R","R","R","R","R","R","R","R","R","R","R","R","R","S","S","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","S","R","R","R","R","S","R","R","R","R","S","ND","S","R","S")

# Note: the vector defined above is `Subject` (capital S); R is case-sensitive
mydata <- data.frame(Subject=Subject, ReferenceStandard=ReferenceStandard, TestA=TestA, TestB=TestB, TestC=TestC)

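And so on (I have 1000 subjects)...

So, while the sensitivity/specificity of each individual test against the reference standard is very similar, Cochran's and McNemar's tests show significant differences between the tests.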
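To make the paired comparison concrete, here is a minimal sketch of McNemar's test on two tests' per-subject correctness, using base R's `mcnemar.test`. The toy vectors are hypothetical (not the data above), and counting I/ND/failed results as "incorrect" is an assumption you may want to change.

```r
# Hypothetical toy data, NOT the real results from the question
ref   <- c("R","R","R","R","S","S","R","R","S","R")
testA <- c("R","I","R","S","S","S","R","I","S","R")
testB <- c("R","R","R","R","S","I","R","R","I","R")

# Assumption: a test is "correct" only when it matches the reference,
# so I / ND / failed results are counted as incorrect
correctA <- testA == ref
correctB <- testB == ref

# 2x2 table of paired correctness; the off-diagonal cells are the
# subjects on which exactly one of the two tests is correct
paired <- table(A = factor(correctA, levels = c(TRUE, FALSE)),
                B = factor(correctB, levels = c(TRUE, FALSE)))
print(paired)

# McNemar's test looks only at those discordant cells
result <- mcnemar.test(paired)
print(result)
```

The marginal accuracies can be nearly equal (7/10 and 8/10 here) even when the two tests miss different subjects, which is exactly what the off-diagonal cells expose.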

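Now, my hypothesis is that each test fails in a different way. TestA may fail on one set of subjects while TestB fails on a different set. Overall, the counts are close enough that sensitivity/specificity come out very similar, but the paired-sample statistical tests highlight that the tests are not equivalent. So I want to inspect this visually.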
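Before picking a plot, one way to check this hypothesis is to tabulate which combination of tests misses each subject. The sketch below uses a hypothetical toy data frame (`toy`, with shortened column names) and again assumes that any result other than the reference value counts as a miss.

```r
# Hypothetical toy data with the same shape as the question's data frame
toy <- data.frame(
  Subject = sprintf("S%02d", 1:8),
  Ref     = c("R","R","R","S","S","R","R","S"),
  TestA   = c("R","I","R","S","S","S","R","S"),
  TestB   = c("R","R","I","S","I","R","R","S"),
  TestC   = c("R","R","R","S","S","R","ND","S"),
  stringsAsFactors = FALSE
)
tests <- c("TestA", "TestB", "TestC")

# Logical matrix: TRUE where a test disagrees with the reference
# (assumption: I and ND count as disagreement)
miss <- sapply(tests, function(t) toy[[t]] != toy$Ref)
rownames(miss) <- toy$Subject

# Misses per test (the marginal view, similar across tests)
print(colSums(miss))

# Label each subject with the set of tests that missed it
pattern <- apply(miss, 1, function(r) paste(tests[r], collapse = "+"))
pattern[pattern == ""] <- "none"
print(table(pattern))
```

If the tests failed on the same subjects, the table would be dominated by combined labels such as `TestA+TestB`; single-test labels indicate that each test fails on its own subset.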

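However, I'm really stuck on what to call this (because the new tests have four categories).

I looked into Euler diagrams, but I'm not convinced they can support what I need.

What I thought I could do is make two sets of Euler diagrams.

  1. From the perspective of Reference=R. The overlap between Ref and TestA is just the subjects where both are R, and the non-overlap is Reference=R and TestA != R.
  2. Repeat the above from the perspective of Reference=S.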
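The two perspectives above can be sketched as lists of subject sets, one list per reference value, which is the kind of input Venn/Euler plotting tools usually expect. The data frame `toy` and the helper `agree_sets` are hypothetical names introduced only for this illustration.

```r
# Hypothetical toy data, not the real results
toy <- data.frame(
  Subject = sprintf("S%02d", 1:8),
  Ref     = c("R","R","R","S","S","R","R","S"),
  TestA   = c("R","I","R","S","S","S","R","S"),
  TestB   = c("R","R","I","S","I","R","R","S"),
  TestC   = c("R","R","R","S","S","R","ND","S"),
  stringsAsFactors = FALSE
)
tests <- c("TestA", "TestB", "TestC")

# For one reference value, return the reference set plus, for each test,
# the subjects on which that test agrees with the reference
agree_sets <- function(data, ref_value) {
  sub  <- data[data$Ref == ref_value, ]
  sets <- lapply(tests, function(t) sub$Subject[sub[[t]] == ref_value])
  names(sets) <- tests
  c(list(Ref = sub$Subject), sets)
}

sets_r <- agree_sets(toy, "R")  # perspective 1: Reference = R
sets_s <- agree_sets(toy, "S")  # perspective 2: Reference = S

# Sizes of two regions of the Reference = R diagram
all_agree_r <- Reduce(intersect, sets_r)                          # found by every test
none_agree_r <- setdiff(sets_r$Ref, Reduce(union, sets_r[tests])) # missed by every test
print(length(all_agree_r))
print(length(none_agree_r))
```

Because every test set is a subset of the reference set by construction, the non-R outcomes (S, I, ND) all collapse into "outside the test's circle"; whether that loss of detail is acceptable is the main design question with this approach.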

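I have also considered an odd kind of heat map where the Y axis is all 1000 subjects and the X axis follows the column order of my data above, with the four columns color-coded by result. Depending on how I sort the Y axis, I could show different aspects of the data. The problem is that it's hard to pick out patterns in that kind of plot.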
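For reference, a minimal sketch of that heat map with `ggplot2::geom_tile` might look like the following. The toy data frame is hypothetical, and sorting subjects by their full result pattern is just one possible ordering, chosen so that identical rows end up adjacent.

```r
library(ggplot2)

# Hypothetical toy data, not the real results
toy <- data.frame(
  Subject = sprintf("S%02d", 1:8),
  Ref     = c("R","R","R","S","S","R","R","S"),
  TestA   = c("R","I","R","S","S","S","R","S"),
  TestB   = c("R","R","I","S","I","R","R","S"),
  TestC   = c("R","R","R","S","S","R","ND","S"),
  stringsAsFactors = FALSE
)
cols <- c("Ref", "TestA", "TestB", "TestC")

# Long format in base R: one row per (subject, column) cell
long <- data.frame(
  Subject = rep(toy$Subject, times = length(cols)),
  Column  = factor(rep(cols, each = nrow(toy)), levels = cols),
  Result  = unlist(toy[cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Sort subjects by Ref, then TestA, TestB, TestC so equal patterns are adjacent
ord <- do.call(order, unname(as.list(toy[cols])))
long$Subject <- factor(long$Subject, levels = toy$Subject[ord])

# One tile per cell, colored by result
p <- ggplot(long, aes(x = Column, y = Subject, fill = Result)) +
  geom_tile(colour = "white") +
  labs(x = NULL, y = "Subject")
print(p)
```

With 1000 subjects the Y-axis labels become unreadable, so hiding them and relying on the sort order to form visible color bands is probably necessary.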

Any other ideas? Links to other visualizations are much appreciated!

1 Answer

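Here is an attempt at visualizing your data set. Without the actual data it's hard to know what to emphasize, but here is a sample that other posters can use. Based on your post, I tried to emphasize the differences in the distribution of test results by Ref.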

library(reshape2)
library(ggplot2)

# make a data set

df <- data.frame(Subject=1:100, Ref = sample(c('R','S'),100,T), TestA = sample(c('R','F','S','I'),100,T), TestB = sample(c('R','F','S','I'),100,T), TestC = sample(c('R','F','S','I'),100,T) )

# melt into long

dfm <- melt(df, id=c('Subject','Ref'))

# and plot

ggplot(dfm, aes(x=variable, fill=value)) + geom_bar() + facet_wrap(~Ref)

# which gives a stacked bar chart of result counts per test, with one panel per Ref value

# or bars dodged rather than stacked

ggplot(dfm, aes(x=variable, fill=value)) + geom_bar(position='dodge') + facet_wrap(~Ref)

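If what @shujaa says below is true, here is an image on a similar theme that highlights, for each Ref value, how often each test agrees with the reference (the true positive rate):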

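# compare as character so this also works if value and Ref are factors with different levels
dfm <- transform(dfm, TP = as.character(value) == as.character(Ref))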

ggplot(dfm, aes(x=variable,fill=TP)) + geom_bar() + facet_wrap(~Ref)


Or, following @shujaa's last comment, here is one final attempt:

ggplot(dfm, aes(x=variable,fill=TP)) + geom_bar() + facet_wrap(value~Ref)


Answered 2014-05-22T18:51:19.350