r - PCA biplot: a way to hide the vectors so all data points are clearly visible

Question

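I am trying to do a PCA in R.

My data has 10,000 columns and 90 rows, and I use the prcomp function to do the PCA. When I try to make a biplot from the prcomp results, the 10,000 plotted vectors cover up my data points. Is there any option in biplot to hide the vectors?

Or

I can use plot to get the PCA results, but I am not sure how to label the points with the numbers 1 to 90 of the data points.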

Sample <- read.table(file.choose(), header = FALSE, sep = "\t")

Sample.scaled <- data.frame(apply(Sample, 2, scale))

Sample.scaled.2 <- data.frame(t(na.omit(t(Sample.scaled))))

pca.Sample <- prcomp(Sample.scaled.2, retx = TRUE)

pdf("Sample_plot.pdf")

plot(pca.Sample$x)

dev.off()

score 8 · Accepted Answer

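If you run help(prcomp) or ?prcomp, the help file tells us everything that is contained in the object returned by the prcomp() function. We just need to pick out what we want to plot and do it with functions that give us more control than biplot().

A more general trick, for cases where the help file does not make things clear, is to run str() on the prcomp object (pca.Sample in your case) to see all of its parts and find the one we want (str() compactly displays the internal structure of an R object).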
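As a rough sketch of that kind of inspection, the snippet below uses the built-in USArrests data as a stand-in (it is not the asker's data, and the name p_demo is only illustrative). names() lists the top-level components, and the max.level argument of str() keeps the output short.

```r
# Stand-in data: the built-in USArrests set (an assumption, not the asker's data)
p_demo <- prcomp(USArrests, scale. = TRUE)

# Top-level components of the prcomp object
names(p_demo)

# Compact view of the structure, only one level deep
str(p_demo, max.level = 1)

# The scores: one row per observation, one column per principal component
dim(p_demo$x)
head(rownames(p_demo$x))
```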

Here is an example using some of R's sample data:

# do a pca of arrests in different states
p<-prcomp(USArrests, scale = TRUE)

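str(p) gives me something ugly and too long, but I can see that p$x has the states as row names and their positions on the principal components as columns. With this, we can plot it any way we like, for example with plot() and text() (for the labels):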

# plot and add labels
plot(p$x[,1],p$x[,2])
text(p$x[,1],p$x[,2],labels=rownames(p$x))
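The question's data has no meaningful row names and asks for the points to be numbered 1 to 90. A minimal sketch of one way to do that, assuming simulated data in place of the real file (sim and pca_sim are hypothetical names), is to draw an empty plot with type = "n" and then write the row index at each point's position:

```r
# Simulated stand-in for the asker's data: 90 observations, 200 variables
set.seed(1)
sim <- matrix(rnorm(90 * 200), nrow = 90)

pca_sim <- prcomp(sim, scale. = TRUE)

# Row numbers 1..90 to use as point labels
point_ids <- seq_len(nrow(pca_sim$x))

# Set up the axes without drawing symbols, then draw the numbers as the points
plot(pca_sim$x[, 1], pca_sim$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca_sim$x[, 1], pca_sim$x[, 2], labels = point_ids, cex = 0.7)
```

Using type = "n" avoids overplotting a symbol and its label at the same spot; to keep the symbols, drop type = "n" and pass pos = 3 to text() so the numbers sit above the points.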

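If we are making a scatterplot with many observations, the labels may become unreadable. So we might want to label only the more extreme values, which we can identify with quantile():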

#make a new dataframe with the info from p we want to plot
df <- data.frame(PC1=p$x[,1],PC2=p$x[,2],labels=rownames(p$x))

#make sure labels are not factors, so we can easily reassign them
df$labels <- as.character(df$labels)

# use quantile() to identify which ones are within 25-75 percentile on both
# PC and blank their labels out
df[ df$PC1 > quantile(df$PC1)["25%"] & 
    df$PC1 < quantile(df$PC1)["75%"] &
    df$PC2 > quantile(df$PC2)["25%"] &
    df$PC2 < quantile(df$PC2)["75%"],]$labels <- ""

# plot
plot(df$PC1,df$PC2)
text(df$PC1,df$PC2,labels=df$labels)
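On the original question of whether biplot() itself can hide the vectors: biplot.default() has a var.axes argument that suppresses the arrows when set to FALSE, and xlabs / ylabs arguments for the two sets of labels. The sketch below is an alternative to the plot() and text() approach above, not part of it. It assumes that blanking ylabs is an acceptable way to hide the variable names, and it uses USArrests rather than the asker's data (p_bi is an illustrative name).

```r
# Stand-in data again: built-in USArrests (an assumption, not the asker's data)
p_bi <- prcomp(USArrests, scale. = TRUE)

n_obs  <- nrow(p_bi$x)         # number of observations (points)
n_vars <- nrow(p_bi$rotation)  # number of variables (would-be arrows)

# var.axes = FALSE drops the arrows; empty ylabs hide the variable names;
# xlabs numbers the observations 1..n_obs
biplot(p_bi,
       var.axes = FALSE,
       xlabs = seq_len(n_obs),
       ylabs = rep("", n_vars))
```

The secondary axes for the variable loadings are still drawn on the top and right of the plot, so the plain plot() approach remains the simpler choice when only the scores are of interest.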
