r - How do you find the sample size used when calculating r?

Question

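I am running correlations between variables, some of which have missing data, so the sample size for each correlation is likely to differ. I tried print and summary, but neither shows me how large n is for each correlation. This seems like a fairly simple problem, yet I cannot find the answer anywhere.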

score 3 · Accepted Answer

Like this..?

x <- c(1:100,NA)
length(x)
length(x[!is.na(x)])

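You can also get the degrees of freedom like this (for the default Pearson test they equal n - 2, so adding 2 gives the sample size):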

y <- c(1:100,NA)
x <- c(1:100,NA)

cor.test(x,y)$parameter

But I think it would be better to show the code you use to estimate the correlations, so that you can get more precise help.
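
As a minimal sketch of the degrees-of-freedom route, assuming the default Pearson method of cor.test (where the degrees of freedom are n - 2): the vectors below are made up for illustration, and the result is cross-checked against a direct count of complete pairs.

```r
# Illustrative vectors (not from the question), each with one NA
# in a different position
x <- c(1:10, NA, 12)
y <- c(2, 4, NA, 8, 9, 12, 15, 14, 18, 21, 22, 25)

# cor.test drops incomplete pairs; for Pearson, df = n - 2
df <- unname(cor.test(x, y)$parameter)
n_from_df <- df + 2

# Direct count of pairs in which both values are present
n_direct <- sum(complete.cases(x, y))

n_from_df
n_direct
```

Both values should agree; the direct count has the advantage of not depending on which test method was used.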

score 1 · Accepted Answer

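Here is an example of how to find the pairwise sample sizes among the columns of a matrix. If you want to apply it to (some of) the numeric columns of a data frame, combine them accordingly, coerce the resulting object to a matrix and apply the function.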

# Example matrix:
xx <- rnorm(3000)
# Generate some NAs
vv <- sample(3000, 200)
xx[vv] <- NA
# reshape to a matrix
dd <- matrix(xx, ncol = 3)
# find the number of NAs per column
apply(dd, 2, function(x) sum(is.na(x)))
# tack on some column names
colnames(dd) <- paste0("x", seq(3))

# Function to find the number of pairwise complete observations
# among all pairs of columns in a matrix. It returns a data frame
# whose first two columns comprise all column pairs and whose
# third column, n, holds the corresponding count

pairwiseN <- function(mat)
{
    u <- if(is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
    h <- expand.grid(x = u, y = u)

    f <- function(x, y)
           sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
    h$n <- mapply(f, h[, 1], h[, 2])
    h
}

# Call it
pairwiseN(dd)

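The function is easy to improve; for example, you could set h <- expand.grid(x = u[-1], y = u[-length(u)]) to reduce the number of calculations, you could return an n x n matrix instead of a three-column data frame, etc.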
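
One possible way to get the matrix form mentioned above is sketched here. It is not the answer's own code: the helper name pairwiseNmat is hypothetical, and it swaps in a different technique, the cross-product of a 0/1 "value present" indicator matrix, so that entry [i, j] counts the rows in which columns i and j are both non-missing.

```r
# Hypothetical helper: pairwise complete-observation counts as a matrix
pairwiseNmat <- function(mat) {
    ok <- (!is.na(mat)) * 1   # 1 where a value is present, 0 where it is NA
    crossprod(ok)             # [i, j] = rows where columns i and j are both present
}

# Same kind of example data as above; the seed is only for reproducibility
set.seed(1)
xx <- rnorm(3000)
xx[sample(3000, 200)] <- NA
dd <- matrix(xx, ncol = 3, dimnames = list(NULL, paste0("x", 1:3)))

nmat <- pairwiseNmat(dd)
nmat
```

The diagonal holds the number of non-missing values per column, and the result has the same layout as the matrix returned by cor(dd, use = "pairwise.complete.obs"), so the two can be read side by side.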

score -1 · Accepted Answer

If your variables are vectors named a and b, would something like sum(is.na(a) | is.na(b)) help?

Answered 2013-01-01T23:55:07.723
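
Note that sum(is.na(a) | is.na(b)) counts the pairs that are dropped, not the pairs that are used. A small sketch of both counts, with made-up vectors a and b:

```r
# Illustrative vectors (not from the question)
a <- c(1, 2, NA, 4, 5, 6)
b <- c(2, NA, 3, 5, NA, 7)

# Pairs excluded because at least one value is missing
n_dropped <- sum(is.na(a) | is.na(b))

# Pairs actually available to the correlation: both values present
n_used <- sum(!is.na(a) & !is.na(b))

n_dropped
n_used
```

The two counts always add up to length(a), so either one is enough to recover the other.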
