r - Splitting a large vector into intervals in R

Question

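I'm not very good at R. I ran a loop and got a huge result vector with 11,303,044 rows. I have another vector, from a different loop, with 1,681 rows.

I want to run chisq.test to compare their distributions, but because the two vectors have different lengths, it doesn't work.

I tried drawing a sample of 1,681 values from the 11,303,044-value vector to match the length of the second vector, but chisq.test gives a different result every time I run it.

I'm thinking of splitting the two vectors into an equal number of intervals.

For example:

Vector 1: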

temp.mat<-matrix((rnorm(11303044))^2, ncol=1) 
head(temp.mat)
dim(temp.mat)

Vector 2:

temp.mat<-matrix((rnorm(1681))^2, ncol=1) 
head(temp.mat)
dim(temp.mat)

How can I split them into equal intervals so that I end up with vectors of the same length?

score 1 · Accepted Answer

mat1 <- matrix(rnorm(1130300)^2, ncol = 1)  # only one-tenth the size of your vector
smat <- sample(mat1, 100000)                # and take only one-tenth of that
mat2 <- matrix(rnorm(1681)^2, ncol = 1)
qqplot(smat, mat2)                          # and repeat the sampling a few times

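From a statistical point of view, what you are seeing seems interesting. At the higher levels of "departure from the mean", the large sample always departs from a "good fit". That is not surprising, because it contains a larger number of truly extreme values.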
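One way to put numbers on what the Q-Q plot shows is to repeat the subsampling and compare a few upper quantiles directly. This is only a sketch: the seed, the quantile levels in `probs`, and the number of repeats (5) are arbitrary assumptions, not part of the answer above.

```r
set.seed(42)                                  # arbitrary seed, only for reproducibility
mat1 <- matrix(rnorm(1130300)^2, ncol = 1)    # stand-in for the large vector
mat2 <- matrix(rnorm(1681)^2, ncol = 1)       # stand-in for the small vector

probs <- c(0.5, 0.9, 0.99, 0.999)             # illustrative quantile levels

# Each column holds the quantiles of one fresh subsample of 100,000 values
sub_q <- replicate(5, quantile(sample(mat1, 100000), probs))

# Quantiles of the small vector, for a side-by-side comparison
small_q <- quantile(mat2, probs)

print(round(cbind(sub_q, small = small_q), 3))
```

The subsample columns would be expected to agree closely with one another, while the column for the 1,681-value vector is likely to wander most at the highest levels, where it has very few observations.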

score 0 · Accepted Answer

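chisq.test is Pearson's chi-squared test. It is designed for discrete data. Given two input vectors, it coerces them to factors and tests for independence, not for equality of distribution. This means, for example, that the order of the data makes a difference.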

> set.seed(123)
> x<-sample(5,10,T)
> y<-sample(5,10,T)
> chisq.test(x,y)

    Pearson's Chi-squared test

data:  x and y
X-squared = 18.3333, df = 16, p-value = 0.3047

Warning message:
In chisq.test(x, y) : Chi-squared approximation may be incorrect
> chisq.test(x,y[10:1])

    Pearson's Chi-squared test

data:  x and y[10:1]
X-squared = 16.5278, df = 16, p-value = 0.4168

Warning message:
In chisq.test(x, y[10:1]) : Chi-squared approximation may be incorrect
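As a small check of the point about coercion, the sketch below builds the contingency table by hand and passes it to chisq.test. Given two vectors, chisq.test cross-tabulates them as factors, so both calls should report the same statistic. The variable names `tab`, `res_vec` and `res_tab` are just for illustration.

```r
set.seed(123)
x <- sample(5, 10, TRUE)
y <- sample(5, 10, TRUE)

# Cross-tabulate the paired values: each cell counts the positions
# where x takes the row value and y takes the column value
tab <- table(x, y)

# Both calls test independence of the row and column classification;
# the small-expected-count warnings are suppressed for brevity
res_vec <- suppressWarnings(chisq.test(x, y))
res_tab <- suppressWarnings(chisq.test(tab))

# Compare the two test statistics
print(all.equal(unname(res_vec$statistic), unname(res_tab$statistic)))
```

Because the table is built from pairs (x[i], y[i]), reordering one vector changes which values are paired, which is why the result depends on the order of the data.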

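So I don't think chisq.test is what you want, because it doesn't compare distributions. Maybe try something like ks.test, which works with vectors of different lengths and with continuous data.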

> set.seed(123)
> x<-rnorm(2000)^2
> y<-rnorm(100000)^2
> ks.test(x,y)

    Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.0139, p-value = 0.8425
alternative hypothesis: two-sided

> ks.test(sqrt(x),y)

    Two-sample Kolmogorov-Smirnov test

data:  sqrt(x) and y
D = 0.1847, p-value < 2.2e-16
alternative hypothesis: two-sided
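If a chi-squared comparison is still wanted, one possible approach (an assumption here, not something proposed in either answer) is to bin both vectors with the same break points and test the resulting 2 x k table of counts for homogeneity. The sketch below uses deciles of the pooled data as breaks, which is an arbitrary choice, and the names `breaks`, `counts` and `res` are hypothetical.

```r
set.seed(123)
x <- rnorm(2000)^2       # stand-in for the short vector
y <- rnorm(100000)^2     # stand-in for the long vector

# Shared break points: deciles of the pooled data, with open outer ends
breaks <- unname(quantile(c(x, y), probs = seq(0, 1, by = 0.1)))
breaks[1] <- -Inf
breaks[length(breaks)] <- Inf

# Count how many values of each vector fall into each common interval
counts <- rbind(x = table(cut(x, breaks)),
                y = table(cut(y, breaks)))

# Chi-squared test of homogeneity on the 2 x 10 table of counts;
# the vectors may have different lengths because only counts are compared
res <- chisq.test(counts)
print(counts)
print(res)
```

This sidesteps the length mismatch because the test sees interval counts rather than paired values, but the result depends on how the intervals are chosen, so ks.test remains the more direct option for continuous data.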
