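I am running a chi-squared test of independence on a data.frame named Comp1, which has two binary variables and 13109 observations.

I am using the test before clustering consumers by demographic data. If the two variables depend on each other, certain values will end up in the same cluster. The two variables are a subset of another data.frame that has 36 variables.

I get an error saying the data.frame contains character variables, rather than the factors that str() shows.

Why does the error say the data.frame has character values?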

Data:

> str(Comp1)
'data.frame':   13109 obs. of  2 variables:
 $ HomeOwnerStatus: Factor w/ 2 levels "Own","Rent": 1 2 2 2 1 2 1 1 2 2 ...
 $ MaritalStatus  : Factor w/ 2 levels "Married","Single": 2 1 1 1 2 1 2 1 1 1 ...

Example:

> #Create dataset
> homeownerstatus <- c("Own", "Rent", "Own", "Own", "Rent", "Own")
> maritalstatus <- c("Married", "Married", "Married", "Single", "Single", "Married")
> #stringsAsFactors = TRUE keeps the columns as factors on R >= 4.0
> Comp1 <- data.frame(homeownerstatus, maritalstatus, stringsAsFactors = TRUE)

Attempted solution, with the error:

> #Test binary variables for independence 
> #Create matrix from data.frame
> DF4 <- as.matrix(Comp1)
> #Comparison of marital status and home owner status
> #Perform chi-squared test for independence of two variables
> chisq.test(table(DF4))

    Chi-squared test for given probabilities

data:  table(DF4)
X-squared = 295149.5, df = 71, p-value < 2.2e-16

1 Answer

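chisq.test expects either factor vectors for its x and y arguments, or a matrix or data.frame for its x argument. When a data.frame is converted to a matrix with the as.matrix function, that step coerces the factor columns of your data.frame to character: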

> as.matrix(Comp1)
     homeownerstatus maritalstatus
[1,] "Own"           "Married"    
[2,] "Rent"          "Married"    
[3,] "Own"           "Married"    
[4,] "Own"           "Single"     
[5,] "Rent"          "Single"     
[6,] "Own"           "Married"
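
The coercion can be checked directly. The following is a minimal sketch, assuming a small stand-in data.frame (the name `comp` is hypothetical and not from the question) built to mirror the six-row example:

```r
# Hypothetical toy data mirroring the six-row example in the question
comp <- data.frame(
  homeownerstatus = factor(c("Own", "Rent", "Own", "Own", "Rent", "Own")),
  maritalstatus   = factor(c("Married", "Married", "Married",
                             "Single", "Single", "Married"))
)

# Each column of the data.frame is a factor
col_classes <- sapply(comp, class)
print(col_classes)

# as.matrix() on a non-numeric data.frame produces a character matrix,
# so the factor information is lost
m <- as.matrix(comp)
print(typeof(m))
print(is.factor(m[, 1]))
```

The columns report class "factor" while the matrix reports type "character", which would explain why str() and the error message seem to disagree.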

So my suggestion is to pass the two factor vectors:

chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus)

        Pearson's Chi-squared test with Yates' continuity correction

data:  Comp1$homeownerstatus and Comp1$maritalstatus
X-squared = 0, df = 1, p-value = 1

Warning message:
In chisq.test(Comp1$homeownerstatus, Comp1$maritalstatus) :
  Chi-squared approximation may be incorrect
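
The warning in this output most likely comes from the tiny six-row example rather than from the approach itself: chisq.test warns when expected cell counts are small. As a rough sketch (again using a hypothetical stand-in `comp` for the example data), the expected counts can be inspected through the `expected` component of the result:

```r
# Hypothetical toy data mirroring the six-row example in the question
comp <- data.frame(
  homeownerstatus = factor(c("Own", "Rent", "Own", "Own", "Rent", "Own")),
  maritalstatus   = factor(c("Married", "Married", "Married",
                             "Single", "Single", "Married"))
)

# Run the test quietly and look at the expected counts under independence
res <- suppressWarnings(chisq.test(comp$homeownerstatus, comp$maritalstatus))
print(res$expected)

# Count the cells whose expected count is below 5
small_cells <- sum(res$expected < 5)
print(small_cells)
```

With 13109 observations the expected counts should normally be far larger, so the warning may well not appear on the full data.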

Edit

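When you pass a matrix or data.frame to the x argument, that object is itself treated as a contingency table, which is not what you want. You have two binary variables whose contingency table has to be computed first and then tested with the chi-squared test. So you should either pass each factor vector as shown above, or compute the contingency table and pass it to chisq.test: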

chisq.test(table(Comp1))
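
As a quick check that the two forms are equivalent, here is a small sketch that runs both on a hypothetical stand-in for the example data (`comp` is an assumed name) and compares the results:

```r
# Hypothetical toy data mirroring the six-row example in the question
comp <- data.frame(
  homeownerstatus = factor(c("Own", "Rent", "Own", "Own", "Rent", "Own")),
  maritalstatus   = factor(c("Married", "Married", "Married",
                             "Single", "Single", "Married"))
)

# Option 1: pass the two factor vectors
by_vectors <- suppressWarnings(
  chisq.test(comp$homeownerstatus, comp$maritalstatus)
)

# Option 2: build the 2 x 2 contingency table first, then test it
tab <- table(comp)
by_table <- suppressWarnings(chisq.test(tab))

print(tab)

# Both calls test the same 2 x 2 table, so the statistics should agree
same_stat <- all.equal(unname(by_vectors$statistic), unname(by_table$statistic))
print(same_stat)
```

Either call works on the factors directly, so no as.matrix step is needed.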
answered 2014-11-04T09:36:08.857