Consider the following survey data:

library("dplyr")  # provides the %>% pipe

data <- replicate(10, sample(c(1, 2, 3, 4), 1000, replace = TRUE)) %>%
  as.data.frame()

V1:V9 are ordinal variables where 1 = "Good", 2 = "Okey", 3 = "Not Good", and 4 = "Don't know". V10 is a variable where 1 = "Good", 2 = "Not good", 3 = "Don't know", and 4 = "Don't want to answer".

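I am interested in computing a simple correlation matrix of these variables using cor(). However, I only want the calculation to use values that have a substantive meaning, that is, 1, 2, 3 for V1:V9 and 1, 2 for V10.

In other words, I would like cor() to drop any case with a value > 3 for V1:V9, and likewise any case with a value > 2 for V10.

Would this be something like the use argument?

The only way I have managed to solve this is by changing those values to NA.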

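library("dplyr")
data_test <- data %>%
      # V1:V9: treat 4 ("Don't know") as missing
      mutate(across(-V10, ~ ifelse(.x > 3, NA, .x))) %>%
      # V10: treat 3 and 4 as missing; assign back to V10 so the column is overwritten
      mutate(V10 = ifelse(V10 > 2, NA, V10))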

cor(data_test, use = "complete.obs")

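But is there a better way, one that does not necessarily rely on modifying the data?

PS. Of course, there are more appropriate methods for computing correlations between ordinal variables.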

1 Answer

The answer to this question was simpler than I thought.

As @zx8754 points out, you should be careful when choosing a correlation method for categorical variables.
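To illustrate what choosing a method means in practice, here is a minimal sketch using only the method argument of base R's cor(). The vectors x and y and the seed are made up for the example, and whether a rank-based coefficient is the right choice depends on your data.

```r
# Hypothetical ordinal responses coded 1..3 (illustrative only)
set.seed(1)
x <- sample(c(1, 2, 3), 200, replace = TRUE)
y <- sample(c(1, 2, 3), 200, replace = TRUE)

# cor() defaults to Pearson, which treats the codes as interval-scaled numbers
pearson <- cor(x, y)

# Rank-based alternatives that only rely on the ordering of the codes
spearman <- cor(x, y, method = "spearman")
kendall  <- cor(x, y, method = "kendall")
```

The same method argument can be combined with use when the data contain NA values.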

Anyway, you just set use = "pairwise.complete.obs" in cor().

However, you still need to recode the non-substantive codes (4 for V1:V9; 3 and 4 for V10) to NA first.
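Put together, the approach might look like the sketch below. It uses base R only and recodes a copy, so the original data frame is left as it was. The names recoded, ordinal_cols, cor_pairwise, cor_listwise and n_listwise, as well as the seed, are assumptions made for this example.

```r
# Simulated survey data in the same shape as in the question
set.seed(1)
data <- as.data.frame(replicate(10, sample(c(1, 2, 3, 4), 1000, replace = TRUE)))

# Recode on a copy so that `data` itself is not modified
recoded <- data
ordinal_cols <- paste0("V", 1:9)
recoded[ordinal_cols] <- lapply(recoded[ordinal_cols],
                                function(x) replace(x, x > 3, NA))
recoded$V10 <- replace(recoded$V10, recoded$V10 > 2, NA)

# Pairwise deletion: each correlation uses all rows where both variables are observed
cor_pairwise <- cor(recoded, use = "pairwise.complete.obs")

# For comparison, listwise deletion keeps only rows with no NA in any column
cor_listwise <- cor(recoded, use = "complete.obs")
n_listwise <- sum(complete.cases(recoded))
```

With "complete.obs", a single NA anywhere in a row removes that row from every correlation, so far fewer observations are used than with "pairwise.complete.obs". The trade-off is that each pairwise coefficient may be based on a different subset of rows.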

Answered 2017-07-05T13:38:09.653