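I am analyzing this dataset, which has both numeric and factor variables. I want to know the correlations between them so that I can select the best variables.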
str(data)
$ Ag : num [1:1470] 41 49 37 33 27 32 59 30 38 36 ...
$ Ay : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
$ Bu : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 2 3 2 3 2 3 3 2 3 ...
$ Di : num [1:1470] 1 8 2 3 2 2 3 24 23 27 ...
$ Ed : num [1:1470] 2 1 2 4 1 2 3 1 3 3 ...
$ Ep : num [1:1470] 1 1 1 1 1 1 1 1 1 1 ...
$ Em : num [1:1470] 1 2 4 5 7 8 10 11 12 13 ...
$ Ge : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 2 2 2 ...
$ Ho : num [1:1470] 94 61 92 56 40 79 81 67 44 94 ...
$ J1 : num [1:1470] 3 2 2 3 3 3 4 3 2 3 ...
$ J2 : num [1:1470] 2 2 1 1 1 1 1 1 3 2 ...
When I run the following (although I would like correlations for all of the data, not just the numeric columns):
cor(data[sapply(data, is.numeric)])
I get this message:
Warning message:
In cor(data[sapply(data, is.numeric)]) :
the standard deviation is zero
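
A possible reading of the warning, judging only from the str() output above: Ep shows nothing but 1s in its first ten values, and a column that never varies has a standard deviation of zero, so its correlation with any other column is undefined. Below is a minimal sketch for locating such columns and leaving them out of the call to cor(). The data frame `toy` and its values are hypothetical stand-ins for `data`, and whether Ep is constant over all 1470 rows is an assumption that would need checking on the real data.

```r
# Hypothetical toy data standing in for `data`; Ep is constant here,
# as the first rows of the str() output suggest (an assumption).
toy <- data.frame(
  Ag = c(41, 49, 37, 33, 27),
  Ep = c(1, 1, 1, 1, 1),
  Ho = c(94, 61, 92, 56, 40),
  Ge = factor(c("Female", "Male", "Male", "Female", "Male"))
)

num <- toy[sapply(toy, is.numeric)]     # keep numeric columns only
sds <- sapply(num, sd, na.rm = TRUE)    # standard deviation of each column
constant <- names(sds)[sds == 0]        # columns with zero standard deviation
print(constant)

# Correlation matrix computed without the constant columns
cor_mat <- cor(num[, sds > 0, drop = FALSE])
print(cor_mat)
```

On the real data, the same steps would be applied to `data` instead of `toy`. This only covers the numeric columns; the factor columns would still need a different measure of association.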