
I am trying to convert a data.frame into a daisy dissimilarity matrix using the cluster package from CRAN in R. My dataset has 13109 observations of 9 categorical variables.

I get two kinds of warnings: "NAs introduced by coercion" and "no non-missing arguments to min/max". Why am I getting these?

As far as I can tell, there are no NA values in my data.frame. Here is the information about my dataset:

> str(df4)
'data.frame':   13109 obs. of  9 variables:
 $ Age               : chr  "55-64" "55-64" "55-64" "55-64" ...
 $ Gender            : chr  "Female" "Female" "Male" "Male" ...
 $ HouseholdIncome   : chr  "50k-75k" "150k-175k" "150k-175k" "150k-175k" ...
 $ MaritalStatus     : chr  "Single" "Married" "Married" "Married" ...
 $ PresenceofChildren: chr  "No" "Yes" "Yes" "Yes" ...
 $ HomeOwnerStatus   : chr  "Own" "Rent" "Rent" "Rent" ...
 $ HomeMarketValue   : chr  "350k-500k" "500k-1mm" "500k-1mm" "500k-1mm" ...
 $ Occupation        : chr  "White Collar Worker" "Professional" "Professional" "Professional" ...
 $ Education         : chr  "Completed High School" "Completed College" "Completed College" "Completed College" ...

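Here is proof of the coerced NA values: when I try to run the PAM clustering function, I get an error saying that NA values are not allowed.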

> library(cluster)
> # Create dissimilarity matrix
> # Gower coefficient for finding distance between mixed variables
> daisy4 <- daisy(df4, metric = "gower", type = list(ordratio = c(1:9)))

> warnings()
Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion
3: In data.matrix(x) : NAs introduced by coercion
4: In data.matrix(x) : NAs introduced by coercion
5: In data.matrix(x) : NAs introduced by coercion
6: In data.matrix(x) : NAs introduced by coercion
7: In data.matrix(x) : NAs introduced by coercion
8: In data.matrix(x) : NAs introduced by coercion
9: In data.matrix(x) : NAs introduced by coercion
10: In min(x) : no non-missing arguments to min; returning Inf
11: In max(x) : no non-missing arguments to max; returning -Inf
12: In min(x) : no non-missing arguments to min; returning Inf
13: In max(x) : no non-missing arguments to max; returning -Inf
14: In min(x) : no non-missing arguments to min; returning Inf
15: In max(x) : no non-missing arguments to max; returning -Inf
16: In min(x) : no non-missing arguments to min; returning Inf
17: In max(x) : no non-missing arguments to max; returning -Inf
18: In min(x) : no non-missing arguments to min; returning Inf
19: In max(x) : no non-missing arguments to max; returning -Inf
20: In min(x) : no non-missing arguments to min; returning Inf
21: In max(x) : no non-missing arguments to max; returning -Inf
22: In min(x) : no non-missing arguments to min; returning Inf
23: In max(x) : no non-missing arguments to max; returning -Inf
24: In min(x) : no non-missing arguments to min; returning Inf
25: In max(x) : no non-missing arguments to max; returning -Inf
26: In min(x) : no non-missing arguments to min; returning Inf
27: In max(x) : no non-missing arguments to max; returning -Inf
28: In min(x) : no non-missing arguments to min; returning Inf
29: In max(x) : no non-missing arguments to max; returning -Inf

> k4answers <- pam(daisy4, 3, diss = TRUE)
Error in pam(daisy4, 3, diss = TRUE) : 
  NA values in the dissimilarity matrix not allowed.

Let me know if I can provide any more information.

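Edit: I solved my error. I was reading the .csv file in as character, whereas my other datasets had been read in as factors, which is why the same code worked with them. Here is where I went wrong: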

#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv", 
                   na.strings = "", stringsAsFactors=FALSE, head = TRUE)

Solution:

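#Load Data
# stringsAsFactors = TRUE is spelled out because the read.csv default
# changed to FALSE in R 4.0.0
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv", 
                   na.strings = "", stringsAsFactors = TRUE, header = TRUE)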
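
A minimal sketch of the likely mechanism behind both warnings, assuming the character columns were pushed through a numeric coercion, as the `data.matrix(x)` warnings suggest. The sample values are made up to mirror the Age column, and `as.numeric` only stands in for whatever coercion `daisy` performed internally in the R version used at the time.

```r
# Hypothetical sample values mirroring the Age column in the question
age <- c("55-64", "55-64", "35-44")

# Coercing non-numeric strings to numbers yields NA
# (R warns "NAs introduced by coercion"; suppressed here to keep output clean)
age_num <- suppressWarnings(as.numeric(age))
print(age_num)

# With every value NA, nothing is left once NAs are dropped, so min() and
# max() fall back to Inf and -Inf ("no non-missing arguments to min/max")
rng <- suppressWarnings(c(min(age_num, na.rm = TRUE),
                          max(age_num, na.rm = TRUE)))
print(rng)

# Converting to a factor first gives usable integer codes instead of NA
age_codes <- as.integer(factor(age))
print(age_codes)
```

A factor carries integer level codes, so a numeric conversion has something to work with. A plain character column does not.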

1 Answer


Read the data in as factor variables rather than as character.

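#Load Data
# stringsAsFactors = TRUE is spelled out because the read.csv default
# changed to FALSE in R 4.0.0
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv", 
                   na.strings = "", stringsAsFactors = TRUE, header = TRUE)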

Previously I had the following, which produced the error:

#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv", 
                   na.strings = "", stringsAsFactors=FALSE, head = TRUE)
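
If the data is already loaded as character, re-reading the file may not be necessary. The sketch below converts the character columns in place and then runs `daisy` and `pam`. The small data frame `df` is a hypothetical stand-in for `df4`, and the `type` argument from the question is left out, so the factors are treated as nominal rather than ordinal. That is an assumption about what is wanted, not the original call.

```r
library(cluster)

# Hypothetical stand-in for df4: character columns like those in the question
df <- data.frame(
  Age    = c("55-64", "35-44", "55-64", "25-34", "35-44", "25-34"),
  Gender = c("Female", "Male", "Male", "Female", "Female", "Male"),
  Owner  = c("Own", "Rent", "Rent", "Own", "Own", "Rent"),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor, leaving other columns untouched
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

# Gower dissimilarities on the factor columns, then PAM on the result
d <- daisy(df, metric = "gower")
fit <- pam(d, k = 2, diss = TRUE)
print(fit$clustering)
```

Checking `anyNA(d)` before calling `pam` is a cheap way to confirm that the dissimilarity matrix is clean.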
Answered 2014-09-25T03:17:12.523