I am trying to cluster customer data based on the customers' spatial locations. Here is what I did:
#Reading the data
theData <- read.csv("Customer_Segmentation/data.csv")
#Subsetting only long, lat and record id.
inputdata <- data.frame(long=theData$LONG, lat=theData$LAT, RecordID=theData$RecordID)
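#Building the distance matrix; earth.dist expects longitude then latitude and returns km
library(fossil)
d <- earth.dist(inputdata[, c("long", "lat")], dist = TRUE)
#Applying DBSCAN clustering on the precomputed distances (eps is therefore in km)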
library(fpc)
ds <- dbscan(d,eps = 0.5,MinPts = 50, method = "dist")
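To get a feel for the scale eps works on, here is a minimal base-R haversine sketch of the kind of great-circle distances earth.dist returns. The helper name haversine_km and the 6371 km Earth radius are assumptions for illustration; fossil may use a slightly different radius, so the values can differ in the last decimals.

```r
# Hypothetical helper: pairwise great-circle (haversine) distances in km.
# Assumes a data frame whose column 1 is longitude and column 2 is latitude,
# both in decimal degrees.
haversine_km <- function(coords, radius_km = 6371) {
  lon <- coords[[1]] * pi / 180
  lat <- coords[[2]] * pi / 180
  n <- length(lon)
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    dlat <- lat - lat[i]
    dlon <- lon - lon[i]
    a <- sin(dlat / 2)^2 + cos(lat[i]) * cos(lat) * sin(dlon / 2)^2
    # pmin guards against rounding pushing sqrt(a) slightly above 1
    out[i, ] <- 2 * radius_km * asin(pmin(1, sqrt(a)))
  }
  as.dist(out)
}

# The six sample rows given in the question
sample_pts <- data.frame(
  long = c(174.9066, 174.9093, 174.8893, 174.8973, 174.9153, 174.9239),
  lat  = c(-41.20867, -41.22624, -41.21618, -41.21133, -41.20419, -41.20167)
)
d_km <- haversine_km(sample_pts)
round(as.matrix(d_km), 2)
```

Comparing these pairwise values with eps = 0.5 shows directly which of the sample customers would count as neighbours of one another.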
It gave me 23 clusters:
dbscan Pts=14873 MinPts=50 eps=0.5
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
border 6546 73 47 38 20 53 60 27 70 19 93 43 58 25 21 31 36 492 47 44 41 43 55 35
seed 0 757 12 26 84 84 6 36 6 50 2132 70 2 101 91 55 104 2908 22 23 42 82 59 104
total 6546 830 59 64 104 137 66 63 76 69 2225 113 60 126 112 86 140 3400 69 67 83 125 114 139
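One way to look at this result before dealing with any projection: the customer coordinates are longitude/latitude, and GADM boundaries from getData are also in longitude/latitude (WGS84), so both can be drawn in those units. The sketch below assumes a label vector like fpc's ds$cluster, where 0 marks noise; the function name plot_clusters and the grey colour for noise are illustrative choices, not part of any package.

```r
# Hypothetical helper: scatter customers in long/lat, coloured by DBSCAN label.
# 'labels' is assumed to be an integer vector like fpc's ds$cluster (0 = noise).
plot_clusters <- function(df, labels, boundary = NULL) {
  stopifnot(nrow(df) == length(labels))
  k <- max(labels)
  # one colour per cluster; noise (label 0) is drawn in light grey
  pal <- c("grey80", rainbow(max(k, 1)))
  cols <- pal[labels + 1]
  # aspect ratio corrected for latitude so distances are not visibly stretched
  plot(df$long, df$lat, col = cols, pch = 20, cex = 0.6,
       xlab = "Longitude", ylab = "Latitude",
       asp = 1 / cos(mean(df$lat) * pi / 180))
  # optional boundary layer, e.g. a GADM SpatialPolygons object (requires sp)
  if (!is.null(boundary)) plot(boundary, add = TRUE, border = "grey40")
  invisible(cols)
}
```

With the objects from the question, usage would be plot_clusters(inputdata, ds$cluster), optionally passing nz1 as boundary before it is reprojected, since at that point it is still in longitude/latitude.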
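First question: how can I plot these clusters on a map? It would be great if someone could point me to some sample code for plotting clusters. I am trying to plot them on a map of New Zealand. I tried downloading the coordinates and transforming them as follows: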
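library(sp)
library(rgdal)
library(raster)  # getData() comes from the raster package
#Downloading New Zealand level-1 boundaries from GADM
nz1 <- getData("GADM", country = "NZ", level = 1)
#Reprojecting to EPSG:2135
nz1 <- spTransform(nz1, CRS = CRS("+init=epsg:2135"))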
But on my Mac I get this error:
Error in spTransform(nz1, CRS = CRS("+init=epsg:2135")) :
error in evaluating the argument 'CRSobj' in selecting a method for function 'spTransform': Error in CRS("+init=epsg:2135") : no system list, errno: 2
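The message appears to come from PROJ.4, which prints "no system list, errno: N" when it reports a system error without strerror; errno 2 is ENOENT ("No such file or directory"). A plausible reading, therefore, is that PROJ.4 could not find its epsg init file while resolving +init=epsg:2135. The sketch below is a hypothetical base-R check (the helper name is made up) of whether an epsg file exists in the directory named by the PROJ_LIB environment variable; it is a diagnostic under that assumption, not a fix.

```r
# Hypothetical diagnostic: does a PROJ.4 'epsg' init file exist in 'proj_lib'?
# Returns NA when no directory is given (e.g. PROJ_LIB unset), else TRUE/FALSE.
epsg_file_present <- function(proj_lib = Sys.getenv("PROJ_LIB")) {
  if (!nzchar(proj_lib)) return(NA)
  file.exists(file.path(proj_lib, "epsg"))
}

epsg_file_present()
```

If this returns FALSE, one thing to try is setting PROJ_LIB with Sys.setenv() to the directory that actually contains PROJ.4's epsg file before calling library(rgdal); that directory depends on how PROJ.4 was installed. PROJ.4 can also locate its files without PROJ_LIB (the library(rgdal) output reports "autodetected"), so NA on its own does not prove the file is missing.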
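Second question: I read somewhere that k-means is not suitable for spatial clustering. I then tried hierarchical clustering, but it produced a huge, very dense dendrogram from which I could not extract any information, so I chose DBSCAN instead. With DBSCAN, however, many points fall on the border, as the results above show. I am sure I need about 50 to 70 customers in each cluster. What eps value should I choose? Here is my sample data: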
long lat RecordID
1 174.9066 -41.20867 90
2 174.9093 -41.22624 91
3 174.8893 -41.21618 92
4 174.8973 -41.21133 93
5 174.9153 -41.20419 94
6 174.9239 -41.20167 95
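On choosing eps: the original DBSCAN paper suggests a sorted k-distance plot, where each point's distance to its k-th nearest neighbour (k = MinPts) is sorted and eps is read off near the "knee" where the curve bends sharply upward. Below is a minimal base-R sketch that works on a dist object such as d above; kth_nn_dist is a hypothetical name. Two caveats: DBSCAN has no parameter that fixes cluster size, so no eps/MinPts pair can guarantee 50 to 70 customers per cluster; and in fpc's printed summary, column 0 is noise, so the 6546 "border" points under 0 are unassigned noise rather than border points of a cluster.

```r
# Hypothetical helper: distance from each point to its k-th nearest neighbour.
# 'd' is a dist object (e.g. from earth.dist(..., dist = TRUE), so values in km).
kth_nn_dist <- function(d, k) {
  m <- as.matrix(d)
  stopifnot(k < nrow(m))
  # sort each row; the point itself contributes one 0, so take position k + 1
  apply(m, 1, function(row) sort(row)[k + 1])
}

# Sorted k-distance plot for the question's settings (k = MinPts = 50):
# kd <- sort(kth_nn_dist(d, k = 50))
# plot(kd, type = "l", xlab = "Points sorted by k-distance",
#      ylab = "Distance to 50th nearest neighbour (km)")
```

Note that as.matrix() on the full 14,873-point distance object allocates a dense matrix of roughly 1.8 GB; the separate dbscan package offers kNNdistplot() for the same plot without building that matrix by hand.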
Update: my session info, as requested:
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] raster_2.3-40 rgdal_0.9-2 sp_1.1-0
loaded via a namespace (and not attached):
[1] grid_3.1.2 lattice_0.20-29 tools_3.1.2
Update: the output of library(rgdal), as requested:
library(rgdal)
rgdal: version: 0.9-2, (SVN revision 526)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
Path to GDAL shared files: /usr/local/share/epsg_csv
Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
Path to PROJ.4 shared files: (autodetected)
Warning message:
package ‘rgdal’ was built under R version 3.1.3
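Note: I have stated explicitly that I am trying to plot the spatial clustering output and am looking for options; one of those options produced the error above. There is also a second question, about the border points in the clustering result.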