I am trying to cluster customer data by the customers' spatial locations. This is what I have done so far:

#Reading the data
theData <- read.csv("Customer_Segmentation/data.csv")

#Subsetting only long, lat and record id.
inputdata <- data.frame(long=theData$LONG, lat=theData$LAT, RecordID=theData$RecordID)

#Building distance matrix
library(fossil)
d = earth.dist(inputdata, dist = TRUE) 

#Applying DBSCAN Clustering
library(fpc)
ds <- dbscan(d,eps = 0.5,MinPts = 50, method = "dist")

It gave me about 23 clusters:

dbscan Pts=14873 MinPts=50 eps=0.5
      0   1  2  3   4   5  6  7  8  9   10  11 12  13  14 15  16   17 18 19 20  21  22  23
border 6546  73 47 38  20  53 60 27 70 19   93  43 58  25  21 31  36  492 47 44 41  43  55  35
seed      0 757 12 26  84  84  6 36  6 50 2132  70  2 101  91 55 104 2908 22 23 42  82  59 104
total  6546 830 59 64 104 137 66 63 76 69 2225 113 60 126 112 86 140 3400 69 67 83 125 114 139

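First question: how do I plot these clusters on a map? It would be great if someone could point me to some sample code for plotting clusters. I am trying to plot them on a map of New Zealand. I tried downloading the boundary data and transforming it as follows: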

library(sp)
library(rgdal)
library(raster)  # getData() comes from raster
nz1 <- getData("GADM", country = "NZ", level = 1)
nz1 <- spTransform(nz1, CRS = CRS("+init=epsg:2135"))

But on my Mac I get this error:

Error in spTransform(nz1, CRS = CRS("+init=epsg:2135")) : 
  error in evaluating the argument 'CRSobj' in selecting a method for function 'spTransform': Error in CRS("+init=epsg:2135") : no system list, errno: 2
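
While the basemap step is failing, the cluster labels can still be inspected with a plain longitude/latitude scatter plot. The following is only a minimal sketch in base R: it uses the six sample rows given further down in the question, and the `cluster` column is a made-up stand-in for `ds$cluster` (fpc's `dbscan` object stores labels there, with 0 meaning noise). The aspect-ratio correction is an approximation that is reasonable for a small area.

```r
# Six sample rows from the question; `cluster` is HYPOTHETICAL and only
# stands in for the real labels (in the real workflow: ds$cluster).
pts <- data.frame(
  long    = c(174.9066, 174.9093, 174.8893, 174.8973, 174.9153, 174.9239),
  lat     = c(-41.20867, -41.22624, -41.21618, -41.21133, -41.20419, -41.20167),
  cluster = c(1, 1, 2, 2, 0, 0)
)

is_noise <- pts$cluster == 0

# One degree of longitude is shorter than one degree of latitude by a
# factor of cos(latitude), so stretch the y axis accordingly.
asp <- 1 / cos(mean(pts$lat) * pi / 180)

# Empty frame first, then noise and clustered points drawn separately.
plot(pts$long, pts$lat, type = "n", asp = asp,
     xlab = "Longitude", ylab = "Latitude")
points(pts$long[is_noise], pts$lat[is_noise], pch = 4, col = "grey50")
points(pts$long[!is_noise], pts$lat[!is_noise], pch = 19,
       col = pts$cluster[!is_noise] + 1)  # +1 skips palette colour 1 (black)

cluster_ids <- sort(unique(pts$cluster[!is_noise]))
legend("topright",
       legend = c("noise", paste("cluster", cluster_ids)),
       pch = c(4, rep(19, length(cluster_ids))),
       col = c("grey50", cluster_ids + 1))
```

With the real data, the same calls would take `inputdata$long`, `inputdata$lat` and `ds$cluster` instead of the hypothetical data frame.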

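Second question: I read somewhere that k-means is not well suited to spatial clustering, so I tried hierarchical clustering instead, but it produced a large, very dense dendrogram from which I could not extract any information. I therefore chose DBSCAN. With DBSCAN, however, the results show that many points fall on the border. I am sure that I need about 50-70 customers in each cluster. What eps value should I choose? Here is a sample of my data.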

      long       lat RecordID
1 174.9066 -41.20867       90 
2 174.9093 -41.22624       91 
3 174.8893 -41.21618       92 
4 174.8973 -41.21133       93
5 174.9153 -41.20419       94
6 174.9239 -41.20167       95 
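
One common heuristic for picking eps is to look at the sorted distance from each point to its k-th nearest neighbour and choose a value near the "knee" of that curve. The sketch below applies the idea to the six sample rows only, in base R, with a hand-written haversine function in place of `earth.dist` (the Earth radius of 6371 km and the helper name `haversine_km` are assumptions made for illustration). The value `k = 2` is hypothetical and chosen just because there are six points; on the full data set a `k` close to `MinPts` is the usual starting point.

```r
# Six sample rows from the question.
pts <- data.frame(
  long = c(174.9066, 174.9093, 174.8893, 174.8973, 174.9153, 174.9239),
  lat  = c(-41.20867, -41.22624, -41.21618, -41.21133, -41.20419, -41.20167)
)

# Great-circle distance in km (haversine formula); r is an assumed
# mean Earth radius. Vectorised over all four coordinate arguments.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Full pairwise distance matrix (fine for a small sample).
n <- nrow(pts)
d <- outer(seq_len(n), seq_len(n), function(i, j) {
  haversine_km(pts$long[i], pts$lat[i], pts$long[j], pts$lat[j])
})

k <- 2  # HYPOTHETICAL; use something near MinPts on the real data

# Each row contains the self-distance 0, so the k-th neighbour is
# the (k + 1)-th smallest entry.
knn_dist <- apply(d, 1, function(row) sort(row)[k + 1])

# Candidate eps values sit where this curve bends sharply upwards.
plot(sort(knn_dist), type = "b",
     xlab = "Points sorted by k-NN distance",
     ylab = paste0("Distance to neighbour ", k, " (km)"))
```

If the curve has no clear knee, that usually suggests the density varies a lot across the region, and a single eps may not give evenly sized clusters.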

Updated with my session info, as requested:

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] raster_2.3-40 rgdal_0.9-2   sp_1.1-0     

loaded via a namespace (and not attached):
[1] grid_3.1.2      lattice_0.20-29 tools_3.1.2   

Updated with the output of library(rgdal), as requested:

library(rgdal)
rgdal: version: 0.9-2, (SVN revision 526)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
Path to GDAL shared files: /usr/local/share/epsg_csv
Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
Path to PROJ.4 shared files: (autodetected)
Warning message:
package ‘rgdal’ was built under R version 3.1.3 

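Note: I have explicitly stated that I am trying to plot spatial clustering output and am looking for options, and that one of the options I tried produced an error. There is also a second question, about the points that fall on cluster borders.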

1 Answer

The following code runs without problems on my machine:

library(sp)
library(rgdal)
library(raster)
nz1 = getData("GADM", country = "NZ", level = 1) 
nz1 = spTransform(nz1, CRS = CRS("+init=epsg:2135"))

Here is my sessionInfo():

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] raster_2.3-40 rgdal_0.9-2   sp_1.0-17     dplyr_0.4.1   pROC_1.8     
[6] DBI_0.3.1    

loaded via a namespace (and not attached):
 [1] lazyeval_0.1.10 R6_2.0.1        plyr_1.8.1      magrittr_1.5   
 [5] assertthat_0.1  wakefield_0.2.0 parallel_3.2.0  tools_3.2.0    
 [9] Rcpp_0.11.4     grid_3.2.0      lattice_0.20-31

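I am fairly sure this is system-related. I do not normally work with geospatial data, so I had to set up all the requirements from scratch: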

  1. Set up the latest version of GDAL from ppa:ubuntugis, as suggested here.
  2. Then I installed gdal-bin, libgdal1-dev and libproj-dev.
  3. I installed the R packages raster and rgdal.
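
The "no system list, errno: 2" message is commonly reported when PROJ.4 cannot find its `epsg` support file (errno 2 is "no such file or directory"); that reading is an assumption here, not something confirmed in this thread. A minimal base R sketch for checking whether that file is visible is shown below. The two fallback directories are guesses at typical install locations, not paths taken from either machine.

```r
# PROJ.4 looks for its data files in PROJ_LIB if that variable is set.
proj_lib <- Sys.getenv("PROJ_LIB", unset = NA)

# ASSUMED candidate directories; adjust to wherever PROJ.4 was installed.
candidates <- c(proj_lib, "/usr/local/share/proj", "/usr/share/proj")
candidates <- candidates[!is.na(candidates) & nzchar(candidates)]

# "+init=epsg:NNNN" strings are resolved through a plain file named "epsg".
has_epsg <- file.exists(file.path(candidates, "epsg"))

print(data.frame(dir = candidates, has_epsg = has_epsg))
```

If none of the directories contains `epsg`, pointing `PROJ_LIB` at the directory that does (before loading rgdal) or reinstalling the PROJ.4 data files would be the first things to try.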

Edit:

As suggested by @RobertH, adding the rgdal package load-time messages:

rgdal: version: 0.9-2, (SVN revision 526)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
Path to GDAL shared files: /usr/share/gdal/1.11
Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
Path to PROJ.4 shared files: (autodetected)
answered 2015-05-09T09:56:40.333