r - Restructuring data for geographic proximity analysis in R

Question

I have a dataset of geographic coordinates for a set of people, like this:

Person  Latitude    Longitude
  1     46.0614     -23.9386
  2     48.1792      63.1136
  3     59.9289      66.3883
  4     42.8167      58.3167
  5     43.1167      63.25

I plan to calculate geographic proximity at the dyadic level using the geosphere package in R. To do that, I need to create a dataset that looks like this:

Person1 Person2 LatitudeP1  LongitudeP1 LatitudeP2  LongitudeP2
   1       2     46.0614    -23.9386     48.1792     63.1136
   1       3     46.0614    -23.9386     59.9289     66.3883
   1       4     46.0614    -23.9386     42.8167     58.3167
   1       5     46.0614    -23.9386     43.1167     63.25
   2       3     48.1792     63.1136     59.9289     66.3883
   2       4     48.1792     63.1136     42.8167     58.3167
   2       5     48.1792     63.1136     43.1167     63.25
   3       4     59.9289     66.3883     42.8167     58.3167
   3       5     59.9289     66.3883     43.1167     63.25
   4       5     42.8167     58.3167     43.1167     63.25

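So the resulting data has one row for every possible dyad in the dataset and includes the coordinates of both individuals in the dyad. "LatitudeP1" and "LongitudeP1" are the coordinates of "Person1" in the pair, and "LatitudeP2" and "LongitudeP2" are the coordinates of "Person2". Also, it does not matter which ID is listed as Person1 and which as Person2, since geographic distance is not a directed relation.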

score 2 · Accepted Answer

Just taking the possible combinations (combn) of Person 1 thru 5, and subsetting the Lat/long from your original data:

dat <- read.table(header = TRUE, text="Person  Latitude    Longitude
1     46.0614     -23.9386
2     48.1792      63.1136
3     59.9289      66.3883
4     42.8167      58.3167
5     43.1167      63.25")

tmp <- t(combn(nrow(dat),2))

#      [,1] [,2]
# [1,]    1    2
# [2,]    1    3
# [3,]    1    4
# [4,]    1    5
# [5,]    2    3
# [6,]    2    4
# [7,]    2    5
# [8,]    3    4
# [9,]    3    5
# [10,]    4    5

res <- cbind(tmp,
             do.call('cbind', lapply(1:2, function(x) 
               mapply(`[`, dat[, 2:3], MoreArgs = list(i=tmp[, x])))))
colnames(res) <- c('Person1','Person2','LatitudeP1','LongitudeP1',
                   'LatitudeP2','LongitudeP2')

data.frame(res)

#    Person1 Person2 LatitudeP1 LongitudeP1 LatitudeP2 LongitudeP2
# 1        1       2    46.0614    -23.9386    48.1792     63.1136
# 2        1       3    46.0614    -23.9386    59.9289     66.3883
# 3        1       4    46.0614    -23.9386    42.8167     58.3167
# 4        1       5    46.0614    -23.9386    43.1167     63.2500
# 5        2       3    48.1792     63.1136    59.9289     66.3883
# 6        2       4    48.1792     63.1136    42.8167     58.3167
# 7        2       5    48.1792     63.1136    43.1167     63.2500
# 8        3       4    59.9289     66.3883    42.8167     58.3167
# 9        3       5    59.9289     66.3883    43.1167     63.2500
# 10       4       5    42.8167     58.3167    43.1167     63.2500
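
Note that `tmp` above holds row positions, which coincide with the `Person` IDs only because the IDs happen to run from 1 to 5. A minimal base-R sketch, assuming the IDs could be arbitrary, that looks each column up through the index matrix instead (the names `idx` and `pairs` are purely illustrative):

```r
dat <- data.frame(
  Person    = 1:5,
  Latitude  = c(46.0614, 48.1792, 59.9289, 42.8167, 43.1167),
  Longitude = c(-23.9386, 63.1136, 66.3883, 58.3167, 63.25)
)

# Every unordered pair of row positions, one pair per row
idx <- t(combn(nrow(dat), 2))

# Look up the ID and coordinates by row position for each side of the pair
pairs <- data.frame(
  Person1     = dat$Person[idx[, 1]],
  Person2     = dat$Person[idx[, 2]],
  LatitudeP1  = dat$Latitude[idx[, 1]],
  LongitudeP1 = dat$Longitude[idx[, 1]],
  LatitudeP2  = dat$Latitude[idx[, 2]],
  LongitudeP2 = dat$Longitude[idx[, 2]]
)
```

Because `Person1` and `Person2` are read from `dat$Person` rather than taken from the index itself, the same code should keep working if the IDs are not consecutive integers.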

score 1 · Accepted Answer

If you want pairwise distances and you are already using the geosphere package, why not use distm(...) instead of jumping through all these hoops:

# df is the dataset from your question
library(geosphere)
distm(df[,3:2],fun=distHaversine)   # distance in *meters*
#         [,1]      [,2]    [,3]      [,4]      [,5]
# [1,]       0 6224407.2 5743824 6243068.1 6553157.4
# [2,] 6224407       0.0 1324950  704260.1  563654.6
# [3,] 5743824 1324949.8       0 1982326.1 1883584.1
# [4,] 6243068  704260.1 1982326       0.0  403183.0
# [5,] 6553157  563654.6 1883584  403183.0       0.0

You could also use the fossil package.

library(fossil)
earth.dist(df[,3:2],dist=FALSE)     # distance in *kilometers*
#          [,1]      [,2]     [,3]      [,4]      [,5]
# [1,]    0.000 6219.1967 5739.016 6237.8420 6547.6718
# [2,] 6219.197    0.0000 1323.841  703.6706  563.1828
# [3,] 5739.016 1323.8407    0.000 1980.6667 1882.0073
# [4,] 6237.842  703.6706 1980.667    0.0000  402.8455
# [5,] 6547.672  563.1828 1882.007  402.8455    0.0000

Note that these functions expect longitude first, then latitude, so you have to pass columns 3:2, not 2:3.
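
To see why the column order matters, here is a rough base-R haversine sketch. It is not the geosphere implementation; the function name `haversine_m` is made up for illustration, and the radius of 6378137 m is assumed here to mirror the default used by distHaversine.

```r
# Great-circle distance in meters between two points given as c(lon, lat)
# in degrees. The default radius is an assumption (see the note above).
haversine_m <- function(p1, p2, r = 6378137) {
  to_rad <- pi / 180
  lon1 <- p1[1] * to_rad; lat1 <- p1[2] * to_rad
  lon2 <- p2[1] * to_rad; lat2 <- p2[2] * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  2 * r * asin(min(1, sqrt(a)))  # min() guards against rounding above 1
}

p1 <- c(-23.9386, 46.0614)  # person 1 as (lon, lat)
p2 <- c( 63.1136, 48.1792)  # person 2 as (lon, lat)

d12         <- haversine_m(p1, p2)            # correct (lon, lat) order
d12_swapped <- haversine_m(rev(p1), rev(p2))  # (lat, lon) passed by mistake
```

With the correct order, `d12` should land close to the 6224407 m shown in the distm output for persons 1 and 2, while `d12_swapped` comes out as a noticeably different (and wrong) distance without raising any error.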

Edit in response to OP's comment.

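"Edge list" sounds like you want to end up with an igraph object. You can use the distance matrix as an adjacency matrix in igraph, and the distances will automatically populate the weights on the edge list.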

library(igraph)
library(geosphere)
g <- graph.adjacency(distm(df[,3:2],fun=distHaversine),
                     mode="undirected",weighted=TRUE)
set.seed(1)   # for reproducible plot
plot(g, layout=layout.fruchterman.reingold(g,weights=E(g)$weight))

get.data.frame(g,"edges")
#    from to    weight
# 1     1  2 6224407.2
# 2     1  3 5743824.5
# 3     1  4 6243068.1
# 4     1  5 6553157.4
# 5     2  3 1324949.8
# 6     2  4  704260.1
# 7     2  5  563654.6
# 8     3  4 1982326.1
# 9     3  5 1883584.1
# 10    4  5  403183.0
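
If only the edge list is needed and the graph object is not, a similar table can be sketched in base R by reading the upper triangle of the distance matrix. This is an alternative to the igraph route, not what igraph does internally; the small matrix `m` below just reuses the distances for persons 1 to 3 from the output above.

```r
# Symmetric distance matrix (meters) for persons 1-3, values from the edge list above
m <- matrix(c(      0.0, 6224407.2, 5743824.5,
              6224407.2,       0.0, 1324949.8,
              5743824.5, 1324949.8,       0.0),
            nrow = 3, byrow = TRUE)

# Row/column positions of the cells above the diagonal, so each pair appears once
idx <- which(upper.tri(m), arr.ind = TRUE)

edges <- data.frame(from   = idx[, "row"],
                    to     = idx[, "col"],
                    weight = m[idx])

# which() walks the matrix column by column, so sort to get from/to order
edges <- edges[order(edges$from, edges$to), ]
rownames(edges) <- NULL
```

The same three steps should apply unchanged to the full matrix returned by distm(...).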
