r - 从 sf 线串对象创建具有正确索引标签的 inla.graph 对象

Question

我有一个包含linestrings描述巴西城市之间连接的 shapefile。我想将这些连接转换为将城市代码设置为行名的邻域对象，使其与我的数据框兼容：

> head(regic_link)
Simple feature collection with 6 features and 6 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -45.7931 ymin: -21.1472 xmax: -41.3903 ymax: -15.9032
proj4string:   +proj=longlat +ellps=GRS80 +no_defs 
     id origin_code            nome_ori dest_code        nome_dest   dist_km                       geometry
1 14016     3123304      Dores do Turvo   3165701  Senador Firmino 11.732462 LINESTRING (-43.1884 -20.97...
2 14117     3124708   Estrela do Indaiá   3166600 Serra da Saudade  9.080594 LINESTRING (-45.7878 -19.52...
3 15205     3138658              Lontra   3135357         Japonvar 11.104687 LINESTRING (-44.3029 -15.90...
4 17147     3163300  São José do Divino   3144904      Nova Módica 13.066889 LINESTRING (-41.3903 -18.48...
5 17151     3163409 São José do Goiabal   3121803         Dionísio 12.078370 LINESTRING (-42.7077 -19.92...
6 12463     3102100       Alto Rio Doce   3121506 Desterro do Melo 17.982702 LINESTRING (-43.412 -21.025...

因此，行名称将设置为origin_code，而邻居将dest_code在列表中设置为，反之亦然（最终这将更改为我创建的索引，但这会使事情更容易检查）。本质上，我需要linestring以下用于多边形的代码的等效项：

nb.orig <- poly2nb(as_Spatial(shp), row.names = shp$index)
names(nb.orig) <- attr(nb.orig, "region.id")

nb2INLA("output/nb_orig.graph",  nb.orig)

（shp是一个由多边形组成的 shapefile，并且该index变量同时存在于 shapefile 和数据框中）。

到目前为止，我已经使用sfnetworksandigraph包创建了一个邻域对象，但还不能将索引值附加到行名：

regic_net <- as_sfnetwork(regic_link, directed = F) 

net_adj <- as_adjacency_matrix(regic_net, names = T)
nb_net <- mat2listw(net_adj)$neighbours

默认情况下，该函数as_sfnetwork为网络中的每个节点分配一个数字（在边缘数据中，这些是变量 'from' 和 'to' 并且不能使用 mutate 进行更改）：

> regic_net
# A sfnetwork with 827 nodes and 3786 edges
#
# CRS:  NA 
#
# An undirected simple graph with 1 component with spatially explicit edges
#
# Edge Data:     3,786 x 7 (active)
# Geometry type: LINESTRING
# Dimension:     XY
# Bounding box:  xmin: -50.6938 ymin: -22.855 xmax: -39.9496 ymax: -14.2696
   from    to origin_code nome_ori           dest_code nome_dest                                                                                    geometry
  <int> <int>       <dbl> <chr>                  <dbl> <chr>                                                                                <LINESTRING [°]>
1     1     2     3123304 Dores do Turvo       3165701 Senador Firmino    (-43.1884 -20.975, -43.18106 -20.97017, -43.17373 -20.96534, -43.16639 -20.9605, …
2     3     4     3163409 São José do Goiab…   3121803 Dionísio           (-42.7077 -19.9255, -42.71344 -19.91851, -42.71917 -19.91152, -42.72491 -19.90453…
3     5     6     3167707 Sobrália             3162609 São João do Orien… (-42.0972 -19.2363, -42.1016 -19.2437, -42.106 -19.2511, -42.11039 -19.2585, -42.…
4     5     7     3167707 Sobrália             3168408 Tarumirim          (-42.0972 -19.2363, -42.08899 -19.24038, -42.08079 -19.24447, -42.07258 -19.24855…
5     8     9     3115474 Catuti               3141009 Mato Verde         (-42.9607 -15.3612, -42.95243 -15.36428, -42.94415 -15.36735, -42.93588 -15.37043…
6    10    11     3130606 Inconfidentes        3108305 Borda da Mata      (-46.328 -22.3174, -46.31896 -22.31505, -46.30993 -22.3127, -46.30089 -22.31034, …
# … with 3,780 more rows
#
# Node Data:     827 x 1
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: -50.6938 ymin: -22.855 xmax: -39.9496 ymax: -14.2696
             geometry
          <POINT [°]>
1  (-43.1884 -20.975)
2  (-43.1004 -20.917)
3 (-42.7077 -19.9255)
# … with 824 more rows

然后，这些值在使用和nb创建的对象的下一步中用作行名称。有谁知道如何附加我自己的索引甚至提取已创建的索引，以便我可以使其与数据框保持一致？我试图创建一个采用/和/但它们不匹配的转换表，该值是两个索引中的最低值，因此并不总是我数据中的来源。as_adjacency_matrixmat2listwfromorigin_codetodest_codefrom

更难的是，比预期多 57 个节点，我不知道为什么会发生这种情况。我检查了重复项，它们在数据中具有相同的坐标和代码/名称，但使用不同的索引号单独处理。

总之，我想将线串对象转换为可用于 INLA 模型的邻域对象，在该模型中，由线连接的城市被视为邻居。我需要为这些城市附加一个索引，以便邻域对象将索引设置为行名，并且它可以与模型中的数据结合。希望这是有道理的，如果没有，请告诉我！

score 2 · Accepted Answer

我将使用只有 5 的简化示例来提出解决方案LINESTRING(s)。如果以下代码对应于所需的解决方案，则应该很容易将其转换为实际案例。

首先，定义一些函数并加载相关包：

'%!in%' <- function(x,y) !('%in%'(x,y))
Sys.setenv(`_R_S3_METHOD_REGISTRATION_NOTE_OVERWRITES_` = "false")
suppressPackageStartupMessages({
  library(sf)
  library(lwgeom)
  library(tidygraph)
  library(sfnetworks)
})

加载数据并仅选择相关列+仅五行

regic_link <- st_read(
  dsn = "city_link/REGIC2018_Ligacoes_entre_Cidades.shp",
  query = "SELECT cod_ori, cod_dest FROM REGIC2018_Ligacoes_entre_Cidades LIMIT 5"
)
#> Reading query `SELECT cod_ori, cod_dest FROM REGIC2018_Ligacoes_entre_Cidades LIMIT 5' from data source `C:\Users\Utente\Desktop\city_link\REGIC2018_Ligacoes_entre_Cidades.shp' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 5 features and 2 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -63.0338 ymin: -13.4945 xmax: -60.1348 ymax: -9.912
#> Geodetic CRS:  SIRGAS 2000

过滤掉重复的连接

regic_link <- regic_link %>%
  filter(paste0(cod_ori, cod_dest) %!in% paste0(cod_dest, cod_ori))

构建 sfnetwork 对象

regic_sfn <- as_sfnetwork(regic_link, directed = FALSE)

提取节点（即这些线的唯一边界点）

regic_nodes <- st_as_sf(regic_sfn, "nodes")

每个节点都应对应于对象中定义的 LINESTRINGS 的一个边界点regic_link。正如您所注意到的，我们不能简单地使用 from/to 列，因为它们不保留起点和终点的顺序。因此，我们必须手动将节点与起点和终点进行匹配（假设每次它们具有相同的坐标，那么它们也具有相同的 ID）。此外，每个节点可能对应几个（相同的）边界点，但我们只需要取第一个匹配（即[1]下面的符号）：

start <- lapply(
  X = st_equals(regic_nodes, st_startpoint(regic_link)),
  FUN = function(x) regic_link$cod_ori[x[1]]
)

检查输出：

start
#> [[1]]
#> [1] "1100015"
#> 
#> [[2]]
#> [1] NA
#> 
#> [[3]]
#> [1] NA
#> 
#> [[4]]
#> [1] NA
#> 
#> [[5]]
#> [1] "1100023"
#> 
#> [[6]]
#> [1] NA
#> 
#> [[7]]
#> [1] "1100031"
#> 
#> [[8]]
#> [1] NA

这意味着第一个节点对应于 ID 1100015，第七个节点对应于 ID1100031等等。另一方面，其他节点不对应任何起点。对端点重复：

end <- lapply(
  X = st_equals(regic_nodes, st_endpoint(regic_link)),
  FUN = function(i) regic_link$cod_dest[i[1]]
)

检查输出，它应该start与相同的解释相反：

end
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> [1] "1100049"
#> 
#> [[3]]
#> [1] "1100189"
#> 
#> [[4]]
#> [1] "1100296"
#> 
#> [[5]]
#> [1] NA
#> 
#> [[6]]
#> [1] "1100114"
#> 
#> [[7]]
#> [1] NA
#> 
#> [[8]]
#> [1] "1100304"

合并两个对象。dplyr::coalescence用于获取非 NA ID

idxs <- mapply(dplyr::coalesce, start, end)

将新的 id 添加到节点表中：

regic_sfn <- regic_sfn %N>%
  mutate(name = idxs)

估计城市之间的邻接矩阵

regic_adj <- igraph::as_adjacency_matrix(regic_sfn, names = TRUE)
regic_adj
#> 8 x 8 sparse Matrix of class "dgCMatrix"
#>         1100015 1100049 1100189 1100296 1100023 1100114 1100031 1100304
#> 1100015       .       1       1       1       .       .       .       .
#> 1100049       1       .       .       .       .       .       .       .
#> 1100189       1       .       .       .       .       .       .       .
#> 1100296       1       .       .       .       .       .       .       .
#> 1100023       .       .       .       .       .       1       .       .
#> 1100114       .       .       .       .       1       .       .       .
#> 1100031       .       .       .       .       .       .       .       1
#> 1100304       .       .       .       .       .       .       1       .

如果我们检查第一行，我们注意到城市与正确的列匹配：

regic_link %>% filter(cod_ori == 1100023)
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -63.0338 ymin: -10.4323 xmax: -62.4721 ymax: -9.912
#> Geodetic CRS:  SIRGAS 2000
#>   cod_ori cod_dest                       geometry
#> 1 1100023  1100114 LINESTRING (-63.0338 -9.912...

另一方面

regic_link %>% filter(cod_ori == 1100049)
#> Simple feature collection with 0 features and 2 fields
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> Geodetic CRS:  SIRGAS 2000
#> [1] cod_ori  cod_dest geometry
#> <0 rows> (or 0-length row.names)

因为那对应于其中之一cod_dest。

regic_link %>% filter(cod_dest == 1100049)
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -62.0041 ymin: -11.9342 xmax: -61.4512 ymax: -11.4356
#> Geodetic CRS:  SIRGAS 2000
#>   cod_ori cod_dest                       geometry
#> 1 1100015  1100049 LINESTRING (-62.0041 -11.93...

最后，我们可以将 adj 矩阵转换为inla.graph对象（但我认为这不是 INLA 函数严格要求的）：

INLA::inla.read.graph(regic_adj)
#> $n
#> [1] 8
#> 
#> $nnbs
#> [1] 3 1 1 1 1 1 1 1
#> 
#> $nbs
#> $nbs[[1]]
#> [1] 2 3 4
#> 
#> $nbs[[2]]
#> [1] 1
#> 
#> $nbs[[3]]
#> [1] 1
#> 
#> $nbs[[4]]
#> [1] 1
#> 
#> $nbs[[5]]
#> [1] 6
#> 
#> $nbs[[6]]
#> [1] 5
#> 
#> $nbs[[7]]
#> [1] 8
#> 
#> $nbs[[8]]
#> [1] 7
#> 
#> 
#> $graph.file
#> [1] NA
#> 
#> $cc
#> $cc$id
#> [1] 1 1 1 1 2 2 3 3
#> 
#> $cc$n
#> [1] 3
#> 
#> $cc$nodes
#> $cc$nodes[[1]]
#> [1] 1 2 3 4
#> 
#> $cc$nodes[[2]]
#> [1] 5 6
#> 
#> $cc$nodes[[3]]
#> [1] 7 8
#> 
#> 
#> 
#> attr(,"class")
#> [1] "inla.graph"

^{由reprex 包于 2021-08-25 创建 (v2.0.0 )}

希望有帮助。您可以检查LIMIT 5从查询参数中删除的完整示例st_read。可能有一些更简单的方法，但我现在想不出任何方法。

r - 从 sf 线串对象创建具有正确索引标签的 inla.graph 对象

1 回答 1

Related

Reference