r - Efficient way to plot data on an irregular grid

Question

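I work with satellite data organised on an irregular two-dimensional grid whose dimensions are scanline (along-track dimension) and ground pixel (across-track dimension). Latitude and longitude for each centre pixel are stored in auxiliary coordinate variables, together with the four corner coordinate pairs (latitude and longitude are given on the WGS84 reference ellipsoid). The data is stored in netCDF4 files.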

What I am trying to do is plot these files efficiently on a projected map (and possibly combinations of files, which is the next step!).

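So far, inspired by Jeremy Voisey's answer to this question, my approach has been to build a data frame that links the variable I am interested in to the pixel boundaries, and to use ggplot2 with geom_polygon for the actual plotting.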

Let me illustrate my workflow, and apologise in advance for the naive approach: I only started coding in R a week or two ago.

Notes

To fully reproduce the problem:
1. Download the two data frames: so2df.Rda (22M) and pixel_corners.Rda (26M)
2. Load them into your environment, e.g.

so2df <- readRDS(file="so2df.Rda")
pixel_corners <- readRDS(file="pixel_corners.Rda")

Then jump to the "Merging the data frames" step.

Initial setup

I read the data and the lat/lon bounds from my file.

library(ncdf4)
library(ggplot2)
library(ggmap) 
# set path and filename
ncpath <- "/Users/stefano/src/s5p/products/e1dataset/L2__SO2/"
ncname <- "S5P_OFFL_L2__SO2____20171128T234133_20171129T003956_00661_01_022943_00000000T000000"  
ncfname <- paste(ncpath, ncname, ".nc", sep="")
nc <- nc_open(ncfname)

# save fill value and multiplication factors
mfactor = ncatt_get(nc, "PRODUCT/sulfurdioxide_total_vertical_column", 
                    "multiplication_factor_to_convert_to_DU")
fillvalue = ncatt_get(nc, "PRODUCT/sulfurdioxide_total_vertical_column", 
                      "_FillValue")

# read the SO2 total column variable
so2tc <- ncvar_get(nc, "PRODUCT/sulfurdioxide_total_vertical_column")

# read lat/lon of centre pixels
lat <- ncvar_get(nc, "PRODUCT/latitude")
lon <- ncvar_get(nc, "PRODUCT/longitude")

# read latitude and longitude bounds
lat_bounds <- ncvar_get(nc, "GEOLOCATIONS/latitude_bounds")
lon_bounds <- ncvar_get(nc, "GEOLOCATIONS/longitude_bounds")

# close the file
nc_close(nc)
dim(so2tc)
## [1]  450 3244

So, for this file/orbit, each of the 3244 scanlines has 450 ground pixels.
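Because R arrays are stored column-major, as.vector(so2tc) runs through the 450 ground pixels of scanline 1 first, then those of scanline 2, and so on, so the row-name id assigned below is simply a flat array index. A small sketch (id_to_index() is a hypothetical helper, not part of any package) that maps an id back to its (ground pixel, scanline) position, cross-checked against base R's arrayInd():

```r
# Hypothetical helper: convert the flat pixel id (row name of so2df, i.e. the
# position in as.vector(so2tc)) back to (ground_pixel, scanline) indices.
# Column-major storage: the first dimension (ground pixel) varies fastest.
id_to_index <- function(id, n_ground) {
  k <- as.integer(id) - 1L
  data.frame(ground_pixel = k %% n_ground + 1L,
             scanline     = k %/% n_ground + 1L)
}

idx <- id_to_index(c("1", "450", "451", "1350"), n_ground = 450L)
idx

# Base R's arrayInd() gives the same positions for a 450 x 3244 array
arrayInd(c(1, 450, 451, 1350), .dim = c(450L, 3244L))
```

Knowing the array position of each pixel is what makes neighbour-based operations (aggregating adjacent pixels, deriving corners from centres) possible.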

Creating the data frames

Here I create two data frames, one for the values (with some post-processing) and one for the lat/lon bounds, and then merge the two.

so2df <- data.frame(lat=as.vector(lat), lon=as.vector(lon), so2tc=as.vector(so2tc))
# add id for each pixel
so2df$id <- row.names(so2df)
# replace fill values with NA before scaling (ncvar_get() already does this by default)
so2df$so2tc[so2df$so2tc == fillvalue$value] <- NA
# convert to DU
so2df$so2tc <- so2df$so2tc*as.numeric(mfactor$value)
saveRDS(so2df, file="so2df.Rda")
summary(so2df)

##       lat              lon              so2tc              id           
##  Min.   :-89.97   Min.   :-180.00   Min.   :-821.33   Length:1459800    
##  1st Qu.:-62.29   1st Qu.:-163.30   1st Qu.:  -0.48   Class :character  
##  Median :-19.86   Median :-150.46   Median :  -0.08   Mode  :character  
##  Mean   :-13.87   Mean   : -90.72   Mean   :  -1.43                     
##  3rd Qu.: 31.26   3rd Qu.: -27.06   3rd Qu.:   0.26                     
##  Max.   : 83.37   Max.   : 180.00   Max.   :3015.55                     
##                                     NA's   :200864

I saved this data frame as so2df.Rda here (22M).

num_points = dim(lat_bounds)[1]
pixel_corners <- data.frame(lat_bounds=as.vector(lat_bounds), lon_bounds=as.vector(lon_bounds))
# create id column by replicating pixel's id for each of the 4 corner points
pixel_corners$id <- rep(so2df$id, each=num_points)
saveRDS(pixel_corners, file="pixel_corners.Rda")
summary(pixel_corners)


##    lat_bounds       lon_bounds           id           
##  Min.   :-89.96   Min.   :-180.00   Length:5839200    
##  1st Qu.:-62.29   1st Qu.:-163.30   Class :character  
##  Median :-19.86   Median :-150.46   Mode  :character  
##  Mean   :-13.87   Mean   : -90.72                     
##  3rd Qu.: 31.26   3rd Qu.: -27.06                     
##  Max.   : 83.40   Max.   : 180.00

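As expected, the lat/lon bounds data frame is four times as long as the values data frame (four points per pixel/value).
I saved this data frame as pixel_corners.Rda here (26M).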
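Since the id column is built with rep(so2df$id, each = num_points), the corner rows are already in the same pixel order as so2df, so in principle the merge can be replaced by repeating each value num_points times (pixel_corners$so2tc <- rep(so2df$so2tc, each = num_points)). A minimal sketch on dummy arrays, assuming, as the question's code does, that ncvar_get() returns the bounds as [corner, ground_pixel, scanline] with the corner index varying fastest:

```r
# Dummy data: 2 ground pixels x 3 scanlines, 4 corners per pixel
n_corner <- 4L; n_ground <- 2L; n_scan <- 3L
so2tc <- array(c(10, 20, 30, 40, 50, 60), dim = c(n_ground, n_scan))
lat_bounds <- array(seq_len(n_corner * n_ground * n_scan),
                    dim = c(n_corner, n_ground, n_scan))

# Each pixel owns n_corner consecutive rows, in the same pixel order as
# as.vector(so2tc), so each value is simply repeated: no merge needed
corners <- data.frame(
  id         = rep(seq_along(so2tc), each = n_corner),
  lat_bounds = as.vector(lat_bounds),
  so2tc      = rep(as.vector(so2tc), each = n_corner)
)
head(corners, 8)
```

This keeps the original row order, so the four corners of each pixel stay consecutive, which is the vertex order geom_polygon() uses within each group.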

Merging the data frames

I then merge the two data frames by id:

start_time <- Sys.time()
so2df <- merge(pixel_corners, so2df, by=c("id"))
time_taken <- difftime(Sys.time(), start_time, units = "secs")
print(paste(dim(so2df)[1], "rows merged in", as.numeric(time_taken), "seconds"))

## [1] "5839200 rows merged in 42.4763631820679 seconds"

As you can see, this is a CPU-intensive process. I wonder what would happen if I processed 15 files at once (global coverage).
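If the two tables are not guaranteed to be in the same order, a plain key lookup with base R's match() is another option: for every corner row it finds the row of the per-pixel table with the same id, without the sorting and column bookkeeping that merge() performs. A sketch on toy data (not benchmarked on the real 5.8 million rows):

```r
# Toy per-pixel table, deliberately in a different order from the corners
pixels  <- data.frame(id = c("3", "1", "2"), so2tc = c(0.3, 0.1, 0.2))
# Toy corner table: 4 rows per pixel id
corners <- data.frame(id = rep(c("1", "2", "3"), each = 4),
                      lat_bounds = seq_len(12))

# match() returns, for each corner id, the row of `pixels` with that id;
# indexing the value column with it copies the pixel value to every corner
corners$so2tc <- pixels$so2tc[match(corners$id, pixels$id)]
corners
```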

Plotting the data

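Now that I have linked the pixel corners to the pixel values, I can easily plot them. Usually I am interested in a specific region of the orbit, so I wrote a function that subsets the input data frame before plotting it: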

PlotRegion <- function(so2df, latlon, title) {
  # Plot the given dataset over a geographic region.
  #
  # Args:
  #   so2df: The dataset, should include the so2tc, lat, lon, lat_bounds, lon_bounds and id columns
  #   latlon: A vector of four values identifying the bottom-left and top-right corners 
  #           c(latmin, latmax, lonmin, lonmax)
  #   title: The plot title

  # subset the data frame first
  df_sub <- subset(so2df, lat>latlon[1] & lat<latlon[2] & lon>latlon[3] & lon<latlon[4])

  subtitle = paste("#Pixel =", length(unique(df_sub$id)), "- Data min =", 
                   formatC(min(df_sub$so2tc, na.rm=T), format="e", digits=2), "max =", 
                   formatC(max(df_sub$so2tc, na.rm=T), format="e", digits=2))

  ggplot(df_sub) + 
    geom_polygon(aes(y=lat_bounds, x=lon_bounds, fill=so2tc, group=id), alpha=0.8) +
    borders('world', xlim=range(df_sub$lon), ylim=range(df_sub$lat), 
            colour='gray20', size=.2) + 
    theme_light() + 
    theme(panel.ontop=TRUE, panel.background=element_blank()) +
    scale_fill_distiller(palette='Spectral') +
    coord_quickmap(xlim=c(latlon[3], latlon[4]), ylim=c(latlon[1], latlon[2])) +
    labs(title=title, subtitle=subtitle, 
         x="Longitude", y="Latitude", 
         fill=expression(DU)) 
}
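PlotRegion() subsets the merged frame, which has four rows per pixel. One way to make the region selection cheaper is to filter the one-row-per-pixel table first and then keep only the corner rows whose id belongs to a selected pixel. A sketch with a hypothetical subset_region() helper on toy data:

```r
# Hypothetical helper: select the region on the per-pixel table
# (1 row per pixel), then keep the corner rows of the selected pixels
subset_region <- function(pixels, corners, latlon) {
  # latlon = c(latmin, latmax, lonmin, lonmax), same convention as PlotRegion()
  keep <- with(pixels, lat > latlon[1] & lat < latlon[2] &
                       lon > latlon[3] & lon < latlon[4])
  keep[is.na(keep)] <- FALSE          # pixels without coordinates are dropped
  ids <- pixels$id[keep]
  # %in% looks up each corner id in the (short) vector of selected ids
  corners[corners$id %in% ids, , drop = FALSE]
}

pixels  <- data.frame(id = c("1", "2", "3"),
                      lat = c(18, 30, 20), lon = c(-155, -155, -170))
corners <- data.frame(id = rep(c("1", "2", "3"), each = 4),
                      lat_bounds = 1:12)
sub <- subset_region(pixels, corners, c(17.5, 22.5, -160, -154))
sub
```

In the question's setting, pixels would be so2df before merging and corners would be pixel_corners; the values can then be attached to the much smaller result (e.g. with match()), so the polygon table handed to ggplot2 only contains the region of interest.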

I then call my function on a region of interest, e.g. let's see what is happening over Hawaii:

latlon = c(17.5, 22.5, -160, -154)
PlotRegion(so2df, latlon, expression(SO[2]~total~vertical~column))

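There they are, my pixels, and what appears to be the SO2 plume from Mauna Loa. Please ignore the negative values for now. As you can see, the pixel area varies towards the edges of the swath (different binning scheme).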

I tried to show the same plot over a Google map using ggmap:

PlotRegionMap <- function(so2df, latlon, title) {
  # Plot the given dataset over a geographic region.
  #
  # Args:
  #   so2df: The dataset, should include the so2tc, lat, lon, lat_bounds, lon_bounds and id columns
  #   latlon: A vector of four values identifying the bottom-left and top-right corners 
  #           c(latmin, latmax, lonmin, lonmax)
  #   title: The plot title

  # subset the data frame first
  df_sub <- subset(so2df, lat>latlon[1] & lat<latlon[2] & lon>latlon[3] & lon<latlon[4])

  subtitle = paste("#Pixel =", length(unique(df_sub$id)), "- Data min =", formatC(min(df_sub$so2tc, na.rm=T), format="e", digits=2), 
                   "max =", formatC(max(df_sub$so2tc, na.rm=T), format="e", digits=2))
  base_map <- get_map(location = c(lon = (latlon[4]+latlon[3])/2, lat = (latlon[1]+latlon[2])/2), zoom = 7, maptype="terrain", color="bw")

  ggmap(base_map, extent = "normal")  +
    geom_polygon(data=df_sub, aes(y=lat_bounds, x=lon_bounds,fill=so2tc, group=id),  alpha=0.5) +
    theme_light() + 
    theme(panel.ontop=TRUE, panel.background=element_blank()) +
    scale_fill_distiller(palette='Spectral') +
    coord_quickmap(xlim=c(latlon[3], latlon[4]), ylim=c(latlon[1], latlon[2])) +
    labs(title=title, subtitle=subtitle, 
         x="Longitude", y="Latitude", 
         fill=expression(DU)) 

}

This is what I get:

latlon = c(17.5, 22.5, -160, -154)
PlotRegionMap(so2df, latlon, expression(SO[2]~total~vertical~column))

Questions

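1. Is there a more efficient way to approach this problem? I was reading about the sf package and wondered whether I could define a data frame of points (values + centre pixel coordinates) and have sf infer the pixel boundaries automatically. That would save me from relying on the lat/lon bounds defined in the original dataset and from having to merge them with my values. I could accept a loss of precision in the transition areas towards the edges of the swath; otherwise the grid is fairly regular, with each pixel 3.5x7 km^2 in size.
2. Would regridding my data to a regular grid (how?), possibly by aggregating adjacent pixels, improve performance? I was thinking of using the raster package, which, as far as I understand, requires data on a regular grid. This should be useful at larger scales (e.g. plots of Europe), where I don't need to plot individual pixels; in fact, I couldn't even see them.
3. Do I need to reproject my data when plotting over a Google map?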
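Regarding inferring pixel boundaries from centres only: independently of sf, a common approximation can be written in base R. On the 2D (ground_pixel x scanline) grid, each interior corner is taken as the mean of the four surrounding centres, and the outer ring of corners is obtained by linear extrapolation. A sketch under those assumptions (centres_to_corners() is a hypothetical helper); it breaks down where longitudes wrap at ±180°, and it will be less accurate than the corners delivered with the product, especially near the swath edges:

```r
# Hypothetical helper: approximate corner coordinates from centre coordinates.
# m: matrix of centre coordinates (lat or lon), ground_pixel x scanline,
# with at least 2 rows and 2 columns. Returns an (nrow+1) x (ncol+1) matrix.
centres_to_corners <- function(m) {
  nr <- nrow(m); nc <- ncol(m)
  p <- matrix(NA_real_, nr + 2, nc + 2)
  p[2:(nr + 1), 2:(nc + 1)] <- m
  # extrapolate one extra row on each side ...
  p[1, 2:(nc + 1)]      <- 2 * m[1, ]  - m[2, ]
  p[nr + 2, 2:(nc + 1)] <- 2 * m[nr, ] - m[nr - 1, ]
  # ... and one extra column on each side (this also fills the padded corners)
  p[, 1]      <- 2 * p[, 2]      - p[, 3]
  p[, nc + 2] <- 2 * p[, nc + 1] - p[, nc]
  # corner (i, j) = mean of the 4 padded centres around it
  (p[-(nr + 2), -(nc + 2)] + p[-1, -(nc + 2)] +
   p[-(nr + 2), -1]        + p[-1, -1]) / 4
}

# Toy check: centre coordinate equal to the row index
lat_c <- matrix(rep(1:3, times = 4), nrow = 3)
lat_corners <- centres_to_corners(lat_c)
lat_corners[, 1]
```

The function is applied to the lat and lon matrices separately; pixel (i, j) then has the corners [i, j], [i + 1, j], [i + 1, j + 1] and [i, j + 1] of the returned matrices.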

[Bonus cosmetic questions]

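4. Is there a more elegant way to subset my data frame over a region identified by its four corner points?
5. How can I change the colour scale so that higher values stand out relative to lower ones? I have tried a log scale, with poor results.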

score 2 · Accepted Answer

I think data.table could help here. The merge is almost instantaneous:

"5839200 rows merged in 1.24507117271423 seconds"

library(data.table)
pixel_cornersDT <- as.data.table(pixel_corners)
so2dfDT <- as.data.table(so2df)

setkey(pixel_cornersDT, id)
setkey(so2dfDT, id)

so2dfDT <- merge(pixel_cornersDT, so2dfDT, by=c("id"), all = TRUE)

With the data in a data.table, the subsetting in the plotting function will also be much faster.

Questions 1 / 2 / 4:

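I don't think the process would be any faster with raster or sf, but you could try the rasterFromXYZ() or st_make_grid() functions. Most of the time would be spent converting to raster/sf objects, though, since you would have to convert the whole dataset.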

I would suggest doing all the data processing, including the cropping, in data.table, and switching to raster/sf objects only for plotting.
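For question 2, one simple way to regrid onto a regular lat/lon grid before any raster/sf conversion is to assign each pixel centre to the regular cell it falls into and average per cell. A base R sketch (regrid_mean() and the 0.25° default resolution are illustrative choices, not from the answer); note that it bins by pixel centre only and ignores the pixel footprints:

```r
# Hypothetical helper: mean of the pixel values per regular res x res cell
regrid_mean <- function(lat, lon, value, res = 0.25) {
  # Snap each pixel centre to the centre of the regular cell it falls in
  cells <- data.frame(
    lat   = (floor(lat / res) + 0.5) * res,
    lon   = (floor(lon / res) + 0.5) * res,
    value = value
  )
  # Mean of all pixels per cell; the formula interface of aggregate() drops
  # rows with NA (na.action = na.omit by default)
  aggregate(value ~ lat + lon, data = cells, FUN = mean)
}

r <- regrid_mean(lat   = c(0.1, 0.2, 0.6, 0.3),
                 lon   = c(0.1, 0.1, 0.1, 0.1),
                 value = c(1, 3, 10, NA),
                 res   = 0.5)
r
```

The result has one row per regular cell, so it can be drawn with ggplot2's geom_raster() or geom_tile(), or handed to raster::rasterFromXYZ() as an x, y, z table (r[, c("lon", "lat", "value")]).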

Question 3:

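The Google map is plotted correctly, but you specified a black-and-white map and it is overlaid by the "raster" of pixels, so you won't see much of it. You could change the base map to a satellite background: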

base_map <- get_map(location = c(lon = (latlon[4]+latlon[3])/2, lat = (latlon[1]+latlon[2])/2), 
                    zoom = 7, maptype="satellite")

Question 5:

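You can use the rescale function from the scales package. I included two options below: the first (commented out) uses the quantiles of the data as colour positions, the second uses individually defined values. I wouldn't use a log transformation (the trans argument), since you also have negative values and these become NA.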

# drop-in replacement for the ggplot() call inside PlotRegion()
ggplot(df_sub) + 
  geom_polygon(aes(y=lat_bounds, x=lon_bounds, fill=so2tc, group=id), alpha=0.8) +
  borders('world', xlim=range(df_sub$lon), ylim=range(df_sub$lat),
          colour='gray20', size=.2) +
  theme_light() + 
  theme(panel.ontop=TRUE, panel.background=element_blank()) +
  # scale_fill_distiller(palette='Spectral', type="seq", trans = "log2") +
  scale_fill_distiller(palette = "Spectral",
                       # values = scales::rescale(quantile(df_sub$so2tc, na.rm = TRUE), c(0,1))) +
                       values = scales::rescale(c(-3,0,1,5), c(0,1))) +
  coord_quickmap(xlim=c(latlon[3], latlon[4]), ylim=c(latlon[1], latlon[2])) +
  labs(title=title, subtitle=subtitle, 
       x="Longitude", y="Latitude", 
       fill=expression(DU))
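To see what the values argument receives: scales::rescale(x, c(0, 1)) maps x linearly onto [0, 1], and those numbers are used as positions of the palette colours between the scale limits, which default to the data range. The manual anchors c(-3, 0, 1, 5) therefore line up with those data values only if the limits are set to match, e.g. limits = c(-3, 5) together with oob = scales::squish. The sketch below reproduces the rescaling in base R and shows asinh(), a log-like transform that, unlike log, is defined for negative values (scales provides pseudo_log_trans() along the same lines):

```r
# Base R equivalent of scales::rescale(x, to = c(0, 1)) for a numeric vector
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# The manual anchors from the answer, as positions between 0 and 1
anchors <- rescale01(c(-3, 0, 1, 5))
anchors

# asinh() behaves like x near 0 and like sign(x) * log(2 * |x|) for large |x|,
# so it compresses both tails without producing NA for negative values
x <- c(-821.33, -1, 0, 1, 3015.55)   # min/max taken from the summary above
asinh(x)
```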

The whole process now takes about 8 seconds for me, including the plot without the background map; rendering the map adds another 1-2 seconds.
