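I have two separate datasets. One contains the locations of participants; the other contains the locations of measuring stations and their values at different time points. I generate example datasets below.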
# dataset of station values
yearmon <- c("Jan 1996","Jan 1996","Jan 1996","Jan 1996","Jan 1996","Jan 1996",
"Feb 1996","Feb 1996","Feb 1996","Feb 1996","Feb 1996","Feb 1996",
"Mar 1996","Mar 1996","Mar 1996","Mar 1996","Mar 1996","Mar 1996",
"Apr 1996","Apr 1996","Apr 1996","Apr 1996","Apr 1996","Apr 1996",
"May 1996","May 1996","May 1996","May 1996","May 1996","May 1996",
"Jun 1996","Jun 1996","Jun 1996","Jun 1996","Jun 1996","Jun 1996")
lon <- c(114.1592, 114.1294, 114.1144, 114.0228, 113.9763, 113.9431)
lat <- c(22.35694, 22.31306, 22.33000, 22.37167, 22.37639, 22.45111)
STN <- c("A","B","C","D","E","F")
value <- runif(n=36, min=10, max=20)
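# station table, repeated once per month (6 stations x 6 months = 36 rows)
df <- data.frame(STN, lon, lat)
df <- rbind(df, df, df, df, df, df)
df <- cbind(df, yearmon, value)
# treat readings below 12 as missing
df$value[df$value < 12] <- NA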
# dataset of participant location
id <- c(1,2,3,4)
lon.p <- c(114.3608, 114.1850, 114.1581, 114.1683)
lat.p <- c(22.44500, 22.33000, 22.28528, 22.37167)
participant <- data.frame(id,lon.p,lat.p)
#
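The sample datasets are shown below. I want to calculate the distance between each station (A-F) and each participant (1-4) at each time point (yearmon), and then assign each participant the value of the nearest station at that time point. I cannot assign participants to a station up front, because a station's location may change between time points (although it does not change in the sample data).
For example, if participant 1 lived closest to station A in Jan 1996, he/she should be assigned the value 17.03357.
I would prefer great-circle distance, perhaps calculated with something like: rdist.earth(location1, location2, miles=FALSE, R=6371)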
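To make the distance step concrete, here is a minimal sketch of a great-circle (haversine) distance matrix in base R, so it runs without extra packages. The helper name `haversine_km` is hypothetical, and the haversine formula is swapped in here as a stand-in for `rdist.earth`; the two should give similar results for the same radius, but that is an assumption, not something checked against the package.

```r
# Haversine great-circle distance in km.
# Returns a matrix: one row per (lon1, lat1) point, one column per (lon2, lat2) point.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  # outer() builds every pairwise difference between the two point sets
  dlat <- outer(lat1, lat2, function(a, b) (b - a) * to_rad)
  dlon <- outer(lon1, lon2, function(a, b) (b - a) * to_rad)
  coslat <- outer(cos(lat1 * to_rad), cos(lat2 * to_rad))
  a <- sin(dlat / 2)^2 + coslat * sin(dlon / 2)^2
  # pmin() guards against rounding slightly above 1; the matrix goes first so dim is kept
  2 * R * asin(pmin(sqrt(a), 1))
}

# Coordinates copied from the question
lon   <- c(114.1592, 114.1294, 114.1144, 114.0228, 113.9763, 113.9431)
lat   <- c(22.35694, 22.31306, 22.33000, 22.37167, 22.37639, 22.45111)
lon.p <- c(114.3608, 114.1850, 114.1581, 114.1683)
lat.p <- c(22.44500, 22.33000, 22.28528, 22.37167)

# 4 x 6 matrix: rows are participants 1-4, columns are stations A-F
dist_km <- haversine_km(lon.p, lat.p, lon, lat)
# index of the nearest station for each participant
nearest <- apply(dist_km, 1, which.min)
```

Because station locations may differ between time points, this matrix would need to be recomputed for each yearmon subset rather than once for the whole dataset.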
head(df,10)
STN lon lat yearmon value
1 A 114.1592 22.35694 Jan 1996 17.03357
2 B 114.1294 22.31306 Jan 1996 NA
3 C 114.1144 22.33000 Jan 1996 17.98293
4 D 114.0228 22.37167 Jan 1996 15.98854
5 E 113.9763 22.37639 Jan 1996 16.78647
6 F 113.9431 22.45111 Jan 1996 18.89551
7 A 114.1592 22.35694 Feb 1996 NA
8 B 114.1294 22.31306 Feb 1996 19.90123
9 C 114.1144 22.33000 Feb 1996 17.88482
10 D 114.0228 22.37167 Feb 1996 13.80029
participant
id lon.p lat.p
1 1 114.3608 22.44500
2 2 114.1850 22.33000
3 3 114.1581 22.28528
4 4 114.1683 22.37167
In the end, I think this is what I would like to get back (but with the values filled in):
id lon.p lat.p Apr 1996 Feb 1996 Jan 1996 Jun 1996 Mar 1996 May 1996
1 1 114.3608 22.44500
2 2 114.1850 22.33000
3 3 114.1581 22.28528
4 4 114.1683 22.37167
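One possible way to build a table of this shape is sketched below in base R. It is an illustration under stated assumptions, not a definitive solution: `haversine_km` and `nearest_value` are hypothetical helpers, `set.seed` is added only so the example is reproducible (it will not reproduce the values printed above), and the nearest station is used even when its reading is NA.

```r
# Rebuild the example data (set.seed is an addition for reproducibility)
set.seed(1)
stations <- data.frame(
  STN = c("A", "B", "C", "D", "E", "F"),
  lon = c(114.1592, 114.1294, 114.1144, 114.0228, 113.9763, 113.9431),
  lat = c(22.35694, 22.31306, 22.33000, 22.37167, 22.37639, 22.45111)
)
months <- c("Jan 1996", "Feb 1996", "Mar 1996", "Apr 1996", "May 1996", "Jun 1996")
df <- do.call(rbind, lapply(months, function(m) data.frame(stations, yearmon = m)))
df$value <- runif(nrow(df), min = 10, max = 20)
df$value[df$value < 12] <- NA
participant <- data.frame(
  id    = 1:4,
  lon.p = c(114.3608, 114.1850, 114.1581, 114.1683),
  lat.p = c(22.44500, 22.33000, 22.28528, 22.37167)
)

# Haversine great-circle distance in km (rows: first point set, columns: second)
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- outer(lat1, lat2, function(a, b) (b - a) * to_rad)
  dlon <- outer(lon1, lon2, function(a, b) (b - a) * to_rad)
  coslat <- outer(cos(lat1 * to_rad), cos(lat2 * to_rad))
  a <- sin(dlat / 2)^2 + coslat * sin(dlon / 2)^2
  2 * R * asin(pmin(sqrt(a), 1))
}

# For one time point: value of the nearest station for every participant.
# Assumption: the nearest station is used even if its value is NA.
nearest_value <- function(stn, part) {
  d <- haversine_km(part$lon.p, part$lat.p, stn$lon, stn$lat)
  stn$value[apply(d, 1, which.min)]
}

# Recompute distances per yearmon, since station locations may change over time
by_month <- split(df, factor(df$yearmon, levels = months))
wide <- sapply(by_month, nearest_value, part = participant)

# One row per participant, one column per yearmon
result <- data.frame(participant, wide, check.names = FALSE)
```

The columns here come out in chronological order rather than the alphabetical order shown above. If a participant should instead get the nearest station that has a non-missing reading, one option is to drop the NA rows from `stn` inside `nearest_value` before computing distances.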
Thank you.