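I have a data frame containing a list of X/Y positions (> 2000 rows). I want to select or find all rows/positions based on a maximum distance. For example, select all positions from the data frame that are within 1-100 km of each other. Any suggestions on how to do this?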

2 Answers

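You need to determine the distance between each pair of rows in some way. The simplest approach is to use the corresponding distance matrix: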

library(data.table)  # provides data.table()

# thresh is your distance threshold
thresh <- 10

# create some sample data
set.seed(123)
DT <- data.table(X=sample(-10:10, 5, TRUE), Y=sample(-10:10, 5, TRUE))

# create the distance matrix
# (createTable() and distance() are defined at the end of this answer)
distTable <- matrix(apply(createTable(DT), 1, distance), nrow=nrow(DT))

# remove the lower.triangle since we have symmetry (we don't want duplicates)
distTable[lower.tri(distTable)] <- NA

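# Show which pairs of rows are at or above the threshold
# (for pairs within the threshold, use <= thresh and first set diag(distTable) <- NA)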
pairedRows <- which(distTable >= thresh, arr.ind=TRUE)
colnames(pairedRows) <- c("RowA", "RowB")  # clean up the names

Starting with:

> DT
    X   Y
1: -4 -10
2:  6   1
3: -2   8
4:  8   1
5:  9  -1

We get:

> pairedRows
     RowA RowB
[1,]    1    2
[2,]    1    3
[3,]    2    3
[4,]    1    4
[5,]    3    4
[6,]    1    5
[7,]    3    5
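
To see how far apart each reported pair is, the index matrix returned by which(..., arr.ind=TRUE) can be used to subset the distance matrix directly. The following is only a sketch: the 3 x 3 matrix holds made-up distances and the name result is hypothetical, not part of the answer.

```r
# Small hand-made symmetric distance matrix (hypothetical values)
distTable <- matrix(c( 0,  3, 12,
                       3,  0, 11,
                      12, 11,  0), nrow = 3, byrow = TRUE)

# keep each pair once, as in the answer
distTable[lower.tri(distTable)] <- NA

pairedRows <- which(distTable >= 10, arr.ind = TRUE)
colnames(pairedRows) <- c("RowA", "RowB")

# indexing with a two-column matrix picks one cell per row of pairedRows
result <- data.frame(pairedRows, Distance = distTable[pairedRows])
print(result)
```

Each row of result then holds the two row numbers together with their distance.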

These are the two functions used to create the distance matrix (define them before running the code above):

# pair-up all of the rows
createTable <- function(DT)   
  expand.grid(apply(DT, 1, list), apply(DT, 1, list))

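# simple Euclidean (Pythagorean) distance between the two rows of a pair;
# CoordPair[[k]][[1]] is the coordinate vector of the k-th row in the pair
distance <- function(CoordPair)
  sqrt(sum((CoordPair[[2]][[1]] - CoordPair[[1]][[1]])^2, na.rm=FALSE))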
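
As a possible alternative to the two helper functions, base R's dist() computes all pairwise Euclidean distances in one call. The sketch below is not the answer's method; it swaps in dist() and uses a plain data frame pts (a hypothetical name) holding the same five points as DT above. The names far_pairs and near_pairs are also illustrative.

```r
# Same five points as the DT example, typed in by hand
pts <- data.frame(X = c(-4, 6, -2, 8, 9), Y = c(-10, 1, 8, 1, -1))
thresh <- 10

# dist() returns pairwise Euclidean distances; as.matrix() makes it square
d <- as.matrix(dist(pts[, c("X", "Y")]))

# keep each pair once: blank out the lower triangle and the diagonal
d[lower.tri(d, diag = TRUE)] <- NA

far_pairs  <- which(d >= thresh, arr.ind = TRUE)  # at least thresh apart
near_pairs <- which(d <= thresh, arr.ind = TRUE)  # within thresh of each other
colnames(far_pairs)  <- c("RowA", "RowB")
colnames(near_pairs) <- c("RowA", "RowB")
```

Because the diagonal is set to NA here, the comparison can be flipped to <= thresh without every row being paired with itself.
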
Answered 2013-05-12T00:00:35.450

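Your question isn't entirely clear to me, but assuming you mean that you want to take each row of coordinates and find all other rows whose coordinates are within a certain distance: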

# Create data set for example

set.seed(42)
x <- sample(-100:100, 10)
set.seed(456)
y <- sample(-100:100, 10)

coords <- data.frame(
  "x" = x,
  "y" = y)

# Loop through all rows

lapply(1:nrow(coords), function(i) {
  dis <- sqrt(
    (coords[i,"x"] - coords[, "x"])^2 + # insert your preferred 
    (coords[i,"y"] - coords[, "y"])^2   # distance calculation here
  ) 
  names(dis) <- 1:nrow(coords)          # replace this part with an index or 
                                        # row names if you have them
  dis[dis > 0 & dis <= 100]             # change numbers to preferred threshold
})

[[1]]
       2        6        7        9       10
25.31798 95.01579 40.01250 30.87070 73.75636

[[2]]
        1         6         7         9        10
25.317978 89.022469 51.107729  9.486833 60.539243

[[3]]
       5        6        8
70.71068 91.78780 94.86833

[[4]]
       5       10
40.16217 99.32774

[[5]]
       3        4        6       10
70.71068 40.16217 93.40771 82.49242

[[6]]
       1        2        3        5        7        8        9       10
95.01579 89.02247 91.78780 93.40771 64.53681 75.66373 97.08244 34.92850

[[7]]
       1        2        6        9       10
40.01250 51.10773 64.53681 60.41523 57.55867

[[8]]
       3        6
94.86833 75.66373

[[9]]
        1         2         6         7        10
30.870698  9.486833 97.082439 60.415230 67.119297

[[10]]
       1        2        4        5        6        7        9
73.75636 60.53924 99.32774 82.49242 34.92850 57.55867 67.11930
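
The question mentions kilometres, while the loop above uses plain Euclidean distance on whatever units the columns are in. If the coordinates happen to be longitude/latitude in degrees (an assumption; the question does not say), a great-circle formula could be dropped in where the comment says to insert your preferred distance calculation. A minimal sketch, in which haversine_km, places, within_100 and the 6371 km mean Earth radius are illustrative choices and not part of the answer:

```r
# Haversine great-circle distance in km; inputs are degrees (assumed lon/lat)
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Hypothetical points on the equator, 0.5 and 3 degrees of longitude apart
places <- data.frame(lon = c(0, 0.5, 3), lat = c(0, 0, 0))

# Same loop shape as above, with the distance calculation swapped out
within_100 <- lapply(seq_len(nrow(places)), function(i) {
  d <- haversine_km(places$lon[i], places$lat[i], places$lon, places$lat)
  names(d) <- seq_len(nrow(places))
  d[d > 0 & d <= 100]  # other rows within 100 km
})
```

If the X/Y values are already projected coordinates in metres or kilometres, the Euclidean version above is the appropriate one and only the threshold needs adjusting.
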
Answered 2013-05-12T00:09:05.263