
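I'm basically looking for a way to do a variation of this Ruby script in R. I have an arbitrary list of numbers (in this case, the steps of a moderator for a regression plot) that are unequally spaced, and I want to round values that fall within a range around these numbers to the nearest number in the list. The ranges don't overlap.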

arbitrary.numbers <- c(4,10,15) / 10
numbers <- c(16:1 / 10, 0.39, 1.45)
range <- 0.1

Expected output:

numbers
## 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.39 1.45
round_to_nearest_neighbour_in_range(numbers,arbitrary.numbers,range)
## 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5

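I have a small helper function that solves my specific problem, but it isn't very flexible and it contains a loop. I can post it here, but I think a real solution would look completely different.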

The different answers, timed for speed (on a million numbers):

> numbers = rep(numbers,length.out = 1000000)
> system.time({ mvg.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  0.067 
> system.time({ rinker.loop.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  0.289 
> system.time({ rinker.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  1.403 
> system.time({ nograpes.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  1.971 
> system.time({ january.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  16.12 
> system.time({ shariff.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
 15.833 
> system.time({ mplourde.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
  9.613 
> system.time({ kohske.round(numbers,arbitrary.numbers,range) })[3]
elapsed 
 26.274 

MvG's function is the fastest, roughly 4 times faster than Tyler Rinker's second function (0.067 s versus 0.289 s).

6 Answers

A vectorized solution, without any apply-family functions or loops:

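The key is findInterval, which finds the "space" in arbitrary.numbers that each element of numbers falls "between". So findInterval(6, c(2,4,7,8)) returns 2, because 6 lies between the second and third elements of c(2,4,7,8).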
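
A minimal sketch of how findInterval behaves at the edges, using made-up query values (the names breaks and queries are only for illustration):

```r
# Sorted break points; findInterval() expects a non-decreasing vector.
breaks <- c(2, 4, 7, 8)

# For each query value, count how many break points are <= that value.
queries <- c(1, 2, 6, 9)
idx <- findInterval(queries, breaks)
print(idx)
# [1] 0 1 2 4
```

An index of 0 means the value lies below the first break point, and an index equal to length(breaks) means it lies at or above the last one. That is why the code that follows has to guard against a missing lower or upper neighbour.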

# arbitrary.numbers is assumed to be sorted.
# find the index of the number just below each number, and just above.
# So for 6 in c(2,4,7,8) we would find 2 and 3.
low<-findInterval(numbers,arbitrary.numbers) # find index of number just below
high<-low+1 # find the corresponding index just above.

# Find the actual absolute difference between the arbitrary number above and below.
# So for 6 in c(2,4,7,8) we would find 2 and 1. 
# (The absolute differences to 4 and 7).
low.diff<-numbers-arbitrary.numbers[ifelse(low==0,NA,low)]
high.diff<-arbitrary.numbers[ifelse(high==0,NA,high)]-numbers

# Find the minimum difference. 
# In the example we would find that 6 is closest to 7, 
# because the difference is 1.
mins<-pmin(low.diff,high.diff,na.rm=TRUE) 
# For each number, pick the arbitrary number with the minimum difference.
# So for 6 pick out 7.
pick<-ifelse(!is.na(low.diff) & mins==low.diff,low,high)

# Compare the actual minimum difference to the range. 
ifelse(mins<=range+.Machine$double.eps,arbitrary.numbers[pick],numbers)
# [1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
Answered 2012-10-12T14:58:46.163

Another solution using findInterval:

arbitrary.numbers<-sort(arbitrary.numbers)          # need them sorted
range <- range*1.000001                             # avoid rounding issues
nearest <- findInterval(numbers, arbitrary.numbers - range) # index of nearest
nearest <- c(-Inf, arbitrary.numbers)[nearest + 1]  # value of nearest
diff <- numbers - nearest                           # compute errors
snap <- diff <= range                               # only snap near numbers
numbers[snap] <- nearest[snap]                      # snap values to nearest
print(numbers)

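The nearest in the code above is not the nearest number in the mathematical sense. Instead, it is the largest arbitrary number such that nearest[i] - range <= numbers[i], or equivalently nearest[i] <= numbers[i] + range. So in one step we find the largest arbitrary number that is either within snapping range of the given input number or still too small for it. For this reason, snap only needs to be checked in one direction. There is no need for an absolute value, nor for the squaring used in a previous revision of this post.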
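
To see the one-sided check at work, here is a small sketch on a handful of values. The names x, tol and candidate are illustrative and not part of the code above:

```r
arbitrary.numbers <- c(4, 10, 15) / 10
tol <- 0.1 * 1.000001                 # same padding as in the code above
x <- c(0.2, 0.35, 0.45, 0.6, 1.45)

# Shifting the break points down by tol makes each interval start at the
# lower edge of a snapping range.
idx <- findInterval(x, arbitrary.numbers - tol)
candidate <- c(-Inf, arbitrary.numbers)[idx + 1]

# x - candidate is negative or small inside the range, large above it,
# and Inf when there is no candidate at all.
snap <- (x - candidate) <= tol
print(data.frame(x, candidate, snap))
```

Here 0.2 has no candidate (-Inf), 0.35 and 0.45 both snap to 0.4, 0.6 is too far above 0.4, and 1.45 snaps to 1.5.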

Credit for pointing me to findInterval goes to the question "Interval search on a data frame", where I found it before I recognized it in nograpes's answer.

If, contrary to your original question, you have overlapping ranges, you could write it like this:

arbitrary.numbers<-sort(arbitrary.numbers)        # need them sorted
range <- range*1.000001                           # avoid rounding issues
nearest <- findInterval(numbers, arbitrary.numbers) + 1 # index of interval
hi <- c(arbitrary.numbers, Inf)[nearest]          # next larger
nearest <- c(-Inf, arbitrary.numbers)[nearest]    # next smaller
takehi <- (hi - numbers) < (numbers - nearest)    # larger better than smaller
nearest[takehi] <- hi[takehi]                     # now nearest is really nearest
snap <- abs(nearest - numbers) <= range           # only snap near numbers
numbers[snap] <- nearest[snap]                    # snap values to nearest
print(numbers)

In this code, nearest really does end up being the nearest number. This is achieved by considering both endpoints of every interval. In spirit, this is very much like the version by nograpes, but it avoids ifelse and NA, which should benefit performance because it reduces the number of branching instructions.
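
The difference between the two variants only shows up when ranges overlap. The sketch below wraps both in functions so they can be compared side by side. The function names snap_one_sided and snap_two_sided, and the test values, are made up for illustration:

```r
# Hypothetical wrapper around the first (one-sided) variant.
snap_one_sided <- function(x, targets, tol) {
  targets <- sort(targets)
  tol <- tol * 1.000001
  cand <- c(-Inf, targets)[findInterval(x, targets - tol) + 1]
  hit <- (x - cand) <= tol
  x[hit] <- cand[hit]
  x
}

# Hypothetical wrapper around the second (two-sided) variant.
snap_two_sided <- function(x, targets, tol) {
  targets <- sort(targets)
  tol <- tol * 1.000001
  i <- findInterval(x, targets) + 1
  hi <- c(targets, Inf)[i]          # next larger target
  lo <- c(-Inf, targets)[i]         # next smaller target
  take_hi <- (hi - x) < (x - lo)
  lo[take_hi] <- hi[take_hi]        # lo now holds the nearest target
  hit <- abs(lo - x) <= tol
  x[hit] <- lo[hit]
  x
}

# Targets 1.0 and 1.1 with a range of 0.2 overlap heavily.
targets <- c(1.0, 1.1)
one <- snap_one_sided(1.02, targets, 0.2)
two <- snap_two_sided(1.02, targets, 0.2)
print(c(one_sided = one, two_sided = two))
```

With these inputs the one-sided variant snaps 1.02 to 1.1, the largest target whose range contains it, while the two-sided variant snaps it to 1.0, which is the closer of the two.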

Answered 2012-10-12T16:44:49.293

Is this what you want?

> idx <- abs(outer(arbitrary.numbers, numbers, `-`)) <= (range+.Machine$double.eps)
> rounded <- arbitrary.numbers[apply(rbind(idx, colSums(idx) == 0), 2, which)]
> ifelse(is.na(rounded), numbers, rounded)
 [1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
Answered 2012-10-12T14:42:49.427

Note that, (most likely) because of rounding errors, I use range = 0.1000001 to get the expected result.
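
A short sketch of the rounding issue, using 1.6 and 1.5 from the question's data. The names d and tol are illustrative:

```r
tol <- 0.1
d <- 1.6 - 1.5                      # slightly more than 0.1 in binary floating point

print(d <= tol)                     # FALSE
print(d <= tol + 1e-7)              # TRUE
print(isTRUE(all.equal(d, tol)))    # TRUE
```

Neither 1.6 nor 0.1 is exactly representable as a double, so the difference comes out marginally larger than 0.1. Padding the range, or comparing with all.equal, is one way around this.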

range <- range + 0.0000001

blah <- rbind( numbers, sapply( numbers, function( x ) abs( x - arbitrary.numbers ) ) )
ff <- function( y ) { if( min( y[-1] ) <= range + 0.000001 ) arbitrary.numbers[ which.min( y[ -1 ] ) ] else  y[1]  }
apply( blah, 2, ff )
Answered 2012-10-12T14:43:12.150

This is even shorter:

sapply(numbers, function(x)
  ifelse(min(abs(arbitrary.numbers - x)) > range + .Machine$double.eps,
         x,
         arbitrary.numbers[which.min(abs(arbitrary.numbers - x))]))

Thanks @MvG

Answered 2012-10-12T14:53:40.453

Another option:

arb.round <- function(numbers, arbitrary.numbers, range) {
    arrnd <- function(x, ns, r){ 
        ifelse(abs(x - ns) <= r + .00000001, ns, x)  # use the argument r, not the outer range
    }
    lapply(1:length(arbitrary.numbers), function(i){
            numbers <<- arrnd(numbers, arbitrary.numbers[i], range)
        }
    )
    numbers
}

arb.round(numbers, arbitrary.numbers, range)

Yields:

> arb.round(numbers, arbitrary.numbers, range)
[1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5

EDIT: I removed the return call at the end of the function, since it isn't necessary and can cost time.

EDIT: I think a loop would be faster here:

loop.round <- function(numbers, arbitrary.numbers, range) {
    arrnd <- function(x, ns, r){ 
        ifelse(abs(x - ns) <= r + .00000001, ns, x)  # use the argument r, not the outer range
    }
    for(i in seq_along(arbitrary.numbers)){
            numbers <- arrnd(numbers, arbitrary.numbers[i], range)
    }
    numbers
}
Answered 2012-10-12T14:51:03.027