Consider the following comma-separated string of numbers:
s <- "1,2,3,4,8,9,14,15,16,19"
s
# [1] "1,2,3,4,8,9,14,15,16,19"
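Is it possible to collapse runs of consecutive numbers into their corresponding ranges? For example, the run 1,2,3,4 above would collapse to the range 1-4. The desired result looks like the following string: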
s
# [1] "1-4,8,9,14-16,19"
I took some heavy inspiration from the answers to this question.
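findIntRuns <- function(run){
  # differences between neighbours; the leading 1 keeps the first element in the first group
  rundiff <- c(1, diff(run))
  # start a new group wherever the difference is not 1
  difflist <- split(run, cumsum(rundiff!=1))
  unlist(lapply(difflist, function(x){
    # groups of one or two numbers are listed as-is; longer runs become "first-last"
    if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
  }), use.names=FALSE)
}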
s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))
paste0(findIntRuns(s2), collapse=",")
# [1] "1-4,8,9,14-16,19"
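To make the grouping step easier to follow, here is a small walk-through of the intermediate values for the example input. The variable names (starts, run_id, runs) are only illustrative and do not appear in the answer itself.

```r
x <- c(1, 2, 3, 4, 8, 9, 14, 15, 16, 19)

# A new run starts wherever the step from the previous element is not 1.
# The leading 1 keeps the first element in run 0.
starts <- c(1, diff(x)) != 1
run_id <- cumsum(starts)
run_id
# [1] 0 0 0 0 1 1 2 2 2 3

# Splitting by run_id yields one vector per run
runs <- split(x, run_id)
unname(lengths(runs))
# [1] 4 2 3 1
```

Runs of length three or more are the ones that get collapsed to "first-last".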
Unit: microseconds
   expr     min      lq   median       uq      max neval
 spee() 277.708 295.517 301.5540 311.5150 1612.207  1000
  seb() 294.611 313.025 321.1750 332.6450 1709.103  1000
 marc() 672.835 707.549 722.0375 744.5255 2154.942  1000
@speendo's solution is the fastest so far, but none of these has been optimized yet.
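The setup behind those timings is not shown. As a rough sketch of how one candidate could be timed with base R only, the following assumes a hypothetical function collapse_runs() standing in for any of the solutions; absolute numbers will differ from machine to machine.

```r
# Hypothetical stand-in for one of the candidate solutions
collapse_runs <- function(s) {
  x <- as.numeric(unlist(strsplit(s, ",")))
  groups <- split(x, cumsum(c(1, diff(x)) != 1))
  pieces <- vapply(groups, function(g) {
    if (length(g) > 2) paste0(g[1], "-", g[length(g)]) else paste(g, collapse = ",")
  }, character(1))
  paste(pieces, collapse = ",")
}

s <- "1,2,3,4,8,9,14,15,16,19"

# Time 1000 repeated calls; system.time() reports seconds, not microseconds
timing <- system.time(for (i in seq_len(1000)) collapse_runs(s))
timing[["elapsed"]]
```

A dedicated benchmarking package would give per-call distributions like the table above; this loop only gives a total.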
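I was too slow... but here is another solution.
It uses fewer R-specific functions, so it could be ported to other languages (on the other hand, it may be less elegant).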
s <- "1,2,3,4,8,9,14,15,16,19"
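collapseConsecutive <- function(s){
  x <- as.numeric(unlist(strsplit(s, ",")))
  x_0 <- x[1]                  # previous value
  out <- toString(x[1])
  hasDash <- FALSE             # TRUE while inside a collapsed run
  # seq_along(x)[-1] is empty for a single number, unlike 2:length(x)
  for(i in seq_along(x)[-1]) {
    x_1 <- x[i]                # current value
    x_2 <- x[i+1]              # next value (NA past the end)
    if((x_0 + 1) == x_1 && !is.na(x_2) && (x_1 + 1) == x_2) {
      # x_1 is interior to a run: emit a single dash and skip the value
      if(!hasDash) {
        out <- c(out, "-")
        hasDash <- TRUE
      }
    } else {
      # x_1 ends a run or stands alone; add a comma unless a dash precedes it
      if(!hasDash) {
        out <- c(out, ",")
      }
      out <- c(out, x_1)
      hasDash <- FALSE
    }
    x_0 <- x_1
  }
  paste(out, collapse="")
}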
collapseConsecutive(s)
# [1] "1-4,8,9,14-16,19"
Another fairly compact option:
in.seq <- function(x) {
  # returns TRUE for elements within ascending sequences
  (c(diff(x, 1), NA) == 1 & c(NA, diff(x,2), NA) == 2)
}
contractSeqs <- function(x) {
  # returns string formatted with contracted sequences
  x[in.seq(x)] <- ""
  gsub(",{2,}", "-", paste(x, collapse=","), perl=TRUE)
}
s <- "1,2,3,4,8,9,14,15,16,19"
s1 <- as.numeric(unlist(strsplit(s, ","))) # as earlier answers
# assumes: numeric vector, length > 2, positive integers, ascending sequences
contractSeqs(s1)
# [1] "1-4,8,9,14-16,19"
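The trick here is that interior elements of a run are blanked out, which leaves repeated commas that the regular expression then turns into a dash. A short walk-through of the intermediate string, using illustrative variable names (interior, y, blanked) that are not part of the answer:

```r
x <- c(1, 2, 3, 4, 8, 9, 14, 15, 16, 19)

# TRUE only for interior elements of an ascending run of three or more:
# the next element is 1 larger, and the neighbours on both sides differ by 2
interior <- c(diff(x), NA) == 1 & c(NA, diff(x, lag = 2), NA) == 2

y <- as.character(x)
y[which(interior)] <- ""      # which() drops the NA entries at the ends
blanked <- paste(y, collapse = ",")
blanked
# [1] "1,,,4,8,9,14,,16,19"

# Two or more commas in a row mark a collapsed run
gsub(",{2,}", "-", blanked)
# [1] "1-4,8,9,14-16,19"
```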
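I have also written a bells-and-whistles version that handles both numeric and string input, including named objects, descending sequences, and alternative punctuation, and that performs error checking and reporting. If anyone is interested, I can add it to my answer.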
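That extended version is not shown here. Purely as an illustration of two of the features mentioned (descending runs and alternative punctuation), a minimal sketch might look like the following. The function collapse_runs_any() and its sep and dash arguments are hypothetical and are not the answerer's code.

```r
# Hypothetical helper: collapses ascending or descending runs of three or
# more numbers and lets the caller choose the punctuation.
collapse_runs_any <- function(x, sep = ",", dash = "-") {
  stopifnot(is.numeric(x), !anyNA(x))
  n <- length(x)
  if (n == 0) return("")
  run_id <- integer(n)
  run_id[1] <- 1L
  dir <- 0                               # direction of the current run; 0 = not yet known
  for (i in seq_len(n)[-1]) {
    step <- x[i] - x[i - 1]
    if (abs(step) == 1 && (dir == 0 || step == dir)) {
      run_id[i] <- run_id[i - 1]         # same run continues
      dir <- step
    } else {
      run_id[i] <- run_id[i - 1] + 1L    # start a new run
      dir <- 0
    }
  }
  pieces <- vapply(split(x, run_id), function(g) {
    if (length(g) > 2) paste0(g[1], dash, g[length(g)]) else paste(g, collapse = sep)
  }, character(1))
  paste(pieces, collapse = sep)
}

collapse_runs_any(c(1, 2, 3, 4, 8, 9, 14, 15, 16, 19))
# [1] "1-4,8,9,14-16,19"
collapse_runs_any(c(9, 8, 7, 3, 4, 5), sep = ";", dash = ":")
# [1] "9:7;3:5"
```

With negative numbers the default "-" becomes ambiguous, so a different dash string would be the safer choice there.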
Here is a function that should do what you want:
conseq <- function(s){
  s <- as.numeric(unlist(strsplit(s, ",")))
  dif <- diff(s)                 # differences between neighbours
  new <- c(TRUE, dif != 1)       # TRUE where a new run starts
  cs <- cumsum(new)              # run index for each element
  res <- vector(mode="list", max(cs))
  for(i in seq_along(res)){
    s.i <- s[which(cs == i)]
    if(length(s.i) > 2){
      # three or more consecutive numbers become "min-max"
      res[[i]] <- paste(min(s.i), max(s.i), sep="-")
    } else {
      res[[i]] <- as.character(s.i)
    }
  }
  paste(unlist(res), collapse=",")
}
> s <- "1,2,3,4,8,9,14,15,16,19"
> conseq(s)
[1] "1-4,8,9,14-16,19"