原始答案:请参阅下面的更新。
首先,通过将第一行添加到底部,我使您的示例数据更具挑战性。
dff <- structure(list(code = c("61000-61003", "0169T-0169T", "61000-61003"
), label = c("excision of CNS", "ventricular shunt", "excision of CNS"
)), .Names = c("code", "label"), row.names = c(NA, 3L), class = "data.frame")
dff
# code label
# 1 61000-61003 excision of CNS
# 2 0169T-0169T ventricular shunt
# 3 61000-61003 excision of CNS
我们可以使用序列操作符:
来获取code
列的序列,用 包裹,tryCatch()
这样我们就可以避免错误,并保存无法排序的值。首先,我们用破折号分割值,-
然后通过lapply()
.
xx <- lapply(
strsplit(dff$code, "-", fixed = TRUE),
function(x) tryCatch(x[1]:x[2], warning = function(w) x)
)
data.frame(code = unlist(xx), label = rep(dff$label, lengths(xx)))
# code label
# 1 61000 excision of CNS
# 2 61001 excision of CNS
# 3 61002 excision of CNS
# 4 61003 excision of CNS
# 5 0169T ventricular shunt
# 6 0169T ventricular shunt
# 7 61000 excision of CNS
# 8 61001 excision of CNS
# 9 61002 excision of CNS
# 10 61003 excision of CNS
我们正在尝试将序列运算符:
应用于 中的每个元素strsplit()
,如果x[1]:x[2]
无法获取,则仅返回这些元素的值,x[1]:x[2]
否则继续执行序列。label
然后我们只是根据结果长度复制列的值xx
以获得新label
列。
更新:这是我为回应您的编辑而提出的。将xx
上面替换为
xx <- lapply(strsplit(dff$code, "-", TRUE), function(x) {
s <- stringi::stri_locate_first_regex(x, "[A-Z]")
nc <- nchar(x)[1L]
fmt <- function(n) paste0("%0", n, "d")
if(!all(is.na(s))) {
ss <- s[1,1]
fmt <- fmt(nc-1)
if(ss == 1L) {
xx <- substr(x, 2, nc)
paste0(substr(x, 1, 1), sprintf(fmt, xx[1]:xx[2]))
} else {
xx <- substr(x, 1, ss-1)
paste0(sprintf(fmt, xx[1]:xx[2]), substr(x, nc, nc))
}
} else {
sprintf(fmt(nc), x[1]:x[2])
}
})
是的,这很复杂。现在,如果我们将以下数据框df2
作为测试用例
df2 <- structure(list(code = c("61000-61003", "0169T-0174T", "61000-61003",
"T0169-T0174"), label = c("excision of CNS", "ventricular shunt",
"excision of CNS", "ventricular shunt")), .Names = c("code",
"label"), row.names = c(NA, 4L), class = "data.frame")
并在上面运行xx
代码,我们可以得到以下结果。
data.frame(code = unlist(xx), label = rep(df2$label, lengths(xx)))
# code label
# 1 61000 excision of CNS
# 2 61001 excision of CNS
# 3 61002 excision of CNS
# 4 61003 excision of CNS
# 5 0169T ventricular shunt
# 6 0170T ventricular shunt
# 7 0171T ventricular shunt
# 8 0172T ventricular shunt
# 9 0173T ventricular shunt
# 10 0174T ventricular shunt
# 11 61000 excision of CNS
# 12 61001 excision of CNS
# 13 61002 excision of CNS
# 14 61003 excision of CNS
# 15 T0169 ventricular shunt
# 16 T0170 ventricular shunt
# 17 T0171 ventricular shunt
# 18 T0172 ventricular shunt
# 19 T0173 ventricular shunt
# 20 T0174 ventricular shunt