r - stringr function does not resolve correctly when passed down a data.table

Question

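This one is a bit odd. I have a list of file roots, and I want to extract the terminal file name from each root. An ugly combination of stringr functions does the job by locating the last "/" character in the string and then extracting everything after it.

The odd part is that the function works fine when applied to any single string, but it does not seem to apply correctly when passed down a data.table: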

require(data.table)
require(stringr)

file_list <- data.table(file_root = c("~/dat/stuff/thing.csv",
                                      "~/dat/stuff/thingy.csv",
                                      "~/dat/otherstuff/thinger.csv"))

file_root <- "~/dat/otherstuff/thinger.csv"

success <- str_sub(file_root,-(str_length(file_root) - max(str_locate_all(file_root,"/")[[1]])),-1) 

#> success
#[1] "thinger.csv"

file_list[, extract := str_sub(file_root,-(str_length(file_root) - max(str_locate_all(file_root,"/")[[1]])),-1)]

#> head(file_list)
#file_root          extract
#1:        ~/dat/stuff/thing.csv        thing.csv
#2:       ~/dat/stuff/thingy.csv       thingy.csv
#3: ~/dat/otherstuff/thinger.csv tuff/thinger.csv   <- final result is incorrect

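I can put together a strsplit-based function and apply it down the data.table with sapply, which does the job, but in practice file_list will be hundreds of thousands of rows long and sapply will take a very long time.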

find_name <- function(X) {as.character(data.table(strsplit(X,"/")[[1]])[NROW(data.table(strsplit(X,"/")[[1]]))])}

file_list[,extract := sapply(file_root,find_name)]

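So my question is: any idea why the original function does not work, and how to fix it? Alternatively, how can I make the find_name function run faster?

Thanks in advance.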

score 0 · Accepted Answer

Arun's basename suggestion works well, e.g.

file_list[,file_name := basename(file_root)]
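For illustration, here is a minimal base R sketch of why basename fits this case: it is vectorised, so it returns one file name per path without any sapply loop. The vector name paths and the sub() alternative are my own additions for the example, not part of Arun's suggestion.

```r
# Sample paths copied from the question; `paths` is a hypothetical name.
paths <- c("~/dat/stuff/thing.csv",
           "~/dat/stuff/thingy.csv",
           "~/dat/otherstuff/thinger.csv")

# basename() is vectorised: it returns one file name per input path.
file_names <- basename(paths)

# A regex alternative that should give the same result here:
# the greedy ".*/" removes everything up to and including the last "/".
file_names_regex <- sub(".*/", "", paths)

print(file_names)
```

Both forms operate on the whole column at once, which is what matters for a table with hundreds of thousands of rows.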

It would still be interesting to find the reason for the odd stringr result, but this solution works for my immediate problem.

Cheers
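As for the stringr result, a likely explanation is the [[1]] in the original expression. str_locate_all returns a list with one match matrix per input string, so [[1]] keeps only the slash positions of the first path, and that single position is then reused for every row. The sketch below reproduces this on a plain character vector; the names paths, first_only, last_slash, and extract are hypothetical, and it assumes every path contains at least one "/".

```r
library(stringr)

# Sample paths copied from the question.
paths <- c("~/dat/stuff/thing.csv",
           "~/dat/stuff/thingy.csv",
           "~/dat/otherstuff/thinger.csv")

# [[1]] selects the match matrix of the FIRST path only, so max() yields
# one number (the last "/" position in the first path) for all rows.
first_only <- max(str_locate_all(paths, "/")[[1]])

# Per-path variant: take the last "/" position from each match matrix.
# Assumes each path contains a "/"; an empty matrix is not handled here.
last_slash <- vapply(str_locate_all(paths, "/"),
                     function(m) max(m[, "start"]),
                     numeric(1))

# Keep everything after the last "/" of each path.
extract <- str_sub(paths, last_slash + 1, -1)
print(extract)
```

The first two paths happen to share the same last slash position, which would explain why only the third row came out wrong in the original output.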
