4

I am looking for a gsub pattern that will return all matches of an expression, not just the last match. I.e.:

data <- list("a sentence with citation (Ref. 12) and another (Ref. 13)", "single (Ref. 14)")
gsub(".*(Ref. (\\d+)).*", "\\1", data)

returns

[1] "Ref. 13" "Ref. 14"

So I lose Ref. 12.

4 Answers

7

You can use the strapply function from the gsubfn package to do this:

library(gsubfn)

data <- list("a sentence with citation (Ref. 12) and another (Ref. 13)", "single (Ref. 14)") 
unlist(strapply(data,"(Ref. (\\d+))"))
answered 2012-04-18T20:13:13.003
6

How about

sapply(data, stringr::str_extract_all, pattern = "Ref. (\\d+)")

?

answered 2012-04-18T18:44:12.323
4

Here is a function, essentially a wrapper around gregexpr(), that will capture multiple references from a single string.

extractMatches <- function(data, pattern) {
    start <-  gregexpr(pattern, data)[[1]]
    stop  <-  start + attr(start, "match.length") - 1
    if(-1 %in% start) {
        ""    ## **Note** you could return NULL if there are no matches 
    } else {
        mapply(substr, start, stop, MoreArgs = list(x = data))
    }
}    

data <- list("a sentence with citation (Ref. 12), (Ref. 13), and then (Ref. 14)",
             "another sentence without reference")
pat <- "Ref. (\\d+)"

res <- lapply(data, extractMatches, pattern = pat)
res
# [[1]]
# [1] "Ref. 12" "Ref. 13" "Ref. 14"
# 
# [[2]]
# [1] ""

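(**Note**: if you return NULL instead of "" when a string contains no references, you can post-process the results with do.call("c", res) to get a single vector containing only the matched references.)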
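To make that note concrete, here is a minimal sketch of the NULL-returning variant. The name extractMatchesOrNull is hypothetical (it is not part of the answer above), and the final line is only a cross-check against base R's regmatches(), which should give the same vector.

```r
# Hypothetical variant of extractMatches(): returns NULL when nothing matches.
extractMatchesOrNull <- function(data, pattern) {
    start <- gregexpr(pattern, data)[[1]]
    if (start[1] == -1) return(NULL)          # gregexpr() reports "no match" as -1
    stop <- start + attr(start, "match.length") - 1
    mapply(substr, start, stop, MoreArgs = list(x = data))
}

data <- list("a sentence with citation (Ref. 12), (Ref. 13), and then (Ref. 14)",
             "another sentence without reference")
pat <- "Ref. (\\d+)"

res  <- lapply(data, extractMatchesOrNull, pattern = pat)
refs <- do.call("c", res)                     # NULL entries are dropped by c()
refs
# [1] "Ref. 12" "Ref. 13" "Ref. 14"

# Cross-check with base R: regmatches() extracts the spans found by gregexpr()
identical(refs, unlist(regmatches(unlist(data), gregexpr(pat, unlist(data)))))
```

The regmatches() form avoids the hand-written start/stop arithmetic, at the cost of hiding how the match positions are used.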

answered 2012-04-18T18:03:52.047
2

I had a very similar problem before (http://thebiobucket.blogspot.com/2012/03/how-to-extract-citation-from-body-of.html) and came up with this solution (which is actually very close to Ben's):

require(stringr)
unlist(str_extract_all(unlist(data), pattern = "\\(.*?\\)"))

giving:

[1] "(Ref. 12)" "(Ref. 13)" "(Ref. 14)"
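
If stringr is not available, the same non-greedy pattern can probably be used with base R alone. This is only a sketch, assuming the data from the question; the variable names x and parens are made up for illustration. Note that the pattern picks up any parenthesised text, not just references.

```r
data <- list("a sentence with citation (Ref. 12) and another (Ref. 13)", "single (Ref. 14)")

x <- unlist(data)                             # flatten the list to a character vector
# gregexpr() finds every non-greedy "( ... )" span; regmatches() extracts them
parens <- unlist(regmatches(x, gregexpr("\\(.*?\\)", x, perl = TRUE)))
parens
# [1] "(Ref. 12)" "(Ref. 13)" "(Ref. 14)"
```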
answered 2012-04-18T19:49:18.867