
Suppose I have these strings:

dat <- data.frame(xxs = c("PElookx.PElookxstd","POaftGx.POlookGxstd"))

How can I create a new variable so that, for example, a string containing PE gives NOW and a string containing PO gives LATER?

newxxs <- c("NOW", "LATER")

I sort of know how to do this with grep:

dat$newxxs <- NA
dat$newxxs[grep("PE", dat$xxs)] <- "NOW"
dat$newxxs[grep("PO", dat$xxs)] <- "LATER"

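Is there a simpler way than writing lots of greps? I will have to do this for several string fragments within the same new column, and for many new columns.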


2 Answers


If you have several different replacements to make, you can write a custom function that does them all in one go, for example:

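subst <- function(var, corresp) {
  # Start from the input so unmatched values are kept as they are.
  out <- as.character(var)
  for (elem in corresp) {
    # elem[1] is the pattern, elem[2] the replacement; match against the
    # original values so earlier replacements cannot be re-matched.
    out[grepl(elem[1], var)] <- elem[2]
  }
  out  # one result per element of var
}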

var <- c("PEfoo", "PObar", "PAfoofoo", "PUbarbar")
corresp <- list(c("PE","NOW"),
                c("PO","LATER"),
                c("PA", "MAYBE"),
                c("PU", "THE IPHONE IS IN THE BLENDER"))
subst(var, corresp)

gives:

[1] "NOW"                          "LATER"                       
[3] "MAYBE"                        "THE IPHONE IS IN THE BLENDER"

So you can apply the function repeatedly to different columns of the data frame:

dat$new1 <- subst(dat$old1, corresp1)
dat$new2 <- subst(dat$old2, corresp2)
dat$new3 <- subst(dat$old3, corresp3)
...
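
If there are many columns, the repeated calls above can be folded into one step. The following is only a sketch under a few assumptions: `label_by_pattern` is a hypothetical helper (not the `subst` function above), the column names `old1`/`old2` and their contents are made up for illustration, and the patterns are treated as fixed strings rather than regular expressions.

```r
# Hypothetical helper: label each string by the first pattern it contains.
# `lookup` is a named character vector: names are patterns, values are labels.
label_by_pattern <- function(x, lookup) {
  x <- as.character(x)
  out <- rep(NA_character_, length(x))
  for (pat in names(lookup)) {
    # Only fill entries that no earlier pattern has labelled yet.
    hit <- is.na(out) & grepl(pat, x, fixed = TRUE)
    out[hit] <- lookup[[pat]]
  }
  out
}

# Illustrative data; the column names are assumptions.
dat <- data.frame(
  old1 = c("PElookx.PElookxstd", "POaftGx.POlookGxstd"),
  old2 = c("PAfoofoo", "PUbarbar"),
  stringsAsFactors = FALSE
)

# One lookup table per source column.
lookups <- list(
  old1 = c(PE = "NOW", PO = "LATER"),
  old2 = c(PA = "MAYBE", PU = "THE IPHONE IS IN THE BLENDER")
)

# Apply each lookup to its column, then store the results as new1, new2.
new_cols <- Map(label_by_pattern, dat[names(lookups)], lookups)
names(new_cols) <- sub("^old", "new", names(new_cols))
dat[names(new_cols)] <- new_cols
```

Strings that match none of the patterns come out as NA, which makes gaps in a lookup table easy to spot.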
answered 2013-02-12T13:45:46.687

If all of your strings are guaranteed to contain either PE or PO, you can use ifelse:

ifelse(grepl("PE", dat$xxs), "NOW", "LATER")

Example:

set.seed(45)

x <- sample(c("PEx", "POy"), 20, replace = TRUE)
#  [1] "POy" "PEx" "PEx" "PEx" "PEx" "PEx" "PEx" "POy" "PEx" "PEx"
# [11] "PEx" "POy" "PEx" "PEx" "PEx" "PEx" "POy" "PEx" "PEx" "PEx"

ifelse(grepl("PE", x), "NOW", "LATER")

#  [1] "LATER" "NOW"   "NOW"   "NOW"   "NOW"   "NOW"   "NOW"   "LATER" "NOW"   "NOW"
# [11] "NOW"   "LATER" "NOW"   "NOW"   "NOW"   "NOW"   "LATER" "NOW"   "NOW"   "NOW"
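
The single ifelse above labels anything without PE as LATER, which is why it relies on every string containing one of the two codes. If that is not guaranteed, one possible variant is to nest a second ifelse so that unmatched strings become NA. This is a sketch on made-up input (the `"ZZz"` and `NA` entries are assumptions for illustration):

```r
# Illustrative input with one unmatched string and one missing value.
x <- c("PEx", "POy", "ZZz", NA)

# Nested ifelse: strings containing neither PE nor PO become NA
# instead of silently falling through to "LATER".
labels <- ifelse(grepl("PE", x), "NOW",
          ifelse(grepl("PO", x), "LATER", NA_character_))
```

Note that grepl returns FALSE for a missing input, so the NA entry also ends up as NA here.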
answered 2013-02-12T13:35:15.953