r - Parsing a regular expression with grep across 2 variables

Question

The goal is to collapse/reassign levels as part of cleaning a data set.

Here is an example:

df <- data.frame(V1 = c("cat","lion","cat","beast","cat"),
                 V2 = c("nice and grumpy","angry","old,but also nice","empty","has friends"),
                 stringsAsFactors = FALSE)
> df
     V1                V2
1   cat   nice and grumpy
2  lion             angry
3   cat old,but also nice
4 beast             empty
5   cat       has friends

The level of interest is cat; these are the entries:

parse1 <- df$V1[grepl("cat", df$V1)]
#[1] "cat" "cat" "cat"

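From there, the idea is to search V2 for an attribute, nice; wherever it appears, the level cat should be renamed to nice cat. This search finds 2 entries of interest in V2: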

df.sub <- subset(df, V1 == "cat", select = V1:V2)
parse2 <- df.sub$V2[grep("([Nn]ice)", df.sub$V2)]
#[1] "nice and grumpy"   "old,but also nice"

The ideal end result would transform df into:

        V1                V2
1 nice cat   nice and grumpy
2     lion             angry
3 nice cat old,but also nice
4    beast             empty
5      cat       has friends

Any ideas on how to achieve this? Many thanks.

score 1 · Accepted Answer

You can use data.table:

df <- data.frame(V1 = c("cat","lion","cat","beast","cat"),
                 V2 = c("nice and grumpy","angry","old,but also nice","empty","has friends"),
                 stringsAsFactors = FALSE)

library(data.table)
DT <- data.table(df)
# All the nice animals: new column V3, NA where V2 does not match
DT[grepl("([Nn]ice)", V2), V3 := paste0("nice ", V1)]
# All the nice cats: new column V4, NA for every other row
DT[grepl("([Nn]ice)", V2) & V1 == "cat", V4 := paste0("nice ", V1)]
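
The calls above write the labels into new columns (V3, V4) and leave V1 untouched. If the aim is the table from the question, where V1 itself is renamed, a minimal sketch could update V1 by reference instead. This assumes V1 is a character column (as with stringsAsFactors = FALSE) and that only rows where V1 is exactly "cat" should change:

```r
library(data.table)

df <- data.frame(V1 = c("cat","lion","cat","beast","cat"),
                 V2 = c("nice and grumpy","angry","old,but also nice","empty","has friends"),
                 stringsAsFactors = FALSE)

DT <- as.data.table(df)

# Overwrite V1 in place, only where V1 is exactly "cat"
# and V2 contains "nice" or "Nice"
DT[V1 == "cat" & grepl("[Nn]ice", V2), V1 := "nice cat"]

print(DT)
```

Rows that do not meet both conditions keep their original V1 value, so no NA values are introduced.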

score 1 · Accepted Answer

A single ifelse seems to be enough:

df$V1 <- ifelse(grepl("([Nn]ice)", df$V2), 
                sub('cat', 'nice cat', df$V1), 
                df$V1 )

Output:

> df
        V1                V2
1 nice cat   nice and grumpy
2     lion             angry
3 nice cat old,but also nice
4    beast             empty
5      cat       has friends
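
One caveat with sub('cat', ...) is that it replaces the first occurrence of "cat" anywhere in the string, so a value such as "wildcat" would also be altered. As an alternative sketch (not part of the answer above), logical indexing with an exact match on V1 avoids that. The name is_nice_cat is made up for illustration, and V1 is again assumed to be a character column:

```r
df <- data.frame(V1 = c("cat","lion","cat","beast","cat"),
                 V2 = c("nice and grumpy","angry","old,but also nice","empty","has friends"),
                 stringsAsFactors = FALSE)

# TRUE only where V1 is exactly "cat" and V2 contains "nice" or "Nice"
is_nice_cat <- df$V1 == "cat" & grepl("[Nn]ice", df$V2)

# Rename just those rows; every other row keeps its value
df$V1[is_nice_cat] <- "nice cat"

print(df)
```

If V1 is a factor rather than character, assigning a value that is not an existing level produces NA, so converting with as.character() first and back with factor() afterwards is one way to keep this working.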
