
I have a dataset with a column that looks like this:

   string<-c('lib1_Rstudio_case1','lib2_Rstudio_case1and2','lib5_python_notthe correct_language','lib3_Jupyter_really_good','lib1_spyder_nice','lib1_R_the_core')
   replacement<-c('Rstudio','Jupyter','spyder','R')

I want to replace each string value with the id from `replacement` that it matches. I am currently using the following code:

gsub(paste(replacement, collapse = "|"), replacement = replacement, x = string)

This is used alongside another piece of code that I use to find the matching cases:

string[grepl(paste(replacement, collapse='|'), string, ignore.case=TRUE)]

I want to update the cases I find. I would like the output to look like this:

Rstudio,Rstudio,'',Jupyter,spyder,R

I don't want to hard-code this. I want to write code that scales.

Any help is much appreciated.

Thanks in advance.


2 Answers


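Here is another simple piece of code that I used. It needs no complex regular expressions, just a loop over `replacement`. Thanks for the help.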

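string <- c('lib1_Rstudio_case1','lib2_Rstudio_case1and2','lib5_python_notthe correct_language','lib3_Jupyter_really_good','lib1_spyder_nice','lib1_R_the_core')
# Order matters: 'R' is a substring of 'Rstudio', so the shorter id comes
# first and the longer id overwrites it on a later pass.
replacement <- c('R','Jupyter','spyder','Rstudio')
# Start with '' for every element so that non-matching entries stay empty.
replaced <- rep('', length(string))

for (i in seq_along(replacement))
{
  # fixed = TRUE treats the id as a literal string rather than a regex
  replaced[grepl(replacement[i], string, fixed = TRUE)] <- replacement[i]
}
replaced
# [1] "Rstudio" "Rstudio" ""        "Jupyter" "spyder"  "R"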
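The loop above depends on the order of `replacement`. As a rough sketch of one way to remove that dependency, the ids can be sorted by length so the longest matching id wins. The helper name `match_id` is hypothetical and not part of the original answer, and the approach assumes plain substring matching is acceptable.

```r
string <- c('lib1_Rstudio_case1', 'lib2_Rstudio_case1and2',
            'lib5_python_notthe correct_language', 'lib3_Jupyter_really_good',
            'lib1_spyder_nice', 'lib1_R_the_core')
replacement <- c('Rstudio', 'Jupyter', 'spyder', 'R')

# Hypothetical helper: for each element of x, return the longest id that
# occurs in it as a literal substring, or '' when no id occurs.
match_id <- function(x, ids) {
  ids <- ids[order(nchar(ids), decreasing = TRUE)]  # try longest ids first
  vapply(x, function(s) {
    hit <- ids[vapply(ids, grepl, logical(1), x = s, fixed = TRUE)]
    if (length(hit) > 0) hit[1] else ''
  }, character(1), USE.NAMES = FALSE)
}

replaced <- match_id(string, replacement)
replaced
```

Like the loop, this matches any string containing an uppercase `R` to the id `'R'`, so it is only as strict as substring matching allows.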
answered 2017-03-10T18:10:54.057

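Isolate the id with the `gsub` function, then use the `is.na` function to find the ids that fall outside the length of `replacement`. Then replace the identified ids with the empty string `''`.

Edit: Since you changed the string data in the question, I modified the `gsub` call. The pattern used in `gsub` looks for the numeric value after the text `lib` and drops the rest of the string element.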

replacement <- c('Rstudio','Jupyter','spyder','R')

string <- c('lib1_Rstudio','lib2_Rstudio','lib5_python','lib3_Jupyter','lib1_spyder','lib1_R')
# Capture the number after "lib", use it to index replacement, and flag the
# elements whose number is larger than length(replacement) (they give NA).
index <- is.na( replacement[ as.integer( gsub( "lib([[:digit:]]+)[[:alnum:]_ ]*", "\\1", string)) ] )
# Take the second "_"-separated field as the id
a1 <- sapply( strsplit(string, "_"), function( x ) x[2] )
a1[ index ] <- ''
a1
# [1] "Rstudio" "Rstudio" ""        "Jupyter" "spyder"  "R"

# Same steps on the updated data from the question
string <- c('lib1_Rstudio_case1','lib2_Rstudio_case1and2','lib5_python_notthe correct_language','lib3_Jupyter_really_good','lib1_spyder_nice','lib1_R_the_core')
index <- is.na( replacement[ as.integer( gsub( "lib([[:digit:]]+)[[:alnum:]_ ]*", "\\1", string)) ] )
a1 <- sapply( strsplit(string, "_"), function( x ) x[2] )
a1[ index ] <- ''
a1
# [1] "Rstudio" "Rstudio" ""        "Jupyter" "spyder"  "R"
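The code above decides which elements to blank out from the number after `lib`. A possible variation, sketched here under the assumption that the id is always the second `_`-separated field, is to test that field for membership in `replacement` directly with `%in%`. The variable names `field2` and `result` are illustrative only.

```r
string <- c('lib1_Rstudio_case1', 'lib2_Rstudio_case1and2',
            'lib5_python_notthe correct_language', 'lib3_Jupyter_really_good',
            'lib1_spyder_nice', 'lib1_R_the_core')
replacement <- c('Rstudio', 'Jupyter', 'spyder', 'R')

# Second "_"-separated field of each element (assumed to hold the id)
field2 <- vapply(strsplit(string, "_", fixed = TRUE),
                 function(x) x[2], character(1))

# Keep the field only when it is one of the known ids, otherwise use ''
result <- ifelse(field2 %in% replacement, field2, '')
result
```

Because this compares whole fields rather than substrings, an element such as `lib9_Rstudio_x` would still map to `Rstudio`, while a field like `Rcpp` would not be mistaken for `R`.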
answered 2017-03-09T00:50:31.867