I'm doing data preprocessing and have run into a problem. I have data like "Telma 2525 mg tablet", and I want to convert it to "Telma 25 mg tablet". Is it possible to do this?

Thanks

2 Answers

Use gsub():

> x<-rep("Telma 2525 mg tablet",10)
> x
[1] "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet"
[6] "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet"

> gsub("Telma 2525 mg tablet","Telma 25 mg tablet",x)

[1] "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet"
[6] "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet"

where x is your data source.

Edit: updated to make it general

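d <- data.frame(t = c("blah blah 2525 mg", "blah blah 7272 mg"), stringsAsFactors = FALSE)

remdup <- function(s) {
  f <- regexec("[0-9]{4}", s)[[1]][1]  # start of the first run of 4 digits (-1 if there is none)
  if (f < 0) return(s)                 # no 4-digit run: return the string unchanged
  # drop the first 2 digits of that run by position, so that an earlier
  # occurrence of the same 2 digits elsewhere in the string is not removed
  paste0(substr(s, 1, f - 1), substr(s, f + 2, nchar(s)))
}

lapply(d$t, remdup)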

#[[1]]
#[1] "blah blah 25 mg"
#  
#[[2]]
#[1] "blah blah 72 mg"
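
If the goal is only to collapse a two-digit number that was accidentally written twice, a backreference can probably do this in one vectorised call, without lapply. The following is a minimal sketch under that assumption; collapse_doubled is a hypothetical helper name, and the third sample row is made up to show a value that should stay untouched.

```r
# Hypothetical helper: collapse a doubled two-digit dose such as "2525" into "25".
# "\\b" restricts the match to standalone 4-digit numbers, and "\\1" requires
# the second pair of digits to be identical to the first pair.
collapse_doubled <- function(s) {
  sub("\\b([0-9]{2})\\1\\b", "\\1", s, perl = TRUE)
}

d <- data.frame(t = c("blah blah 2525 mg", "blah blah 7272 mg", "blah blah 2550 mg"),
                stringsAsFactors = FALSE)

# sub() is vectorised, so the whole column is cleaned in one call.
d$clean <- collapse_doubled(d$t)
d$clean
# [1] "blah blah 25 mg"   "blah blah 72 mg"   "blah blah 2550 mg"
```

Because the pattern insists on a repeated pair, a genuine four-digit value such as 2550 is left alone. Doses with a different number of digits would need a different pattern.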
answered 2014-02-04T14:57:31.537

Solution 1: Use Standardization to replace the custom string with a valid value. Custom string: Telma 2525 mg. Valid value: Telma 25 mg.

Solution 2: Use a reference table.
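
In R, a reference-table lookup could be sketched roughly as follows. This is only an illustration: the table ref, its column names, and the "Telma 4040 mg tablet" row are assumptions made up for the example, not part of the original data.

```r
# Hypothetical reference table: one row per known bad string and its valid value.
ref <- data.frame(
  custom = c("Telma 2525 mg tablet", "Telma 4040 mg tablet"),
  valid  = c("Telma 25 mg tablet",   "Telma 40 mg tablet"),
  stringsAsFactors = FALSE
)

x <- c("Telma 2525 mg tablet", "Telma 40 mg tablet", "Telma 4040 mg tablet")

# match() gives the row of ref for each element of x, or NA when x is not listed.
idx <- match(x, ref$custom)

# Keep the original value wherever the reference table has no entry.
cleaned <- ifelse(is.na(idx), x, ref$valid[idx])
cleaned
# [1] "Telma 25 mg tablet" "Telma 40 mg tablet" "Telma 40 mg tablet"
```

A lookup like this only fixes strings that are already listed in the table, so it suits a small, known set of bad values better than an open-ended pattern.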

answered 2015-02-10T20:00:39.803