
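I am trying to do fuzzy matching with the agrep command. I have a data frame with a column of audience responses, and another data frame that lists segments and sub-segments. The audience column contains words that are sub-segment names. For example: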

pattern$audience
[1] "(Deleted) Semasio » DE: Intent » Christmas Shopping"          
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"      
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"        
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers"
[5] "(Old) AddThis - UK » Food » Social"                           
[6] "(Old) AddThis - UK » Health » Social » Health Influencers" 

Similarly, I have another data frame named x that contains the segments and sub-segments:

x$segment               x$subsegment
Shopping                Financial shoppers
Travel                  Travel Europe
Shopping                Christmas shopping

I want to write a function that does fuzzy matching between pattern$audience and x$subsegment and returns the sub-segment for each audience response in a new column, pattern$subseg.

The resulting dataset I need should look like this:

pattern$audience    x$segment               x$subsegment                
[1] "(Deleted) Semasio » DE: Intent » Christmas C"            Shopping                Christmas shopping              
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"                         
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"                           
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers"   Shopping                Financial shoppers              
[5] "(Old) AddThis - UK » Food » Social"                                              
[6] "(Old) AddThis - UK » Health » Social » Health Influencers"                  

This is the code I tried to write, but it does not return the output I want:

x <- rename(x, c("Segment" = "segment", "Sub Segment" = "subseg"))
names(x)
y <- as.data.frame(x$subseg)
y <- rename(y, c("x$subseg" = "subseg"))


n.match <- function(pattern, x, ...) {
  for (i in 1:nrow(pattern)) {
        x <- (agrep(y,pattern$audience[i],
                 ignore.case=TRUE, value = TRUE))
              x <- paste0(x,"")
              pattern$subseg[i] <- x
  }
  head(pattern)
    }

Can someone help me correct my mistake? I would really appreciate an answer. Thanks a lot.


1 Answer


We can try this:

pattern <- c("(Deleted) Semasio » DE: Intent » Christmas C",          
         "(Old) AddThis - UK » Auto » General » Auto Enthusiasts",
         "(Old) AddThis - UK » Auto » General » Auto Intenders",        
         "(Old) AddThis - UK » Financial » Social » Financial Shoppers",
         "(Old) AddThis - UK » Food » Social",
         "(Old) AddThis - UK » Financial » Social » Financial Shoppers",
         "(Old) AddThis - UK » Health » Social » Health Influencers")
pattern <- data.frame(audiance=pattern)
x <- read.csv(text='segment,   subsegment    
                       Shopping,   Financial shoppers
                       Travel,     Travel Europe
                       Enthusiasts, Auto Enthusiasts  
                       Shopping,   Christmas shopping', stringsAsFactors=FALSE)

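# Vectorize agrep over 'pattern' so every sub-segment is searched for in turn;
# SIMPLIFY = FALSE keeps the result a list even if all hit counts are equal
vagrep <- Vectorize(agrep, 'pattern', SIMPLIFY = FALSE)
pattern$subsegment <- ''
# matches[[i]] holds the rows of pattern$audiance that approximately contain x$subsegment[i]
matches <- vagrep(x$subsegment, pattern$audiance)
invisible(lapply(seq_along(matches), function(i) if (length(matches[[i]]) > 0) pattern$subsegment[matches[[i]]] <<- x$subsegment[i]))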

pattern
#                                                         audiance            subsegment
#1                  (Deleted) Semasio » DE: Intent » Christmas C                      
#2       (Old) AddThis - UK » Auto » General » Auto Enthusiasts    Auto Enthusiasts  
#3         (Old) AddThis - UK » Auto » General » Auto Intenders                      
#4 (Old) AddThis - UK » Financial » Social » Financial Shoppers    Financial shoppers
#5                            (Old) AddThis - UK » Food » Social                      
#6 (Old) AddThis - UK » Financial » Social » Financial Shoppers    Financial shoppers
#7    (Old) AddThis - UK » Health » Social » Health Influencers                      
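
If you also want the segment column from the question's desired output, something along these lines might work. It is only a sketch: match_subsegment is a hypothetical helper name, it assumes the columns are called segment and subsegment, it trims stray whitespace with trimws(), it ignores case, and when several sub-segments match it simply keeps the first one.

```r
# Sample data taken from the question (assumed layout: one audience string per row)
audience <- c("(Deleted) Semasio » DE: Intent » Christmas Shopping",
              "(Old) AddThis - UK » Auto » General » Auto Enthusiasts",
              "(Old) AddThis - UK » Auto » General » Auto Intenders",
              "(Old) AddThis - UK » Financial » Social » Financial Shoppers",
              "(Old) AddThis - UK » Food » Social",
              "(Old) AddThis - UK » Health » Social » Health Influencers")
x <- data.frame(segment    = c("Shopping", "Travel", "Shopping"),
                subsegment = c("Financial shoppers", "Travel Europe", "Christmas shopping"),
                stringsAsFactors = FALSE)

# Hypothetical helper: for every audience string, look up the first
# sub-segment that approximately occurs in it (NA when nothing matches).
match_subsegment <- function(audience, segments, max.distance = 0.1) {
  subseg <- trimws(segments$subsegment)
  seg    <- trimws(segments$segment)
  # hits[i, j] is TRUE when sub-segment j fuzzily matches audience i
  hits <- vapply(subseg, function(p) {
    agrepl(p, audience, ignore.case = TRUE, max.distance = max.distance)
  }, logical(length(audience)))
  hits <- matrix(hits, nrow = length(audience))
  # Index of the first matching sub-segment per audience row, NA if none
  first <- apply(hits, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)
  data.frame(audience   = audience,
             segment    = seg[first],
             subsegment = subseg[first],
             stringsAsFactors = FALSE)
}

result <- match_subsegment(audience, x)
print(result[, c("segment", "subsegment")])
```

With the question's sample data, rows 1 and 4 should pick up "Christmas shopping" and "Financial shoppers" (both under Shopping), and the remaining rows stay NA. Raising max.distance loosens the match, so it is worth checking for false positives on real data.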
answered 2017-03-09T06:51:18.360