
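I'm trying to write a function that Porter-stems a text and returns a stem map, i.e. each stem together with the words that produced it. When I run an example, the code never stops running and produces no output. There is no error, but when I force it to stop, it prints warnings like these: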

1: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
2: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
3: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
4: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
5: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
6: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
7: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
8: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
9: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length

My code is as follows:

stemMAP<-function(text){
  flatText<-unlist(strsplit(text," "))
  textLength<-length(flatText)

  stemList<-list(NULL)
  for(i in 1:textLength){
    wordStem<-SnowballStemmer(flatText[i])
    flagStem=0
    flagWord=0

    for(j in 1:length(stemList)){
      if(regexpr(wordStem,stemList[j][1])==TRUE){

        for(k in 1:length(stemList[j])){
          if(regexpr(flatText[i],stemList[j][k])==TRUE){ 
            flagWord=1
            #break;
            }
         }

        if(flagWord==0){
          stemList[j][length(stemList[j])+1]<-flatText[i]
          #break;
        }

        flagStem=1

      }

      if(flagStem==0){
        stemList[length(stemList)+1][1]<-wordStem
        stemList[length(stemList)+1][2]<-flatText[i]
      }

    }

  }

  return(stemList)
}
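The warnings come from single-bracket assignments such as stemList[length(stemList)+1][2] <- flatText[i]. In R, stemList[n] is a one-element sub-list, not the element itself, so giving it a second item produces a length-2 list that has to be squeezed back into one slot. The sketch below reproduces that warning on a tiny list, then shows a loop variant built on [[ ]], which addresses the element directly. It is not the answerer's code: stem_map_loop and its stems argument (one precomputed stem per word) are hypothetical, chosen so the sketch runs without any stemming package.

```r
# 1) Reproduce the warning: x[2] is a one-element sub-list, so setting its
#    second item yields a length-2 list that must fit into a single slot.
x <- list(NULL)
msg <- tryCatch({ x[2][2] <- "active"; "no warning" },
                warning = function(w) conditionMessage(w))
print(msg)

# 2) Loop variant using [[ ]] (hypothetical helper; `stems` holds one
#    precomputed stem per word instead of calling a stemmer).
stem_map_loop <- function(words, stems) {
  stemList <- list()
  for (i in seq_along(words)) {
    s <- stems[i]
    # an unknown name gives NULL, and c(NULL, w) is simply w
    if (!(words[i] %in% stemList[[s]])) {   # skip words already recorded
      stemList[[s]] <- c(stemList[[s]], words[i])
    }
  }
  stemList
}

words <- c("active", "and", "activates", "and", "activation")
stems <- c("activ",  "and", "activ",     "and", "activ")
str(stem_map_loop(words, stems))
```

The %in% check plays the role of flagWord in the original code, so a repeated word such as "and" is stored only once.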

How can I find the error? My test sentence is:

stem<-stemMAP("I like being active and playing because when you play it activates your body and this activation leads to a good health")

1 Answer


Here I call SnowballStemmer on the whole vector of words at once, so no for loop is needed.
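One plausible reason the original call never returns: 1:length(stemList) is evaluated once when the j loop starts, but every pass in which flagStem is still 0 appends two more elements. The sketch below is a simplified model, not the original code, and it assumes no stem ever matches; under that assumption each word triples the list length, so the 22-word test sentence would require on the order of 3^22 (about 3.1e10) list elements. simulate_growth is a hypothetical helper that only counts lengths.

```r
# Simplified model of the question's nested loop (assumption: no stem ever
# matches, so flagStem stays 0 and every pass of the j loop appends 2 items).
simulate_growth <- function(n_words, start = 1) {
  len <- start                  # stemList starts as list(NULL), length 1
  sizes <- numeric(0)
  for (i in seq_len(n_words)) {
    passes <- len               # 1:length(stemList) is fixed for this word
    len <- len + 2 * passes     # wordStem and flatText[i] appended per pass
    sizes <- c(sizes, len)
  }
  sizes
}

simulate_growth(5)
# [1]   3   9  27  81 243
tail(simulate_growth(22), 1)    # equals 3^22 under this model
```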

library(plyr)   
stemMAP<-function(text){
  flatText <- unlist(strsplit(text," "))
  ## SnowballStemmer is vectorized: stem every word in one call
  wordStem <- as.character(SnowballStemmer(flatText))
  hh <- data.frame(ff = flatText,sn = wordStem)
  ## use plyr to turn the result into a list
  ## dlply: data.frame -> list apply
  ## group hh by column sn and apply as.character(x$ff) to each group
  ## (x is the subset data.frame for one stem)
  stemList <- dlply(hh,.(sn),function(x) as.character(x$ff))
  stemList
}
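The same grouping can be sketched in base R with split(), without plyr. This is an alternative, not the answerer's code: stem_map_split is a hypothetical name, and its stems argument stands in for as.character(SnowballStemmer(flatText)) so the sketch runs without a stemming package. The final unique() call drops repeated words, as the question's flagWord logic intended.

```r
# Base-R sketch of the dlply() grouping step (hypothetical helper).
# `stems` must hold one stem per word of `text`.
stem_map_split <- function(text, stems) {
  flatText <- unlist(strsplit(text, " "))
  stopifnot(length(stems) == length(flatText))  # one stem per word
  groups <- split(flatText, stems)  # named list: stem -> words in text order
  lapply(groups, unique)            # drop repeats such as the two "and"s
}

res <- stem_map_split("active and activates and activation",
                      c("activ", "and", "activ", "and", "activ"))
res$activ   # "active" "activates" "activation"
res$and     # "and"
```

Like dlply, split() orders the groups by the sorted stem values, while words inside each group keep their original order.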

stemList
$I
[1] "I"

$a
[1] "a"

$activ
[1] "active"     "activates"  "activation"

$and
[1] "and" "and"

$be
[1] "being"
answered 2012-12-27T11:32:15.677