我有一个数据集,其中充满了不适当间隔的句子。我试图想出一种方法来删除一些空格。
我从一个转换为单词数据框的句子开始:
> word5 <- "hotter the doghou se would be bec ause the co lor was diffe rent"
> abc1 <- data.frame(filler1 = 1,words1=factor(unlist(strsplit(word5, split=" "))))
> abc1
filler1 words1
1 1 hotter
2 1 the
3 1 doghou
4 1 se
5 1 would
6 1 be
7 1 bec
8 1 ause
9 1 the
10 1 co
11 1 lor
12 1 was
13 1 diffe
14 1 rent
接下来,我使用以下代码尝试拼写检查和组合单词,这些单词是它们之前或之后的单词的组合:
abc2 <- abc1
i <- 1
while(i < nrow(abc1)){
print(abc2)
if(nrow(aspell(abc1$words1[i])) == 0){
print(paste(i,"Words OK",sep=" | "));flush.console()
i <- i + 1
}
else{
if(nrow(aspell(abc1$words1[i])) > 0 & i != 1){
preWord1 <- abc1$words1[i-1]
postWord1 <- abc1$words1[i+1]
badWord1 <- abc1$words1[i]
newWord1 <- factor(paste(preWord1,badWord1,sep=""))
newWord2 <- factor(paste(badWord1,postWord1,sep=""))
if(nrow(aspell(newWord1)) == 0 & nrow(aspell(newWord2)) != 0){
abc2[i,"words1"] <-as.character(newWord1)
abc2 <- abc2[-c(i+1),]
print(paste(i,"word1",sep=" | "));flush.console()
i <- i + 1
}
if(nrow(aspell(newWord1)) != 0 & nrow(aspell(newWord2)) == 0){
abc2[i ,"words1"] <-as.character(newWord2)
abc2 <- abc2[-c(i-1),]
print(paste(i,"word2",sep=" | "));flush.console()
i <- i + 1
}
}
}
}
在玩了一段时间后,我得出的结论是我需要某种类型的迭代器,但不确定如何在 R 中实现它。有什么建议吗?