I have an R function that generates k-skip-n-grams:
The full function is available on GitHub.
My code does produce the expected k-skip-n-grams:
> kSkipNgram("Lorem ipsum dolor sit amet, consectetur adipiscing elit.", n=2, skip=1)
[1] "Lorem dolor" "Lorem ipsum" "ipsum sit"
[4] "ipsum dolor" "dolor amet" "dolor sit"
[7] "sit consectetur" "sit amet" "amet adipiscing"
[10] "amet consectetur" "consectetur elit" "consectetur adipiscing"
[13] "adipiscing elit"
But I would like to generalize/simplify the following switch statement, which is built from nested for loops:
# x    - input text (a sentence)
# n    - n-gram size
# skip - number of skips
###################################
switch(as.character(n),
  "0" = {
    ngram <- c(ngram, paste(x[i]))
  },
  "1" = {
    for (j in skip:1) {
      if (i+j <= length(x)) {
        ngram <- c(ngram, paste(x[i], x[i+j]))
      }
    }
  },
  "2" = {
    for (j in skip:1) {
      for (k in skip:1) {
        if (i+j <= length(x) && i+j+k <= length(x)) {
          ngram <- c(ngram, paste(x[i], x[i+j], x[i+j+k]))
        }
      }
    }
  },
  "3" = {
    for (j in skip:1) {
      for (k in skip:1) {
        for (l in skip:1) {
          if (i+j <= length(x) && i+j+k <= length(x) && i+j+k+l <= length(x)) {
            ngram <- c(ngram, paste(x[i], x[i+j], x[i+j+k], x[i+j+k+l]))
          }
        }
      }
    }
  },
  "4" = {
    for (j in skip:1) {
      for (k in skip:1) {
        for (l in skip:1) {
          for (m in skip:1) {
            if (i+j <= length(x) && i+j+k <= length(x) && i+j+k+l <= length(x) && i+j+k+l+m <= length(x)) {
              ngram <- c(ngram, paste(x[i], x[i+j], x[i+j+k], x[i+j+k+l], x[i+j+k+l+m]))
            }
          }
        }
      }
    }
  }
)
}
}
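
To make clearer what kind of generalization I am after: each case of the switch adds one more loop level and one more bounds check, so the cases might collapse into a single recursive helper. The following is only a rough sketch, not the function from the repository. The name `skip_grams_at` and its inner function `extend` are made up for illustration, and it assumes that `x` is already a character vector of words and that `n` and `skip` have the same meaning as inside the switch above (`n` additional words after `x[i]`, steps taken from `skip:1` with `skip >= 1`).

```r
# Hypothetical helper (not part of the original function): returns every
# n-gram that starts at position i, in the same order as the nested loops.
skip_grams_at <- function(x, i, n, skip) {
  # extend() appends one more word per recursion level, which replaces
  # one level of "for (... in skip:1)" in the switch statement.
  extend <- function(prefix, last, remaining) {
    if (remaining == 0) return(prefix)      # n-gram is complete
    out <- character(0)
    for (step in skip:1) {                  # same iteration order as the loops
      nxt <- last + step
      if (nxt <= length(x)) {               # same bounds check as the if()
        out <- c(out, extend(paste(prefix, x[nxt]), nxt, remaining - 1))
      }
    }
    out
  }
  extend(x[i], i, n)
}

# Usage on a pre-tokenized sentence. n = 1 and skip = 2 are my guess at the
# internal values behind the call kSkipNgram(..., n=2, skip=1).
x <- c("Lorem", "ipsum", "dolor", "sit", "amet",
       "consectetur", "adipiscing", "elit")
grams <- unlist(lapply(seq_along(x), function(i) skip_grams_at(x, i, n = 1, skip = 2)))
print(grams)
```

With these assumed values the sketch yields the same 13 bigrams, in the same order, as the output shown earlier. Growing `out` with `c()` inside the loop is kept only to mirror the original code; it is not meant as the efficient way to do it.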