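I have an array of words that I need to sort by frequency. Before I do that, I need to strip out words such as "the", "it", etc. (really any word with fewer than three letters), as well as all numbers and any word starting with # (the word array is pulled from Twitter, although the example below is just a random paragraph from Wikipedia).
I can remove one word, but I've been going crazy trying to remove several words, or a whole range of them. Any suggestions? Thanks!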
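The rules above (fewer than three letters, numbers, words starting with #) can be written as one predicate and applied with `Array.prototype.filter`, instead of removing words one at a time. This is a minimal sketch: `isWordWeWant` and `STOP_WORDS` are hypothetical names, and because "the" has exactly three letters it is caught by the stop-word list rather than by the length rule.

```javascript
// Hypothetical stop-word list; extend as needed.
const STOP_WORDS = new Set(["the", "and", "for"]);

// Returns true if a word should be kept for counting.
function isWordWeWant(word) {
  const w = word.toLowerCase();
  if (w.length < 3) return false;              // fewer than three letters ("it", "is")
  if (/^\d+([.,]\d+)*$/.test(w)) return false; // plain numbers ("2012", "3.5", "1,000")
  if (w.startsWith("#")) return false;         // hashtags ("#science")
  if (STOP_WORDS.has(w)) return false;         // explicit stop words ("the")
  return true;
}

const sample = ["It", "is", "2012", "#science", "The", "skull", "and", "brain"];
console.log(sample.filter(isWordWeWant)); // [ 'skull', 'brain' ]
```

`filter` returns a new array and leaves the original untouched, so there is no index bookkeeping as with `splice`.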
HTML:
<div id="text" style="background-color:Teal;position:absolute;left:100px;top:10px;height:500px;width:500px;">
Phrenology is a pseudoscience primarily focused on measurements of the human skull, based on the concept that the brain is the organ of the mind, and that certain brain areas have localized, specific functions or modules. The distinguishing feature of phrenology is the idea that the sizes of brain areas were meaningful and could be inferred by examining the skull of an individual.
</div>
JS:
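<script type="text/javascript">
// This is the function to remove words: it deletes every occurrence of each
// value passed after the array, e.g. removeA(words, "the", "it").
function removeA(arr) {
  var what, a = arguments, L = a.length, ax;
  // Walk the extra arguments from last to first.
  while (L > 1 && arr.length) {
    what = a[--L];
    // Splice out every occurrence of the current value (mutates arr).
    while ((ax = arr.indexOf(what)) !== -1) {
      arr.splice(ax, 1);
    }
  }
  return arr;
}
</script>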
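`removeA` can already handle several words at once: it reads every argument after the array from `arguments`, so the unwanted values can be passed in a single call, or an array of them can be spread into the call. A sketch with made-up sample words:

```javascript
// Copy of removeA from the question: removes every occurrence of each
// value passed after the array, mutating the array in place.
function removeA(arr) {
  var what, a = arguments, L = a.length, ax;
  while (L > 1 && arr.length) {
    what = a[--L];
    while ((ax = arr.indexOf(what)) !== -1) {
      arr.splice(ax, 1);
    }
  }
  return arr;
}

// Several words in one call...
var words = ["the", "brain", "is", "the", "organ", "of", "the", "mind"];
removeA(words, "the", "is", "of");
console.log(words); // [ 'brain', 'organ', 'mind' ]

// ...or an array of words, spread into separate arguments.
var badWords = ["brain", "mind"];
removeA(words, ...badWords);
console.log(words); // [ 'organ' ]
```

Spread syntax needs an ES2015 engine; in older browsers `removeA.apply(null, [words].concat(badWords))` does the same. Each `indexOf`/`splice` pass rescans the array, so for a large Twitter sample a single `filter` pass with a `Set` lookup scales better. Note also that `removeA` only matches exact values, so rules like "shorter than three letters" still need a predicate.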
<script type="text/javascript">
// And this does the counting and sorting.
var getMostFrequentWords = function(words) {
  // A prototype-less object, so words like "constructor" don't hit Object.prototype.
  var freq = Object.create(null), freqArr = [], i;
  // Map each word to its frequency in "freq".
  for (i = 0; i < words.length; i++) {
    freq[words[i]] = (freq[words[i]] || 0) + 1;
  }
  // Sort from most to least frequent.
  for (i in freq) freqArr.push([i, freq[i]]);
  return freqArr.sort(function(a, b) { return b[1] - a[1]; });
};
// Read the div's text without jQuery; trim() avoids an empty first "word".
var words = document.getElementById("text").textContent.trim().split(/\s+/);
// Remove articles and words we don't care about.
var badWords = "the";
removeA(words, badWords);
var mostUsed = getMostFrequentWords(words);
// Shows the remaining words; mostUsed holds the ranked [word, count] pairs.
alert(words);
</script>
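For the Twitter data, the whole job (normalize, filter, count, sort) can run in one pass over the raw text. This is a sketch under stated assumptions: counting is case-insensitive, punctuation at either end of a token (as in "skull,") is dropped, and `wordFrequencies` and `STOP_WORDS` are hypothetical names. A `Map` stands in for the plain object so no word can collide with `Object.prototype` properties.

```javascript
// Hypothetical stop-word list; extend as needed.
const STOP_WORDS = new Set(["the", "and", "that"]);

// Keep words of 3+ letters that are not numbers, hashtags, or stop words.
function isWordWeWant(w) {
  return w.length >= 3 &&
    !/^\d+([.,]\d+)*$/.test(w) &&
    !w.startsWith("#") &&
    !STOP_WORDS.has(w);
}

// Returns [word, count] pairs, most frequent first.
function wordFrequencies(text) {
  const freq = new Map();
  for (const raw of text.split(/\s+/)) {
    // Lowercase, then trim punctuation from both ends but keep a leading "#"
    // so the hashtag rule can still see it.
    const w = raw.toLowerCase().replace(/^[^\w#]+|[^\w]+$/g, "");
    if (!isWordWeWant(w)) continue; // also skips "" from leading whitespace
    freq.set(w, (freq.get(w) || 0) + 1);
  }
  // Highest count first; ties broken alphabetically for a stable order.
  return [...freq].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

const text = "The brain is the organ of the mind, and brain areas have specific functions. " +
  "Phrenology measured the skull: skull size, brain size. #science 2012";
console.log(wordFrequencies(text).slice(0, 3));
// [ [ 'brain', 3 ], [ 'size', 2 ], [ 'skull', 2 ] ]
```

In the page, `wordFrequencies(document.getElementById("text").textContent)` would take the place of the split, `removeA`, and `getMostFrequentWords` steps.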