
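I have a list of words that I need to sort by frequency. Before I do that, I need to remove words such as "the", "it", etc. (really any word with fewer than three letters), as well as all numbers and any word that starts with # (the word array comes from Twitter, although the example below is just a random paragraph from Wikipedia).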

I can remove a single word, but I've been going crazy trying to remove multiple words or a range of them. Any suggestions? Thanks!

http://jsfiddle.net/9NzAC/6/

HTML:

<div id="text" style="background-color:Teal;position:absolute;left:100px;top:10px;height:500px;width:500px;">
Phrenology is a pseudoscience primarily focused on measurements of the human skull, based on the concept that the brain is the organ of the mind, and that certain brain areas have localized, specific functions or modules. The distinguishing feature of phrenology is the idea that the sizes of brain areas were meaningful and could be inferred by examining the skull of an individual.
</div>

JS:

<script type="text/javascript">
    // this is the function to remove words
    function removeA(arr) {
        var what, a = arguments, L = a.length, ax;
        while (L > 1 && arr.length) {
            what = a[--L];
            while ((ax = arr.indexOf(what)) !== -1) {
                arr.splice(ax, 1);
            }
        }
        return arr;
    }
</script>

<script type="text/javascript">
    // and this does the sorting & counting
    var getMostFrequentWords = function (words) {
        var freq = {}, freqArr = [], i;

        // Map each word to its frequency in "freq".
        for (i = 0; i < words.length; i++) {
            freq[words[i]] = (freq[words[i]] || 0) + 1;
        }

        // Sort from most to least frequent.
        for (i in freq) freqArr.push([i, freq[i]]);
        return freqArr.sort(function (a, b) { return b[1] - a[1]; });
    };

    var words = $('#text').get(0).innerText.split(/\s+/);

    // Remove articles & words we don't care about.
    var badWords = "the";
    removeA(words, badWords);
    var mostUsed = getMostFrequentWords(words);
    alert(words);
</script>
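Note that `removeA` above already accepts any number of words after the array, because it walks `arguments` from the last one back to index 1. A minimal sketch, runnable in Node, that removes several words in one call by spreading a hypothetical `badWords` array into the argument list with `Function.prototype.apply`; the sample sentence stands in for the `#text` element:

```javascript
// removeA from the question: removes every occurrence of each extra argument.
function removeA(arr) {
    var what, a = arguments, L = a.length, ax;
    while (L > 1 && arr.length) {
        what = a[--L];
        while ((ax = arr.indexOf(what)) !== -1) {
            arr.splice(ax, 1);
        }
    }
    return arr;
}

// Hypothetical sample input instead of reading the #text element.
var words = "the brain is the organ of the mind".split(/\s+/);

// Hypothetical list of words to drop.
var badWords = ["the", "is", "of"];

// apply() spreads badWords into removeA's argument list,
// i.e. this is the same as removeA(words, "the", "is", "of").
removeA.apply(null, [words].concat(badWords));

console.log(words); // [ 'brain', 'organ', 'mind' ]
```

This only covers a fixed list of words; rules like "shorter than three letters" or "starts with #" are easier to express as a test on each word, as the answers below do.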

2 Answers


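Instead of removing words from the original array, push the ones you want to keep into a new array. It's simpler, and it makes your code shorter and more readable.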

var words = ['the', 'it', '12', '#twit', 'aloha', 'hello', 'bye']
var filteredWords = []

for (var i = 0, l = words.length, w; i < l; i++) {
    w = words[i]
    if (!/^(#|\d+)/.test(w) && w.length > 3)
        filteredWords.push(w)
}

console.log(filteredWords) // ['aloha', 'hello']

Demo: http://jsfiddle.net/VcfvU/
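The same loop can also be written with ES5's `Array.prototype.filter`, which builds the new array for you, and the result can go straight into the question's `getMostFrequentWords`. A sketch under those assumptions, runnable in Node; the sample array is the answer's, with a second "hello" added so the count is visible:

```javascript
// Counting function from the question: returns [word, count] pairs,
// most frequent first.
function getMostFrequentWords(words) {
    var freq = {}, freqArr = [], i;
    for (i = 0; i < words.length; i++) {
        freq[words[i]] = (freq[words[i]] || 0) + 1;
    }
    for (i in freq) freqArr.push([i, freq[i]]);
    return freqArr.sort(function (a, b) { return b[1] - a[1]; });
}

// The answer's sample array plus a repeated "hello".
var words = ['the', 'it', '12', '#twit', 'aloha', 'hello', 'bye', 'hello'];

// Keep words that don't start with "#" or a digit and are longer than 3 characters.
var filteredWords = words.filter(function (w) {
    return !/^(#|\d+)/.test(w) && w.length > 3;
});

var mostUsed = getMostFrequentWords(filteredWords);
console.log(filteredWords); // [ 'aloha', 'hello', 'hello' ]
console.log(mostUsed);      // [ [ 'hello', 2 ], [ 'aloha', 1 ] ]
```

`w.length > 3` also drops three-letter words such as "bye"; use `w.length >= 3` if only words with fewer than three letters should go.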

answered 2012-08-01T03:39:20.340

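I suggest setting `array[i] = null` (or `""`) for the words you don't want and then cleaning the empty entries out of the array afterwards. You can do that easily with `Array#filter`.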

Test: http://jsfiddle.net/6LPep/ Code:

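var FORGETABLE_WORDS = ',the,of,an,and,that,which,is,was,';

var words = document.getElementById('text').innerText.split(" ");

for (var i = 0; i < words.length; i++) {
    var word = words[i];
    // Blank out stop words and anything shorter than 3 characters
    if (FORGETABLE_WORDS.indexOf(',' + word + ',') > -1 || word.length < 3) {
        words[i] = "";
    }
}

// filter() returns a new array, so assign it back; falsy entries ("" or null) are dropped
words = words.filter(function (e) { return e; });
// as an example, show the result in an element with id "output"
document.getElementById('output').innerHTML = words.join(" ");

// just continue doing your stuff with the "words" array.
// ...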
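If the stop-word list grows, an ES2015 `Set` avoids the comma-string `indexOf` trick, and `filter` can do the removal in a single pass without the blank-then-clean step. A sketch, runnable in Node, with a hypothetical sample sentence in place of `text.innerText`:

```javascript
// Stop words from the answer, stored in a Set for direct membership tests (ES2015).
var FORGETABLE_WORDS = new Set(['the', 'of', 'an', 'and', 'that', 'which', 'is', 'was']);

// Hypothetical sample text instead of reading the #text element.
var sample = 'Phrenology is a pseudoscience based on the concept that the brain is the organ of the mind';

// split(/\s+/) avoids empty strings from repeated spaces; filter() keeps
// words that are not stop words and are at least 3 characters long.
var words = sample.split(/\s+/).filter(function (word) {
    return !FORGETABLE_WORDS.has(word) && word.length >= 3;
});

console.log(words.join(' ')); // Phrenology pseudoscience based concept brain organ mind
```

The comparison is case-sensitive, so a capitalized "The" at the start of a sentence would survive; lowercasing each word before the test is one way to handle that.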

I think this is cleaner than your current approach. If you need anything else, I'll update this answer.

answered 2012-08-01T03:32:22.223