statistics - How to compute word rank in perl

Question

I have a program that reads in a document and puts all the words in a hash, collapsing repeats and incrementing the frequency count for each word.

So for example:

KEY: VALUE:
dog 2
cat 4
rat 1

Now I was told I need to compute the rank of each word and print those stats. What does this mean exactly? What type of math do I need to be looking at? If someone could point me to a document talking about word rank that could help.

Thanks

score 1 · Accepted Answer

If you delete repeats, you won't have a "frequency", or at least every count will be at most 1, so don't do that. If you mean merging the counts of repeated words (I think you do), then I'd assume the rank you're referring to comes from the number of occurrences of each word in the file.

If you're merging properly, you'll have a set of key-value pairs (word and count); sort on the value, descending, to get the rank.
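As a rough illustration of that sort, here is a minimal sketch. It assumes the counts are already in a hash; the name %count and the alphabetical tie-break are my own choices, and the data is just the example from the question.

```perl
use strict;
use warnings;

# Hypothetical counts, copied from the example in the question.
my %count = (dog => 2, cat => 4, rat => 1);

# Sort the words by descending count; break ties alphabetically so the
# order is stable from run to run.
my @by_freq = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

# The rank is the 1-based position in the sorted list.
my $rank = 0;
for my $word (@by_freq) {
    $rank++;
    printf "%d %s %d\n", $rank, $word, $count{$word};
}
```

With these three words, cat comes out at rank 1, dog at rank 2, and rat at rank 3.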

BTW, this sounds like a homework question. If so, look into a quicksort to sort the array on the value. That's all I'll say. HTH.

score 1 · Accepted Answer

Ranking is simply ordering so that the most frequent word has rank 1. Take a look at Zipf's law for how we expect words to behave with respect to their frequency rank in a suitably large corpus.
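A small sketch of that ranking, under some assumptions of mine: the helper name rank_words is hypothetical, words with equal counts share a rank (1, 2, 2, 4, ...), and the sample counts are made up. Zipf's law says frequency is roughly inversely proportional to rank, so the last column (rank times frequency) should stay roughly constant on a large corpus; a toy input like this one is far too small to show it.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash reference of word => count and returns
# a list of [rank, word, count] rows, most frequent word first.
sub rank_words {
    my ($count) = @_;
    my @words = sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;

    my @rows;
    my ($rank, $prev) = (0, -1);
    for my $i (0 .. $#words) {
        my $freq = $count->{ $words[$i] };
        # Words with equal counts share a rank; the next distinct count
        # takes its position in the sorted list.
        $rank = $i + 1 if $freq != $prev;
        push @rows, [ $rank, $words[$i], $freq ];
        $prev = $freq;
    }
    return @rows;
}

# Made-up sample counts, including one tie.
my @rows = rank_words({ dog => 2, cat => 4, rat => 1, cow => 2 });

for my $row (@rows) {
    my ($rank, $word, $freq) = @$row;
    # Columns: rank, word, frequency, rank * frequency.
    printf "%-4d %-6s %3d %4d\n", $rank, $word, $freq, $rank * $freq;
}
```

Whether ties should share a rank or be numbered consecutively depends on what the assignment expects, so treat that choice as an assumption.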
