
For my coursework (binary search trees and hash tables), I would like to write a Java program that scans a text file and orders the words by frequency, most frequent first, something like a "most popular tags" list.

Example:

1. Scan the file.
2. List the words that appear more than once, with their counts:

WORD TOTAL
Banana 10
Sun 7
Sea 3

Question 1: How do I scan a text file?
Question 2: How do I detect duplicate words in the text file and count them?
Question 3: How do I print the words that appear more than once, in the order shown in my example?

My programming skills are not strong.


2 Answers


Since this is coursework, I'm not going to give you the full solution, but I'll try to point you in a possible direction:

  1. Search for how to read words from a text file (this is a very common problem; you should be able to find plenty of examples).
  2. Use, for instance, a HashMap (String to Integer) to count the words: if a word is not in the map yet, add it with count 1; if it is already there, increment its count. You may want to preprocess the words first, for instance to ignore capitalization.
  3. Keep only the words whose count is greater than 1.
  4. Sort the remaining words by their count, highest first.
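For step 1, one common option is `java.util.Scanner`, which splits its input into whitespace-separated tokens by default. The following is a minimal sketch, assuming words are separated by whitespace and that "preprocessing" means lower-casing and dropping non-letter characters; the class `WordReader` and the `Path` parameter are hypothetical choices, not part of the answer.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class WordReader {

    // Reads every whitespace-separated token from the file at the given path.
    // Scanner(Path) opens the file itself; try-with-resources closes it again.
    public static List<String> readWordsFromFile(Path path) throws IOException {
        List<String> words = new ArrayList<>();
        try (Scanner scanner = new Scanner(path)) {
            while (scanner.hasNext()) {
                words.add(scanner.next());
            }
        }
        return words;
    }

    // Assumed preprocessing: lower-case the token and remove anything that is
    // not a Unicode letter, so "Banana," and "banana" count as the same word.
    public static String preprocess(String word) {
        return word.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]", "");
    }
}
```

Note that a token made only of punctuation (for example `--`) preprocesses to an empty string, so you may want to skip empty results before counting them.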

Here is a very high-level implementation (with many open ends :) ):

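```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// readWordsFromFile, preprocess, removeSingleOccurrences and sortWords
// are left for you to implement.
List<String> words = readWordsFromFile();

// Count how often each (preprocessed) word occurs.
Map<String, Integer> wordCounts = new HashMap<>();
for (String word : words) {
    String processedWord = preprocess(word);
    int count = 1;
    if (wordCounts.containsKey(processedWord)) {
        // Already seen: increment the stored count.
        count = wordCounts.get(processedWord) + 1;
    }
    wordCounts.put(processedWord, count);
}

// Keep only the words that occur more than once, then order them by count.
removeSingleOccurrences(wordCounts);
List<String> sortedWords = sortWords(wordCounts);
```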
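The sorting and filtering helpers left open above (the answer's `removeSingleOccurences`, spelled `removeSingleOccurrences` here, and `sortWords`) could be filled in roughly like this. It is only a sketch: the tie-breaking rule (alphabetical) and the `printTable` helper that prints the WORD/TOTAL table from the question are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountHelpers {

    // Removes every word seen only once; removeIf on the entry set
    // writes through to the underlying map.
    public static void removeSingleOccurrences(Map<String, Integer> wordCounts) {
        wordCounts.entrySet().removeIf(entry -> entry.getValue() <= 1);
    }

    // Returns the words ordered by count, highest first. Ties are broken
    // alphabetically (an assumption) so the output is deterministic.
    public static List<String> sortWords(Map<String, Integer> wordCounts) {
        List<String> words = new ArrayList<>(wordCounts.keySet());
        words.sort((a, b) -> {
            int byCount = Integer.compare(wordCounts.get(b), wordCounts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return words;
    }

    // Hypothetical helper: prints a WORD / TOTAL table like the one in the question.
    public static void printTable(List<String> sortedWords, Map<String, Integer> wordCounts) {
        System.out.printf("%-10s %s%n", "WORD", "TOTAL");
        for (String word : sortedWords) {
            System.out.printf("%-10s %d%n", word, wordCounts.get(word));
        }
    }

    public static void main(String[] args) {
        // Sample counts taken from the question's example, plus one single occurrence.
        Map<String, Integer> counts = new HashMap<>(Map.of("banana", 10, "sun", 7, "sea", 3, "moon", 1));
        removeSingleOccurrences(counts);
        printTable(sortWords(counts), counts);
    }
}
```

Running `main` drops `moon` and prints `banana`, `sun` and `sea` in descending order of count. Sorting a list of keys with a comparator keeps the map itself unchanged, so the counts stay available for printing.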
answered 2013-06-07T11:50:10.073

You can use `Multiset` from the Guava library, which keeps a count for each element you add: http://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained#Multiset

answered 2013-06-07T11:45:36.963