2

I have a method that takes in a list of words. These words are checked against a HashMap of words that has a String as a key and an Integer as a value. The String is a word, and the Integer represents that word's frequency in a text file.

Currently the list of words is ranked according to frequency by putting the words into a TreeMap, with the frequency becoming the key.

However, as there can be no duplicate keys, words that share a frequency value in the HashMap overwrite one another in the TreeMap, so only one word per frequency is kept.

What could I do in order to have a data structure that contains the words ranked by their frequency, including duplicates?

// Given a list of words, return a TreeMap of those words ranked by most frequent occurrence
private TreeMap rankWords(LinkedList unrankedWords) {

    // TreeMap automatically sorts the words by their frequency, because the frequency count is the key.
    TreeMap<Integer, String> rankedWordsMap = new TreeMap<Integer, String>();

    //for each of the words unranked, find that word in the freqMap and add to rankedWords
    for (int i = 0; i < unrankedWords.size(); i++) {

        if (freqMap.containsKey((String) unrankedWords.get(i))) {

            rankedWordsMap.put(freqMap.get((String) unrankedWords.get(i)),
                    (String) unrankedWords.get(i));

        }

    }

    return rankedWordsMap;

}
4

6 Answers

4

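You should reconsider your data structure so that it has unique keys. It sounds like your structure is inverted: it should be a Map from word to count, not the other way around, because the word is the unique key and the count is the value associated with that key.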

answered 2013-05-06T17:38:38.757
3

I would start with a map from String to Integer frequency.

Copy the entrySet() into a List and sort it by frequency.
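
A minimal sketch of that idea, assuming Java 8 or later and a `Map<String, Integer>` like the question's `freqMap`. The class name `EntrySortSketch` and the method name `sortByFrequency` are made up for illustration, and the alphabetical tie-break is an added assumption rather than something the question asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class EntrySortSketch {

    // Returns the entries of freqMap, most frequent first.
    // Entries with equal frequency are ordered alphabetically by word.
    static List<Map.Entry<String, Integer>> sortByFrequency(Map<String, Integer> freqMap) {
        // Copy the entry set so that sorting does not touch the map itself.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freqMap.entrySet());

        entries.sort(
                Map.Entry.<String, Integer>comparingByValue()
                        .reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }
}
```

Because the list holds one entry per word, words with the same frequency all survive; nothing is keyed by the count.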

answered 2013-05-06T17:39:04.100
1

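Your process is somewhat broken. The contract of TreeMap requires that the behaviour of compareTo(...) calls never changes during the lifetime of the TreeMap. In other words, you cannot update a factor that alters the sort order (such as changing the frequency).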

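My suggestion is to do one of the following two things:

  • Use two phases: the first counts the word frequencies (keyed by word), and the second sorts the words into frequency order.
  • Create a custom data structure that manages the dynamic nature for you (perhaps two arrays).

If performance is not critical, I would probably go with the first. Otherwise, the second option looks like a nice challenge.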
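
The first option could look roughly like the sketch below, assuming Java 8 or later. The class name `TwoPhaseRanking` and both method signatures are hypothetical; the question does not show how `freqMap` is built, so the counting phase here is only one plausible way to do it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoPhaseRanking {

    // Phase 1: count how often each word occurs, keyed by word.
    static Map<String, Integer> countWords(List<String> text) {
        Map<String, Integer> freqMap = new HashMap<>();
        for (String word : text) {
            freqMap.merge(word, 1, Integer::sum);
        }
        return freqMap;
    }

    // Phase 2: once counting is finished, order the requested words by frequency.
    // Words missing from freqMap are skipped, as in the question's loop.
    static List<String> rankWords(List<String> unrankedWords, Map<String, Integer> freqMap) {
        List<String> ranked = new ArrayList<>();
        for (String word : unrankedWords) {
            if (freqMap.containsKey(word)) {
                ranked.add(word);
            }
        }
        // Most frequent first; List.sort is stable, so ties keep their input order.
        ranked.sort(Comparator.comparing((String w) -> freqMap.get(w)).reversed());
        return ranked;
    }
}
```

The frequencies are only read during the sort, never changed, so the ordering cannot shift underneath a sorted structure.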

answered 2013-05-06T17:39:51.623
1

Make a list of the entries and sort them by entry value.

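// Needs java.util.ArrayList, Collections, Comparator, List and Map.
// Copy the entries into a list so they can be sorted independently of the map.
List<Map.Entry<String, Integer>> results = new ArrayList<>();
results.addAll(freqMap.entrySet());
Collections.sort(results, new Comparator<Map.Entry<String, Integer>>() {
    @Override
    public int compare(Map.Entry<String, Integer> lhs,
            Map.Entry<String, Integer> rhs) {
        // Integer.compare cannot overflow the way subtraction can.
        int cmp = Integer.compare(lhs.getValue(), rhs.getValue());
        if (cmp == 0) {
            // Equal frequencies: fall back to alphabetical order of the words.
            cmp = lhs.getKey().compareTo(rhs.getKey());
        }
        return cmp;
    }
});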
answered 2013-05-06T17:42:13.553
0

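Not sure whether this is the most elegant solution, but once your frequency map is complete, you can convert each map entry into an object that represents it: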

class Entry {
  String word;
  int frequency;
}

Then you just write a comparator that sorts these objects by frequency/value.
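
One way such a comparator might look is sketched here. The class is named `WordFrequency` instead of `Entry` only to avoid confusion with `Map.Entry`; that name, the `BY_FREQUENCY_DESC` constant and the `fromMap` helper are all hypothetical. Descending order is an assumption based on the question wanting the most frequent words first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class WordFrequency {
    final String word;
    final int frequency;

    WordFrequency(String word, int frequency) {
        this.word = word;
        this.frequency = frequency;
    }

    // Orders by descending frequency; Integer.compare avoids the overflow risk of subtraction.
    static final Comparator<WordFrequency> BY_FREQUENCY_DESC = new Comparator<WordFrequency>() {
        @Override
        public int compare(WordFrequency a, WordFrequency b) {
            return Integer.compare(b.frequency, a.frequency);
        }
    };

    // Converts each map entry into a WordFrequency and sorts the resulting list.
    static List<WordFrequency> fromMap(Map<String, Integer> freqMap) {
        List<WordFrequency> list = new ArrayList<>();
        for (Map.Entry<String, Integer> e : freqMap.entrySet()) {
            list.add(new WordFrequency(e.getKey(), e.getValue()));
        }
        Collections.sort(list, BY_FREQUENCY_DESC);
        return list;
    }
}
```

The comparator returns 0 for equal frequencies, which is harmless in a list sort; it would only drop words if the same comparator were used to key a TreeMap or TreeSet.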

answered 2013-05-06T17:42:56.803
0

You can use a Set as the value of the TreeMap, so you can do the following to add words to the map by frequency:

TreeMap<Integer, Set<String>> rankedWordsMap = new TreeMap<>();

// inside loop
String word = (String) unrankedWords.get(i);
int frequency = freqMap.get(word);
// get the set of words with the same frequency
Set<String> wordSet = rankedWordsMap.get(frequency);
// if it does not exist yet, create it and put it into the map
if(wordSet == null) {
    wordSet = new HashSet<>();
    rankedWordsMap.put(frequency, wordSet);
}
// add the word to set of words
wordSet.add(word);

This way you keep all the words that have the same frequency.
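
Put together as a complete method, the approach might look like the following sketch, assuming Java 8 or later. The class name `GroupedRanking`, the `flatten` helper, the reversed key order, the use of `TreeSet` for alphabetical groups and the null check for unknown words are all additions for illustration, not part of the snippet above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class GroupedRanking {

    // Groups the given words by frequency. The TreeMap uses a reversed key order
    // so that iteration starts with the highest frequency.
    static TreeMap<Integer, Set<String>> rankWords(List<String> unrankedWords,
                                                   Map<String, Integer> freqMap) {
        TreeMap<Integer, Set<String>> ranked =
                new TreeMap<>(Collections.<Integer>reverseOrder());
        for (String word : unrankedWords) {
            Integer frequency = freqMap.get(word);
            if (frequency == null) {
                continue; // word is not in freqMap, so it has no rank
            }
            // Create the set for this frequency on first use, then add the word.
            ranked.computeIfAbsent(frequency, f -> new TreeSet<>()).add(word);
        }
        return ranked;
    }

    // Flattens the groups into a single list, highest frequency first.
    static List<String> flatten(TreeMap<Integer, Set<String>> ranked) {
        List<String> words = new ArrayList<>();
        for (Set<String> group : ranked.values()) {
            words.addAll(group);
        }
        return words;
    }
}
```

A Set silently drops a word that appears twice in the input list; a List value would keep such repeats if they matter.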

answered 2013-05-06T17:51:28.043