
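My code below gives me the most frequently occurring word in a string. I want to get the three most frequently occurring words, together with their counts, from the vector. Can anyone help?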

I have used a vector and an unordered_map. In the last part of the code, I get the most frequently occurring word from the vector.

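#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main(int argc,char *argv[])
    {
        if (argc < 2)   // argv[1] is read below, so one argument is required
        {
            std::cerr << "Usage: " << argv[0] << " \"some text\"" << std::endl;
            return 1;
        }
        typedef std::unordered_map<std::string,int> occurrences;
        occurrences s1;
        std::string input = argv[1];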

        std::istringstream iss(std::move(input));
        std::vector<std::string> most;
        int max_count = 0,second=0,third=0;


//Here I get max_count, 2nd highest and 3rd highest count value 
       while (iss >> input)
        {
            int tmp = ++s1[input];
            if (tmp == max_count)
            {
                most.push_back(input);
            }
            else if (tmp > max_count)
            {
                max_count = tmp;
                most.clear();
                most.push_back(input);
                third = second;
                second = max_count;
            }
            else if (tmp > second)
            {
                third = second;
                second = tmp;
            }
            else if (tmp > third)
            {
                third = tmp;
            }
        }

//I have not used max_count, second, third below. I don't know how to access them for my purpose

      //Print each word with its occurrence count. This works fine
      for (occurrences::const_iterator it = s1.cbegin();it != s1.cend(); ++it)
            std::cout << it->first << " : " << it->second << std::endl;

      //Prints the word which occurs the most times. **Here I want to print the 1st, 2nd and 3rd most frequent words with their occurrence counts. How to do that?**
      std::cout << std::endl << "Maximum Occurrences" << std::endl;
        for (std::vector<std::string>::const_iterator it = most.cbegin(); it != most.cend(); ++it)
            std::cout << *it << std::endl;

       return 0;
    } 

Any ideas on how to get the 3 most frequently occurring words?


3 Answers


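I would prefer to use a std::map<std::string, int> instead.

Use it as the source map, filling it with the words from the std::vector<std::string>.

Now create a multimap that is a flipped version of the source map (count as key, word as value), using std::greater<int> as the comparator.

The first three entries of this final map are the most frequently used words.

Example: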

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main()
{
    std::vector<std::string> most { "lion","tiger","kangaroo",
                                    "donkey","lion","tiger",
                                    "lion","donkey","tiger"
                                  };

    // Source map: word -> number of occurrences
    std::map<std::string, int> src;
    for (const auto &x : most)
        ++src[x];

    // Flipped map: count -> word, ordered from highest count to lowest
    std::multimap<int, std::string, std::greater<int> > dst;

    std::transform(src.begin(), src.end(), std::inserter(dst, dst.begin()),
                   [] (const std::pair<const std::string, int> &p) {
                       return std::pair<int, std::string>(p.second, p.first);
                   }
                  );

    // auto keeps the iterator type in sync with dst's comparator
    auto it = dst.begin();

    // Print at most the first three entries
    for (int count = 0; count < 3 && it != dst.end(); ++it, ++count)
        std::cout << it->second << ":" << it->first << std::endl;
}


Answered 2013-08-31T11:25:17.183

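Using a heap to store the three most frequently occurring words is easier and cleaner. It also extends easily to a larger number of most-frequent words.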
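The answer gives no code, so the following is only a minimal sketch of one way the heap idea could look. The helper name top_k_words and the WordCount alias are made up for illustration, and counting the words with an unordered_map first is an assumption carried over from the question. A min-heap of at most k entries is kept, so the least frequent candidate is always on top and can be evicted cheaply.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// (count, word): pairs compare by count first, then by word.
using WordCount = std::pair<int, std::string>;

// Hypothetical helper, not part of the answer: returns up to k entries,
// most frequent first.
std::vector<WordCount> top_k_words(const std::vector<std::string> &words,
                                   std::size_t k)
{
    // Count every word once.
    std::unordered_map<std::string, int> counts;
    for (const auto &w : words)
        ++counts[w];

    // Min-heap: std::greater puts the smallest (count, word) pair on top.
    std::priority_queue<WordCount, std::vector<WordCount>,
                        std::greater<WordCount> > heap;
    for (const auto &entry : counts)
    {
        heap.emplace(entry.second, entry.first);
        if (heap.size() > k)
            heap.pop();   // evict the current least frequent candidate
    }

    // The heap pops in ascending order, so fill the result back to front.
    std::vector<WordCount> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0;)
    {
        result[i] = heap.top();
        heap.pop();
    }
    return result;
}
```

Calling top_k_words(words, 3) then yields at most three (count, word) pairs. Because the heap never holds more than k entries, raising k only changes the argument, which is the scaling benefit the answer refers to.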

Answered 2013-08-31T11:09:48.440

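If I wanted to know the n most frequently occurring words, I would keep an n-element array, iterate over the word list, and store in the array each word that makes it into my top n (dropping the lowest one).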
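As a rough illustration of this idea (the answer itself contains no code), the sketch below keeps a small vector of at most n entries sorted by descending count. The names top_n_words and Entry are hypothetical, and it assumes the words are counted first, so the loop runs over distinct words rather than the raw word list.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;   // (word, count)

// Hypothetical helper, not part of the answer: returns up to n entries
// sorted from the highest count to the lowest.
std::vector<Entry> top_n_words(const std::vector<std::string> &words,
                               std::size_t n)
{
    // Assumption: count every word first.
    std::unordered_map<std::string, int> counts;
    for (const auto &w : words)
        ++counts[w];

    std::vector<Entry> top;   // never grows beyond n + 1 elements
    for (const auto &entry : counts)
    {
        // First slot whose count is lower than the candidate's count.
        auto pos = std::find_if(top.begin(), top.end(),
                                [&entry](const Entry &e) {
                                    return e.second < entry.second;
                                });
        if (pos == top.end() && top.size() >= n)
            continue;   // array is full and the candidate does not beat anything

        top.insert(pos, Entry(entry.first, entry.second));
        if (top.size() > n)
            top.pop_back();   // drop the lowest entry
    }
    return top;
}
```

Words with equal counts end up in whatever order the unordered_map happens to yield them, so the relative order of ties is unspecified in this sketch.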

Answered 2013-08-31T11:12:02.280