c++ - Counting occurrences of elements using vectors

Question

I am currently teaching myself C++ (properly) by working through the book Accelerated C++ by Andrew Koenig and Barbara Moo and doing all the exercises in each chapter.

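Exercise 3-3: Write a program to count how many times each distinct word appears in its input. To me, this exercise seems very hard, especially considering that: 1. the examples and the other exercises in that chapter are relatively simple, and 2. you are only allowed to use vectors, so nothing advanced. (Or maybe I am just misjudging the difficulty.)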

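I searched the web for hints and saw that other people have had trouble with this exercise, but the solutions they offered seemed unclear to me. Most people suggested using organizational methods introduced later in the book, which defeats the point of the exercise. In the end, I pieced together hints and approaches I found on different forums (including here) and came up with my own solution: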

#include <algorithm>
#include <iomanip>
#include <ios>
#include <iostream>
#include <string>
#include <vector>

using std::cin;
using std::setprecision;
using std::cout;
using std::string;
using std::endl;
using std::streamsize;
using std::sort;
using std::vector;

int main()
{

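    // Ask for string input

    cout << "Please write some text, followed by end-of-file: " << endl;

    vector<string> word_input;
    string word;

    typedef vector<string>::size_type vecsize;

    // read words into the string vector word_input

    while (cin >> word)
    {
        word_input.push_back(word);
    }

    // the code below assumes at least one word, so stop early on empty input

    if (word_input.empty())
    {
        cout << "No words were entered." << endl;
        return 0;
    }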

// sort the vector in alphabetical order to be able to separate distinct words

    sort(word_input.begin(),word_input.end());

// create two vectors: one where each (string) element is a unique word, and one
// that stores the index of the last occurrence of each distinct word (except the final word)

    vector<string> unique_words;
    vector<int> break_index;


    for (int i=0; i != word_input.size()-1; ++i)
    {
        if(word_input[i+1] != word_input[i])
            {
                unique_words.push_back(word_input[i]);
                break_index.push_back(i);
            }

    }

// add the last word in the series to the unique word string vector

    unique_words.push_back(word_input[word_input.size()-1]);

// create a vector that counts how many times each unique word occurs: each count is the
// distance between consecutive break indices. "previous" starts at -1 so the first word is
// counted from index 0; this also works when break_index is empty (only one distinct word)

    vector<int> word_count;
    int previous = -1;

    for (vector<int>::size_type i = 0; i != break_index.size(); ++i)
        {
            word_count.push_back(break_index[i] - previous);
            previous = break_index[i];
        }

// add the number of times the last word in the series occurs: index of the final element
// (total size of text - 1) minus the index at which the previous word ended

    word_count.push_back(static_cast<int>(word_input.size()) - 1 - previous);


    // number of (distinct) words and their frequency output

    cout << "The number of words in this text is: " << word_input.size() << endl;

    cout << "Number of distinct words is: " << unique_words.size() << endl;

        // The frequency of each word in the text

        for(int i=0; i != unique_words.size(); ++i)
            cout << unique_words[i] << " occurs " << word_count[i] << " time(s)" << endl;



return 0;
}

Is there a better way to do this using vectors? Can the code be made more efficient by combining any of the loops?

score 1 · Accepted Answer

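The solution that worked for me (when I solved this problem) was to use three vectors: an input_vector, an output_vector, and a count_vector. Read the user input in a while loop using std::cin until the escape character (end-of-file) is entered, using input_vector.push_back(input_word) to fill input_vector with words. Sort the vector using std::sort from <algorithm>, and create output_vector (with one value, the first word in input_vector) and count_vector (with one value, 1).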

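Then, for each element in input_vector (starting from the second, not the first), check whether the current element is the same as the previous element. If it is, add 1 to the current (last) element in count_vector. Otherwise, add the current word from input_vector to output_vector using push_back(), and grow count_vector by one element (whose value is 1).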
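The steps above could be sketched roughly as follows. This is only one possible reading of the description: the function name count_words is made up for illustration, and the empty-input check is an added assumption, since the description starts from a non-empty input_vector.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (the name is not from the answer): sorts input_vector in
// place, then fills output_vector with each distinct word and count_vector with
// the number of times that word occurs.
void count_words(std::vector<std::string>& input_vector,
                 std::vector<std::string>& output_vector,
                 std::vector<int>& count_vector)
{
    output_vector.clear();
    count_vector.clear();
    if (input_vector.empty())
        return;  // assumption: nothing to count on empty input

    std::sort(input_vector.begin(), input_vector.end());

    // Seed both result vectors with the first word and a count of 1.
    output_vector.push_back(input_vector[0]);
    count_vector.push_back(1);

    // Start from the second element and compare each word with the previous one.
    for (std::vector<std::string>::size_type i = 1; i != input_vector.size(); ++i)
    {
        if (input_vector[i] == input_vector[i - 1])
        {
            ++count_vector.back();  // same word: bump the current count
        }
        else
        {
            output_vector.push_back(input_vector[i]);  // new word: start a new entry
            count_vector.push_back(1);
        }
    }
}
```

After the call, output_vector[i] occurs count_vector[i] times, so a single loop over both vectors is enough to print the result. Compared with the code in the question, this needs only one pass over the sorted input and no separate vector of break indices.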

score 0 · Accepted Answer

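If you imagine someone using your code to process the complete works of Shakespeare, storing every single word wastes a lot of space. If you instead hold a struct of "word" and "word count", you only need to store the word "the" once, even if it occurs 100000 times in the text your program is reading. That is, if you even need to know that a word has occurred more than once: if all you need is a list of unique words, then all you have to do is check whether you have already stored the word. [Storing them in sorted order lets you use binary_search to find them, which will help the runtime if you really do run Shakespeare's 800K (non-unique) words through your code.]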
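A minimal sketch of that idea, under some assumptions: the names WordCount, word_less, and add_word are invented for illustration, and std::lower_bound is used in place of std::binary_search because it also performs a binary search but returns the position where a new word should be inserted.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type: one entry per distinct word.
struct WordCount
{
    std::string word;
    int count;
};

// Comparison used by the binary search: orders a stored entry against a word.
bool word_less(const WordCount& entry, const std::string& word)
{
    return entry.word < word;
}

// Record one occurrence of word in table, which is kept sorted by word.
void add_word(std::vector<WordCount>& table, const std::string& word)
{
    // Binary search for the first entry whose word is not less than word.
    std::vector<WordCount>::iterator pos =
        std::lower_bound(table.begin(), table.end(), word, word_less);

    if (pos != table.end() && pos->word == word)
    {
        ++pos->count;  // already stored: just bump its count
    }
    else
    {
        WordCount entry;
        entry.word = word;
        entry.count = 1;
        table.insert(pos, entry);  // new word: insert at its sorted position
    }
}
```

Calling add_word for each word as it is read (for example inside while (cin >> word)) keeps memory proportional to the number of distinct words rather than the total number of words. The lookup is logarithmic, although inserting into the middle of a vector still shifts the entries after it. Note that structs are introduced later in the book than exercise 3-3, so this goes beyond the vectors-only constraint in the question.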
