c++ - Creating a dictionary from a text file

Question

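This code is supposed to output each word in the file along with the number of times it occurs (edit: ignoring upper/lower-case differences). Currently it does not do this correctly. Is this due to some kind of whitespace or punctuation?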

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

struct entry
{
    string word;
    int count;
};

// Declared before main so the call below compiles.
void make_dictionary(istream& file, vector<entry>& dict);

int main()
{
    ifstream input1;
    input1.open("Base_text.txt");

    if (input1.fail())
    {
        cout<<"Input file 1 opening failed."<<endl;
        exit(1);
    }

    ifstream input2;
    input2.open("Test_file.txt");

    if (input2.fail())
    {
        cout<<"Input file 2 opening failed."<<endl;
        exit(1);
    }

    vector<entry> base;

    make_dictionary(input1, base);

    int i;
    for (i=0; i<base.size(); i++)
    {
        cout<<base[i].word<<": "<<base[i].count<<endl;
    }
}

void make_dictionary(istream& file, vector<entry>& dict)
{
    string word;

    while (file>>word)
    {
        int i;
        bool found = false;

        for (i=0; i<dict.size(); i++)
        {
            if (dict[i].word==word)
            {
                dict[i].count++;
                found=true;
            }
        }

        if(!found)
        {
            entry ent;
            ent.word = word;
            ent.count = 1;
            dict.push_back(ent);
        }
    }
}

Input:

This is some simple base text to use for comparison with other files.
You may use your own if you so choose; your program shouldn't actually care.
For getting interesting results, longer passages of text may be useful.
In theory, a full novel might work, although it will likely be somewhat slow.

Current (incorrect) output:

This: 1
is: 1
some: 1
simple: 1
base: 1
text: 2
to: 1
use: 2
for: 1
comparison: 1
with: 1
other: 1
files.: 1
You: 1
may: 2
your: 2
own: 1
if: 1
you: 1
so: 1
choose;: 1
program: 1
shouldn't: 1
actually: 1
care.: 1
For: 1
getting: 1
interesting: 1
results,: 1
longer: 1
passages: 1
of: 1
be: 2
useful.: 1
In: 1
theory,: 1
a: 1
full: 1
novel: 1
might: 1
work,: 1
although: 1
it: 1
will: 1
likely: 1
somewhat: 1
slow.: 1

We are not allowed to use maps on this project. Any ideas about where I'm going wrong?

score 0 · Accepted Answer

If case doesn't matter, just convert each word to lowercase after reading it, then strip any trailing punctuation. For example:

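while (file>>word)
{
    // Lowercase in place; taking unsigned char avoids undefined behavior in tolower.
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    // Strip trailing punctuation. Note that word.erase(word.find_last_of(','), 1)
    // throws std::out_of_range when the character is absent (find_last_of returns npos).
    while (!word.empty() && std::ispunct(static_cast<unsigned char>(word.back())))
        word.pop_back();
    ...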
