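I'm writing a program that reads words from a text file and puts all of them into a linked list. The file has no punctuation, only words. I also want to compare the linked list against a preloaded blacklist, which is also a linked list.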

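What I have working so far: I can load the linked list from the file, print the linked list, check its size, count how often a word occurs in the file, skip printing words below a specified frequency, and convert all words to lowercase for easier processing.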

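The problem I'm having is getting the code right so that it prints only one entry for a word that occurs more than once. So if the word "the" appears 20 times, I don't want it to print "the <1>", then "the <2>" on the next occurrence, and so on up to "the <20>"; I just want it to print "the <20>" once.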

I'm posting my file-loading functions, print functions, and word-insertion functions, all of which are members of class wordCloud.

Here is the code:

void wordCloud::insertWord(string aWord){
wordNode *newWord = new wordNode(aWord);

//old code
if (head == NULL)
    head = newWord;
else{
    newWord->next = head;
    head = newWord;
}

//revised code
//newWord->next = head;
//head = newWord;
size++;
}

void wordCloud::insertWordDistinct(string word){
for (wordNode *temp = head; temp != NULL; temp = temp->next){
    if (word == temp->myWord){
        temp->freq_count++;
        //cout << temp->freq_count; //for debugging
    }
}
insertWord(word);
}

void wordCloud::printWordCloud(int freq){
wordNode *temp, *previous;
int listSize = 0;

if (head == NULL)                   //determines if there are any words in the list
    cout << "No Word Cloud" << endl;
else{
    temp = head;

    while (temp->next != NULL){         //prints each word until the list is NULL
        if (temp->freq_count >= freq){
            cout << temp->myWord << " <" << temp->freq_count << ">" << endl;
            temp = temp->next;
            listSize++;
        }
        else{
            previous = temp;
            temp = temp->next;
            previous = NULL;
            free(previous);
        }
    }
}
cout << "\nThere are " << size << " words in the file.\n";      //print file size - for debugging - works
cout << "\nThere are " << listSize << " words in the list\n\n";     //print list size - for debugging - works
system("pause");
}

void wordCloud::printBlacklist(){
wordNode *temp;

if (head == NULL)                   //determines if there is a list
    cout << "No Words in the blacklist" << endl;
else{
    temp = head;

    while (temp != NULL){           //prints each word until the list is NULL
        cout << temp->myWord << endl;
        temp = temp->next;
    }
}
cout << "\nThere are " << size << " words in the file.\n\n";        //print size - for debugging - works
system("pause");
}

void wordCloud::loadWordCloud(string fileName){
ifstream file;                      //variable for fileName
string word;                        //string to hold each word

file.open(fileName);                //open file

if (!file) {                        //error handling
    cout << "Error: Can't open the file. File may not exist.\n";
    exit(1);
}

while (!file.eof()){
    file >> word;                   //grab a word from the file one at a time

    insertWordDistinct(changeToLowerCase(word));
    //insertWord(word);             //for debugging
    //cout << word <<'\n';          //print word - for debugging
}

//printWordCloud();                 //print word cloud - for debugging - works
file.close();                       //always make sure to close file after read
}

void wordCloud::loadBlacklist(string fileName){
ifstream file;                      //variable for fileName
string bannedWord;                  //string to hold each word  

file.open(fileName);                //open file

if (!file) {                        //error handling if file does not load
    cout << "Error: Can't open the file. File may not exist.\n";
    exit(1);
}   

while (!file.eof()){
    file >> bannedWord;             //grab a word from the file one at a time

    if (bannedWord.empty()){        //error handling if file is empty
        cout << "File is empty!!\n";
        exit(1);
    }
    insertWord(changeToLowerCase(bannedWord));
    //cout << bannedWord << '\n';   //print blacklist words - for debugging
}

//printBlacklist();                 //print blacklist - for debugging - works
file.close();                       //always make sure to close file after read
}

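I noticed that if I put previous = NULL before free(), my program doesn't crash and I don't get any DLL memory-handling errors. In fact, I can take free() out completely and it seems to work just fine. I just don't know whether that is the right way to do it. It seems to me that merely pointing a node at NULL doesn't necessarily delete the data in memory. I'm just uneasy about not using free() or delete to dispose of the node. Please correct me if I'm wrong, or point me in the right direction.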

Basically, what is wrong with this:

wordNode *previous, *temp = head;

while (temp != NULL){
    if (word == temp->myWord){
        temp->freq_count++;
        previous = temp;
        temp = temp->next;
        delete(previous);
    }
}

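I may be going about this the wrong way, but basically I just need to find the frequency of each word inserted into the list and then delete the extra nodes containing that word, until only the node with the highest frequency count is left to print. I'm trying to do this in insertWordDistinct(string word). I just don't know how.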

2 Answers

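Your print loop is doing you no favors. It should be a simple enumeration filtered by the minimum frequency. No deleting, freeing, or other memory management should happen there. Just walk the list: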

void wordCloud::printWordCloud(int freq)
{
    int listSize = 0;
    int uniqSize = 0;
    for (wordNode *temp = head; temp; temp = temp->next)
    {
        if (temp->freq_count >= freq)
        {
            cout << temp->myWord << " <" << temp->freq_count << ">" << endl;
            listSize += temp->freq_count;
            ++uniqSize;
        }
    }

    cout << "\nThere are " << size << " words in the file.\n";
    cout << "\nThere are " << listSize << " words in the filtered list\n\n";
    cout << "\nThere are " << uniqSize << " unique words in the filtered list\n\n";
    system("pause");
}

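This should also put you back on track to manage the list properly in the wordCloud::~wordCloud() destructor, which is where the nodes should actually be deleted. There are plenty of other things I would do differently, but this is a learning process, so I won't spoil your party.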
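
For what it's worth, here is a minimal sketch of what such a destructor could look like. The wordNode layout, the starting freq_count of 1, the getSize() accessor, and the liveNodes counter are assumptions made up for this illustration (the question does not show the real class definition); the counter exists only so you can see that every node gets released.

```cpp
#include <string>

// Hypothetical bookkeeping for this demo only: counts nodes currently allocated.
int liveNodes = 0;

// Assumed node layout; the real wordNode in the question may differ.
struct wordNode {
    std::string myWord;
    int freq_count;
    wordNode *next;

    explicit wordNode(const std::string &w)
        : myWord(w), freq_count(1), next(nullptr) { ++liveNodes; }
    ~wordNode() { --liveNodes; }
};

class wordCloud {
public:
    wordCloud() : head(nullptr), size(0) {}

    // The class owns raw pointers, so copying is disabled in this sketch.
    wordCloud(const wordCloud &) = delete;
    wordCloud &operator=(const wordCloud &) = delete;

    // Walk the list once, deleting each node after saving its successor.
    ~wordCloud() {
        while (head != nullptr) {
            wordNode *doomed = head;
            head = head->next;
            delete doomed;  // nodes come from new, so pair them with delete, not free()
        }
    }

    // Push-front insertion, equivalent to the "revised code" in the question.
    void insertWord(const std::string &aWord) {
        wordNode *node = new wordNode(aWord);
        node->next = head;
        head = node;
        ++size;
    }

    int getSize() const { return size; }

private:
    wordNode *head;
    int size;
};
```

Note that setting a pointer to NULL and then calling free() on it releases nothing, because free(NULL) is a no-op. That is why the original print loop did not crash: it simply never freed any node.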


Update

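At the OP's request, below is a sample linked-list insert function that keeps the list sorted as it is built. While adapting it, he discovered significant differences from, and problems with, the original implementation. Hopefully it helps others too.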

void wordCloud::insert(const std::string& aWord, unsigned int freq)
{
    // manufacture lower-case version of word;
    std::string lcaseWord = make_lower(aWord);

    // search for the word by walking a pointer-to-pointer
    //  through the pointers in the linked list.
    wordNode** pp = &head;
    while (*pp && ((*pp)->myWord < lcaseWord))
        pp = &(*pp)->next;

    if (*pp && !(lcaseWord < (*pp)->myWord))
    {
        (*pp)->freq_count++;
    }
    else
    {    // insert the node
        wordNode *node = new wordNode(lcaseWord);
        node->freq_count = freq;
        node->next = *pp;
        *pp = node;
        ++size;
    }
}
answered 2014-03-03T21:21:33.003

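I think that to print each word only once, you have to build a list of unique words that holds each word from the original list together with its number of occurrences. You only need two loops for that: one to take each word from the original list, and a second to check whether the word is already in the unique list. So create a second list and copy each word into it once; if a word occurs more than once, just increment its frequency.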
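
A rough sketch of that two-loop idea might look like the following. The wordNode layout and the helper names buildUniqueList and freeList are hypothetical, chosen only for this example; the original list is assumed to hold one node per word occurrence.

```cpp
#include <string>

// Assumed node layout; the real wordNode in the question may differ.
struct wordNode {
    std::string myWord;
    int freq_count;
    wordNode *next;

    explicit wordNode(const std::string &w)
        : myWord(w), freq_count(1), next(nullptr) {}
};

// Builds a new list with one node per distinct word found in `original`.
// Each new node's freq_count is the number of times the word occurs.
// The caller owns the returned list and should release it with freeList().
wordNode *buildUniqueList(const wordNode *original) {
    wordNode *unique = nullptr;

    // Outer loop: take each word from the original list.
    for (const wordNode *src = original; src != nullptr; src = src->next) {
        // Inner loop: look for that word in the unique list.
        wordNode **pp = &unique;
        while (*pp != nullptr && (*pp)->myWord != src->myWord)
            pp = &(*pp)->next;

        if (*pp != nullptr)
            ++(*pp)->freq_count;              // seen before: bump the count
        else
            *pp = new wordNode(src->myWord);  // first sighting: append with count 1
    }
    return unique;
}

// Deletes every node of a list built with new.
void freeList(wordNode *head) {
    while (head != nullptr) {
        wordNode *next = head->next;
        delete head;
        head = next;
    }
}
```

Printing the unique list then shows each word once with its total. The inner search makes this quadratic in the worst case, which is probably fine for a small file but worth keeping in mind for large ones.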

answered 2014-03-03T09:02:00.307