c - Remove duplicate words from a sentence in C

Question

I need to write a function that removes all duplicate substrings from a string. The function below more or less works, but not quite correctly.

Input: This is a simple test for lesson2 Quit lesson2

Output: This a simple test for lesson2 Quit

As you can see, the function removes "is" from the sentence, which is not correct.

void RemoveDuplicates(char text[], size_t text_size, char** output)
{
    char *element;
    /* Allocate size for output. */
    *output = (char*) malloc(text_size);
    *output[0] = '\0';

    /* Split string into tokens */
    element = strtok(text, " ");
    if (element != NULL)
        strcpy(*output, element);

    while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string? */
        if (strstr(*output, element) == NULL) {
            strcat(*output, " " );
            strcat(*output, element );
        }
    }
}

Updated version of the code (@Rohan)

Input: This is a simple test simple for lesson2 Quit

Output: This is a simple test for lesson2 Quit

void RemoveDuplicates(char text[], size_t text_size, char** output)
{
    char *temp = NULL;
    char *element;
    /* Allocate size for output. */
    *output = (char*) malloc(text_size);
    *output[0] = '\0';

    /* Split string into tokens */
    element = strtok(text, " ");
    if (element != NULL)
        strcpy(*output, element);

    while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string? */
        temp = strstr(*output, element);
        /* check for space before/after it or '\0' after it. */
        if (temp == NULL || temp[-1] == ' ' || temp[strlen(element)] == ' ' || temp[strlen(element)] == '\0'  ) {

            strcat(*output, " " );
            strcat(*output, element );
        }
    }
}

score 4 · Accepted Answer

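You need to check for element as a whole word rather than as a plain substring.

What happens is that your input string contains two occurrences of "is": one is part of "This" and the other is the actual word "is".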

 This is a simple test for lesson2 Quit lesson2
 --^ -^

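strstr() matches both occurrences, so the standalone word "is" is treated as a duplicate of the "is" inside "This" and gets removed. But you only need to find duplicate words.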
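The substring behaviour is easy to reproduce in isolation. The following is a minimal sketch; substring_offset is a hypothetical helper introduced only for illustration and is not part of the question's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first substring match of `needle`
   in `haystack`, or -1 if there is no match. */
long substring_offset(const char *haystack, const char *needle)
{
    /* strstr() looks for a substring; it knows nothing about word boundaries. */
    const char *hit = strstr(haystack, needle);
    return hit != NULL ? (long)(hit - haystack) : -1;
}
```

With this helper, substring_offset("This", "is") returns 2: the match starts at the 'i' inside "This", even though "is" has not yet appeared as a separate word.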

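You can do that by checking for a space ' ' before and after the word that was found. If it is the last word, check for '\0' at the end instead.

Try updating your while loop to: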

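char temp[512] = { 0 }; /* pattern buffer; must hold the longest token plus 3 bytes */
while( (element = strtok(NULL, " ")) != NULL ) {
    /* Is the element already in the result string as a whole word? */
    /* Build the pattern " word " so strstr() cannot match inside a longer word. */
    snprintf(temp, sizeof temp, " %s ", element);
    /* Caveat: the first and last words of *output do not have a space on
       both sides, so duplicates of those words are not caught here. */
    if (strstr(*output, temp) == NULL) {
        strcat(*output, " " );
        strcat(*output, element );
    }
}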
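A " word " pattern needs a space on both sides, so it cannot match a word sitting at the very start or very end of the result string. The boundary check described above (start of string or a space before, a space or '\0' after) can be sketched as follows. The names contains_word and remove_duplicate_words are hypothetical, and the signature differs from the question's: this version sizes the buffer from strlen(text) instead of taking text_size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `word` occurs in `text` as a whole
   space-delimited word, 0 otherwise. */
int contains_word(const char *text, const char *word)
{
    size_t len = strlen(word);
    const char *p = text;

    if (len == 0)
        return 0;

    while ((p = strstr(p, word)) != NULL) {
        int starts_word = (p == text) || (p[-1] == ' ');
        int ends_word = (p[len] == ' ') || (p[len] == '\0');
        if (starts_word && ends_word)
            return 1;
        p++; /* partial match only; keep searching further along */
    }
    return 0;
}

/* Sketch: copies the unique words of `text` into a malloc'd string stored
   in *output. `text` is modified by strtok(). The caller frees *output,
   which is NULL if the allocation fails. */
void remove_duplicate_words(char text[], char **output)
{
    char *element;

    /* The result is never longer than the input, plus the terminator. */
    *output = malloc(strlen(text) + 1);
    if (*output == NULL)
        return;
    (*output)[0] = '\0';

    for (element = strtok(text, " "); element != NULL;
         element = strtok(NULL, " ")) {
        if (!contains_word(*output, element)) {
            if ((*output)[0] != '\0')
                strcat(*output, " ");
            strcat(*output, element);
        }
    }
}
```

Called on a writable copy of "This is a simple test for lesson2 Quit lesson2", this sketch keeps the word "is" and drops only the second "lesson2". Checking the boundaries directly avoids the fixed-size pattern buffer, at the cost of a slightly longer loop.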

score 0 · Accepted Answer

Disclaimer: This does not fix your algorithm; see @Rohan's answer for the fix to your algorithm.

To fix your code, do the following:

*output = (char*) malloc(text_size);

... should be:

char *output = malloc(text_size);

... and change:

*output[0] = '\0';

... to:

output[ 0 ] = '\0';

... do not cast the memory block returned by malloc. You can read more about this here. Note that output[ 0 ] is equivalent to *( output + 0 ).

Next, change:

strcpy(*output, element);

... to:

strcpy(output, element);

... then change:

if (strstr(*output, element) == NULL) {
  strcat(*output, " " );
  strcat(*output, element );
}

... to:

if (strstr(output, element) == NULL) {
  strcat(output, " " );
  strcat(output, element );
}

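... note that output is already a pointer here; using * as you did dereferences the pointer and yields a character. strstr and strcpy require that dest is a pointer to a character array, not a single character.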
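For comparison, if output stays a char ** out-parameter as in the question's signature, the pointer levels look roughly like the sketch below. copy_first_word is a hypothetical function used only to illustrate the dereferencing; it is not part of the original code. Since [] binds tighter than unary *, an expression such as *output[i] means *(output[i]), so the parentheses in (*output)[i] matter for any index other than 0.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: hands a malloc'd copy of the first word of `text`
   back to the caller through `output`. Returns 0 on success, -1 on failure. */
int copy_first_word(const char *text, char **output)
{
    size_t len = strcspn(text, " "); /* length up to the first space or the end */

    *output = malloc(len + 1);       /* no cast needed in C */
    if (*output == NULL)
        return -1;

    memcpy(*output, text, len);      /* *output is the char * buffer */
    (*output)[len] = '\0';           /* index the buffer, not the char ** */
    return 0;
}
```

A caller would declare char *word, pass &word, and free(word) once done with it.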

score 0 · Accepted Answer

You could try something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char test_text[] = "This is a is is simple test simple for lesson2 Quit";

int main(int argc, char* argv[])
{
  const int maxTokens = 200; // lets assume there is max 200 tokens in a sentence
  char* array[maxTokens];    // pointers to strings
  int unique_tokens = 0;     // number of unique tokens found in string

  // first tokenize the string and put it into a structure that is a bit more flexible
  char* element = strtok(test_text," ");
  for (; element != NULL; element = strtok(NULL, " "))
  {
     int foundToken = 0;
     int i;
     // do we have it from before?
     for (i = 0; i < unique_tokens && !foundToken; ++i)
     {
       if ( !strcmp(element, array[i]) )
       {
         foundToken = 1;
       }
     }

     // new token, add (extra tokens are ignored once the array is full)
     if ( !foundToken && unique_tokens < maxTokens )
     {
       array[unique_tokens++] = strdup(element); // strdup (POSIX) allocates space for the element and copies it
     }
  }

  // now recreate the result without the duplicates.

  char result[256] = {0};
  int i;
  for (i = 0; i < unique_tokens; ++i)
  {
    strcat(result,array[i]);
    if ( i < unique_tokens - 1 )
    {
      strcat(result," ");
    }
  }

  puts( result );

  // release the copies made by strdup
  for (i = 0; i < unique_tokens; ++i)
  {
    free(array[i]);
  }

  return 0;
}
