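I need to write a function that removes all duplicate substrings from a string. The function below mostly works, but it is not quite correct.

Input: This is a simple test for lesson2 Quit lesson2

Output: This a simple test for lesson2 Quit

As you can see, the function removes "is" from the sentence, which is not correct.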

void RemoveDuplicates(char text[], size_t text_size, char** output)
{
    char *element;
    /* Allocate size for output. */
    *output = (char*) malloc(text_size);
    *output[0] = '\0';

    /* Split string into tokens */
    element = strtok(text, " ");
    if (element != NULL)
        strcpy(*output, element);

    while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string? */
        if (strstr(*output, element) == NULL) {
            strcat(*output, " " );
            strcat(*output, element );
        }
    }
}

Updated version of the code (after @Rohan's answer)

Input: This is a simple test simple for lesson2 Quit

Output: This is a simple test for lesson2 Quit

void RemoveDuplicates(char text[], size_t text_size, char** output)
{
    char *temp = NULL;
    char *element;
    /* Allocate size for output. */
    *output = (char*) malloc(text_size);
    *output[0] = '\0';

    /* Split string into tokens */
    element = strtok(text, " ");
    if (element != NULL)
        strcpy(*output, element);

    while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string? */
        temp = strstr(*output, element);
        /* check for space before/after it or '\0' after it. */
        if (temp == NULL || temp[-1] == ' ' || temp[strlen(element)] == ' ' || temp[strlen(element)] == '\0'  ) {

            strcat(*output, " " );
            strcat(*output, element );
        }
    }
}

3 Answers

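You need to check for element as a whole word, not as a plain substring.

What happens is that your input string contains "is" twice: one occurrence is part of "This" and the other is the actual word "is".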

 This is a simple test for lesson2 Quit lesson2
 --^ -^  

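strstr() matches both occurrences, so the real word "is" is treated as a duplicate of the "is" inside "This" and gets dropped. But you only want to find duplicate words.

You can do that by checking for a space ' ' before and after the word that was found. If it is the last word, check for '\0' at the end instead.

Try updating your while loop to: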
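To make the substring match concrete, here is a minimal sketch. The helper name first_match_offset is hypothetical and exists only for illustration; it reports where strstr() finds its first hit.

```c
#include <string.h>

/* Hypothetical helper: returns the offset of the first occurrence of
   needle in haystack, or -1 if needle does not occur at all. */
long first_match_offset(const char *haystack, const char *needle)
{
    const char *hit = strstr(haystack, needle);

    return hit != NULL ? (long)(hit - haystack) : -1L;
}
```

With the output buffer holding only "This", first_match_offset("This", "is") is 2 rather than -1, which is why the question's code decides that "is" is already present.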

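char word[512];    /* must hold the longest token plus two spaces */
char padded[1024]; /* must hold the current output plus two spaces */
while ((element = strtok(NULL, " ")) != NULL) {
    /* Pad both the token and the output with spaces so that only
       whole words match, including the first and the last word. */
    snprintf(word, sizeof word, " %s ", element);
    snprintf(padded, sizeof padded, " %s ", *output);
    /* Is the element already in the result string? */
    if (strstr(padded, word) == NULL) {
        strcat(*output, " ");
        strcat(*output, element);
    }
}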
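The same whole-word check can also be written without a temporary buffer. The following is only a sketch under stated assumptions: the names contains_word and remove_duplicate_words are hypothetical, words are separated by single spaces, and the function returns a newly allocated string instead of using the question's char ** out-parameter.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if word occurs in text as a whole
   space-delimited word, 0 otherwise. */
int contains_word(const char *text, const char *word)
{
    size_t len = strlen(word);
    const char *hit = text;

    if (len == 0)
        return 0;

    while ((hit = strstr(hit, word)) != NULL) {
        int starts_word = (hit == text) || (hit[-1] == ' ');
        int ends_word = (hit[len] == ' ') || (hit[len] == '\0');

        if (starts_word && ends_word)
            return 1;
        hit++; /* partial match only: keep searching after it */
    }
    return 0;
}

/* Hypothetical variant of RemoveDuplicates: returns a malloc'd string
   that the caller must free, or NULL if allocation fails.
   Note that strtok() modifies text in place. */
char *remove_duplicate_words(char *text)
{
    /* The result can never be longer than the input. */
    char *out = malloc(strlen(text) + 1);
    char *element;

    if (out == NULL)
        return NULL;
    out[0] = '\0';

    for (element = strtok(text, " "); element != NULL;
         element = strtok(NULL, " ")) {
        if (!contains_word(out, element)) {
            if (out[0] != '\0')
                strcat(out, " ");
            strcat(out, element);
        }
    }
    return out;
}
```

Each strstr() hit is accepted only when it starts at the beginning of the buffer or after a space, and ends at a space or at the terminating '\0', so "is" inside "This" no longer counts as a duplicate.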
answered 2013-08-14T05:18:18.687

Disclaimer: this does not fix your algorithm; see @Rohan's answer for the fix to the algorithm itself.


To fix your code, do the following:

*output = (char*) malloc(text_size);

... should be:

char *output = malloc(text_size);

...and change:

*output[0] = '\0';

... to:

output[ 0 ] = '\0';

...do not cast the return value of malloc. You can read more about this here. Observe that output[ 0 ] is equivalent to *( output + 0 ).

Next, change:

strcpy(*output, element);

... to:

strcpy(output, element);

...then change:

if (strstr(*output, element) == NULL) {
  strcat(*output, " " );
  strcat(*output, element );
}

... to:

if (strstr(output, element) == NULL) {
  strcat(output, " " );
  strcat(output, element );
}

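...note that with this declaration output is already a pointer to char, so dereferencing it with * as you did yields a single character. strstr and strcpy require a pointer to a character array (dest, in the case of strcpy), not a character.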
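For reference, here is a short sketch of how these expressions parse when output keeps the question's char ** type. The function name init_output is hypothetical. In C, [] binds tighter than unary *, so *output[0] means *(output[0]); this names the same object as (*output)[0] only because the index is 0.

```c
#include <stdlib.h>

/* Hypothetical demo: allocates a buffer through a char ** out-parameter
   and writes to it using both spellings. Returns 0 on success, -1 on
   failure. */
int init_output(char **output, size_t size)
{
    if (size < 2)
        return -1;

    *output = malloc(size);
    if (*output == NULL)
        return -1;

    *output[0] = 'x';     /* parsed as *(output[0]): same object as (*output)[0] */
    (*output)[1] = '\0';  /* any index other than 0 needs the parentheses */
    return 0;
}
```

Writing *output[1] here would instead read a second char * located after the caller's pointer, which does not exist, so (*output)[i] is the safer habit for out-parameters.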

answered 2013-08-14T05:16:58.913

You could try something like this:

#include <stdio.h>
#include <string.h>

char test_text[] = "This is a is is simple test simple for lesson2 Quit";

int main(void)
{
  enum { MAX_TOKENS = 200 };  /* assume at most 200 unique tokens in a sentence */
  char *array[MAX_TOKENS];    /* pointers to the unique tokens inside test_text */
  int unique_tokens = 0;      /* number of unique tokens found so far */
  char result[sizeof test_text] = {0}; /* the result is never longer than the input */
  char *element;
  int i;

  /* first tokenize the string and put it into a structure that is a bit more flexible */
  for (element = strtok(test_text, " "); element != NULL; element = strtok(NULL, " "))
  {
    int foundToken = 0;

    /* do we have it from before? */
    for (i = 0; i < unique_tokens && !foundToken; ++i)
    {
      if (strcmp(element, array[i]) == 0)
      {
        foundToken = 1;
      }
    }

    /* new token: remember it, as long as there is room left */
    if (!foundToken && unique_tokens < MAX_TOKENS)
    {
      /* strtok() returns pointers into test_text, which outlives the loop,
         so no separate copy is needed */
      array[unique_tokens++] = element;
    }
  }

  /* now recreate the result without the duplicates */
  for (i = 0; i < unique_tokens; ++i)
  {
    strcat(result, array[i]);
    if (i < unique_tokens - 1)
    {
      strcat(result, " ");
    }
  }

  puts(result);

  return 0;
}
answered 2013-08-14T08:13:18.240