
I have a data file:

C0001|H|Espresso Classics|The traditional espresso favourites.
C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours.
C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.
C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite.
C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins.
C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.
C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice.

and the following code to tokenize it:

    #define MAX_CAT_TOK 4
    #define DATA_DELIM "|"
    char *token[100];

    for(i = 0; i < MAX_CAT_TOK; i++)
    {
        if(i == 0) token[i] = strtok(input, DATA_DELIM);
        else token[i] = strtok(NULL, DATA_DELIM);
        printf("%s\n", token[i]);
    }

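The problem is that when a string is printed after a longer one, data from the longer string shows up at the end of the shorter string. I assume this has something to do with string termination?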

Does anyone see what I'm doing wrong here?


3 Answers


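It sounds like your `input` buffer is not being properly null-terminated. If it is initially all zeros, the first line processed will be fine. If a longer input is then stored in it, that will still be fine. But when an entry shorter than the previous one is stored in it (for example, line 4 in your sample data), a missing null terminator can cause problems.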

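For example, if the new data is copied in via `memcpy` and the copy does not include the terminating null character, then the 4th token of that line will also contain the previous data.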

If that is the case, the solution is to make sure `input` is properly null-terminated.

The following tries to show what I mean:

strcpy( input, "a|b|c|some long data" );
tokenize( input );   // where tokenize is the logic shown in the OP calling strtok
// note the use of memcpy here rather than strcpy to show the idea
// and also note that it copies exactly 11 characters (doesn't include the null)
memcpy( input, "1|2|3|short", 11 ); 
tokenize( input );

In the contrived example above, the 4th token from the second tokenization would be: shortlong data

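Edit: In other words, the problem does not appear to be in the code shown in the OP. The problem is in how `input` gets populated. If you add a printf before the for loop to show the actual data being parsed, you will probably see that it is not properly null-terminated. Line 4 will probably show that it contains remnants of the previous line: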

printf( "%s\n", input );
answered 2011-05-05T22:58:14.413

I don't see anything wrong.

I took the liberty of making a compilable version of your code and put it on ideone. Compare it with your version...

#include <stdio.h>
#include <string.h>

int main(void) {
  int i, j;
  char *token[100];
  char *input;
  char inputs[7][300] = {
    "C0001|H|Espresso Classics|The traditional espresso favourites.",
    "C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours.",
    "C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.",
    "C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite.",
    "C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins.",
    "C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.",
    "C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice.",
  };

  for (j = 0; j < 7; j++) {
    input = inputs[j];
    for (i = 0; i < 4; i++) {
      if (i == 0) {
        token[i] = strtok(input, "|");
      } else {
        token[i] = strtok(NULL, "|");
      }
      printf("%s\n", token[i]);
    }
  }
  return 0;
}
answered 2011-05-05T22:43:08.807

Here is my working code:

#include <string.h>
#include <stdio.h>

//#define DATA_DELIM "|"
#define DATA_DELIM "|\n"

int main(void)
{
    enum { LINE_LENGTH = 4096 };
    char input[LINE_LENGTH];
#define MAX_CAT_TOK 4
    char *token[100];

    while (fgets(input, sizeof(input), stdin) != 0)
    {
        printf("Input: %s", input);
        for (int i = 0; i < MAX_CAT_TOK; i++)
        {
            if (i == 0)
                token[i] = strtok(input, DATA_DELIM);
            else
                token[i] = strtok(NULL, DATA_DELIM);
            printf("%d: %s\n", i, token[i] != 0 ? token[i] : "<<NULL POINTER>>");
        }
    }
    return 0;
}

With the given data, I get:

Input: C0001|H|Espresso Classics|The traditional espresso favourites.
0: C0001
1: H
2: Espresso Classics
3: The traditional espresso favourites.
Input: C0002|H|Espresso Espresions|Delicious blend of espresso, milk, and luscious flavours.
0: C0002
1: H
2: Espresso Espresions
3: Delicious blend of espresso, milk, and luscious flavours.
Input: C0003|H|Tea & Cocoa|Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.
0: C0003
1: H
2: Tea & Cocoa
3: Gloria Jean's uses only the finest cocoas and single garden teas. Always satisfying.
Input: C0004|C|Iced Chocolate|Gloria Jean's version of a traditional favourite.
0: C0004
1: C
2: Iced Chocolate
3: Gloria Jean's version of a traditional favourite.
Input: C0005|C|Mocha Chillers|An icy blend of chocolate, coffee, milk and delicious mix-ins.
0: C0005
1: C
2: Mocha Chillers
3: An icy blend of chocolate, coffee, milk and delicious mix-ins.
Input: C0006|C|Espresso Chillers|A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.
0: C0006
1: C
2: Espresso Chillers
3: A creamy blend of fresh espresso, chocolate, milk, ice, and flavours.
Input: C0007|C|On Ice|Cool refreshing Gloria Jean's creations over ice.
0: C0007
1: C
2: On Ice
3: Cool refreshing Gloria Jean's creations over ice.

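With the single-character delimiter string (`"|"` only), I get an extra newline after each line numbered 3, because the newline that `fgets` keeps stays in the last token.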

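This looks pretty much like what you wanted. So either there is a problem with your input (did you echo it as you read it?), or you have managed to find a flaky implementation of `strtok()`, or you are on Windows and the data lines end with a carriage return as well as a newline, and the stray carriage returns are producing misleading output.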

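Of these, I suspect the last (Windows and stray carriage returns) is the most likely - although I could not reproduce the problem even with a DOS-format data file (testing on MacOS X 10.6.7 with GCC 4.6.0).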

answered 2011-05-05T23:51:02.607