C - Array - value jumps

Question

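I am trying to count how often each letter occurs as the first letter of a word in a dictionary containing 140 000 words. I store the frequencies in the array count, with count[0] for the letter a, count[1] for the letter b, and so on. However, when I sum the array count, the result does not equal the total number of words in the dictionary. I found that if I reduce the dictionary to 95137 words the numbers match, but as soon as the dictionary exceeds 95137 words, the values of count[0] to count[4] suddenly become very large. I have no idea why. Here is my code: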

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("testdic.txt", "r");
    int count[26];
    char buffer[30];
    for (int i = 0; i < 26; i++)
        count[i] = 0;
    int total = 0;
    while (1)
    {
        fscanf(fp, "%s", buffer);
        if (feof(fp))
            break;
        count[buffer[0]-97] ++;
        total++;
        if (count[0] > total)            // I used this to find out where the jump occurs
            break;
    }
    printf("%d ", total);
    for (int i = 0; i < 26; i++)
        printf("%d " , count[i]);

}

score 3 · Accepted Answer

It is hard to tell why this code produces strange output, because it is missing several basic checks:

feof should only be consulted after a read function has failed;
you don't check the return value of fopen;
you don't check the return value of fscanf;
you don't check the value of buffer[0];
you don't limit the input length in %s to the size of buffer.
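
The checks listed above could be put together roughly as follows. This is only a sketch, not the asker's code: the helper names count_initials and count_initials_in_file and the skipped counter are made up for illustration, and the range test assumes an ASCII-compatible character set.

```c
#include <stdio.h>

/* Hypothetical helper: counts words by their first letter.
   Returns the number of words read; words that do not start with
   'a'..'z' are tallied in *skipped instead of indexing the array. */
long count_initials(FILE *fp, long count[26], long *skipped)
{
    char buffer[30];
    long total = 0;

    for (int i = 0; i < 26; i++)
        count[i] = 0;
    *skipped = 0;

    /* %29s leaves room for the terminating '\0' in the 30-byte buffer
       (a longer word is read in several pieces rather than overflowing).
       The loop is driven by fscanf's return value, not by feof. */
    while (fscanf(fp, "%29s", buffer) == 1) {
        unsigned char c = (unsigned char)buffer[0];
        total++;
        if (c >= 'a' && c <= 'z')
            count[c - 'a']++;
        else
            (*skipped)++;
    }
    return total;
}

/* Hypothetical wrapper: checks fopen before reading.
   Returns -1 if the file cannot be opened. */
long count_initials_in_file(const char *path, long count[26], long *skipped)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);
        return -1;
    }
    long total = count_initials(fp, count, skipped);
    fclose(fp);
    return total;
}
```

With this shape, the sum of count[0..25] plus skipped should equal the returned total, which makes a mismatch easy to spot.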

score 1 · Accepted Answer

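In the statement count[buffer[0]-97]++; you derive the index by taking the ASCII value of the first letter and subtracting 97, the ASCII value of a. I am not sure whether you intended to handle words that start with an uppercase letter, such as Ascii, where buffer[0] is 65 and the expression buffer[0] - 97 evaluates to a negative integer. Writing to a negative index can corrupt the stack.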
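
To make the arithmetic concrete, here is a small sketch. raw_index reproduces the expression from the question; folded_index is a hypothetical alternative (not from the question) that folds uppercase to lowercase and returns -1 for anything that is not a letter. Both assume an ASCII-compatible character set.

```c
#include <ctype.h>

/* The index the original expression computes; 97 is 'a' in ASCII.
   For 'A' (65) this yields 65 - 97 = -32, which is out of bounds. */
int raw_index(char c)
{
    return c - 97;
}

/* Hypothetical alternative: treat 'A' like 'a', and report -1 for
   characters that are not letters so the caller can skip them. */
int folded_index(char c)
{
    int lower = tolower((unsigned char)c);
    if (lower >= 'a' && lower <= 'z')
        return lower - 'a';
    return -1;
}
```

A caller would then increment count[idx] only when idx is not -1.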

score 0 · Accepted Answer

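I don't know whether this is the problem, but your code should care about whether the words contain odd characters. So simply doing

count[buffer[0]-97]

is a bit reckless, if you ask me! I would make sure that buffer[0] >= 97 && buffer[0] < 97+26 holds before executing that line. Otherwise, who knows what you are incrementing!

Could it be that your 95138th word starts with a funny character?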
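
One way to test that guess is to scan the file and report the first word that fails the range check. The sketch below is an assumption about how one might do it; find_first_bad_word is a made-up name, and the check assumes ASCII.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the 1-based position of the first word
   whose first character is outside 'a'..'z', or 0 if there is none.
   The offending word is copied into bad (at most badsize - 1 chars).
   Words longer than 29 characters are read in pieces, so positions
   are only exact when every word fits in the buffer. */
long find_first_bad_word(FILE *fp, char *bad, size_t badsize)
{
    char buffer[30];
    long position = 0;

    while (fscanf(fp, "%29s", buffer) == 1) {
        position++;
        if (buffer[0] < 'a' || buffer[0] > 'z') {
            if (badsize > 0) {
                strncpy(bad, buffer, badsize - 1);
                bad[badsize - 1] = '\0';
            }
            return position;
        }
    }
    return 0;
}
```

If this returns 95138 for the dictionary in the question, the word copied into bad is the one to look at.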

score 0 · Accepted Answer

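Your total variable is declared after your count array, so as soon as you write outside the bounds of the array you mess everything up. Characters less than 'a' are particularly bad, and frankly the first uppercase letter is enough. Use count[(buffer[0]-'a')%26]++; the modulo is meant to keep you inside the array. Note that in C the remainder of a negative value is itself negative, so for characters below 'a' the result still has to be shifted back into the range 0..25. It is a kludge, but if it works you can start looking for the bad character.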
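
A sketch of an index that really stays within 0..25 for any input follows. wrapped_index is a hypothetical name; the adjustment is needed because, since C99, integer division truncates toward zero, so for example ('A' - 'a') % 26 is -6 rather than 20.

```c
/* Hypothetical helper: maps any character code to an index in 0..25.
   This only keeps the write inside the array; it does not make the
   counts meaningful for characters that are not lowercase letters. */
int wrapped_index(int c)
{
    int r = (c - 'a') % 26;    /* negative whenever c < 'a' */
    return r < 0 ? r + 26 : r;
}
```

It would be used as count[wrapped_index(buffer[0])]++ while hunting for the offending word.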
