
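I'm trying to learn C and am currently writing a toy script. Right now it just opens a text file, reads it character by character, and prints it to the command line.

I looked up how to get the size of a file (using fseek() and then ftell()), but the result doesn't match the number I get by counting characters in the while loop as I iterate through the file.

I wonder whether the difference is caused by Windows using \r\n instead of just \n, since the difference seems to be #newlines+1.

Here's the script I'm working on: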

#include <stdio.h>
#include <stdlib.h>

int main()
{
        FILE * fp = fopen("test.txt", "r");

        fseek(fp, 0, SEEK_END);
        char * stringOfFile = malloc(ftell(fp));
        printf("allocated %d characters for file\n", ftell(fp));
        fseek(fp,0,SEEK_SET);//reset pointer

        char tmp = getc(fp); //current letter in file
        int i=0;
        while (tmp != EOF) //End-Of-File (defined in stdio.h)
        {
                *(stringOfFile+i) = tmp;
                tmp = getc(fp);
                i++;
        }
        fclose(fp);
        printf("Turns out we had %d characters to store.\nThe file was as follows:\n", i);
        printf("%s", stringOfFile);
}

The output I get (you can see the simple test file in the output) is:

allocated 67 characters for file
Turns out we had 60 characters to store.
The file was as follows:
line1
line2
line3
line4
line5
(last)line6

lmnopqrstuvw▬$YL Æ

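The trailing bit that gets printed seems to be garbage caused by allocating too much memory for the string.

Thanks in advance for any help/answers!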


2 Answers


If you're running Windows, this line:

FILE * fp = fopen("test.txt", "r");

opens the file in text mode, which means every \r\n sequence is converted to \n as it is read.

So if your file has 7 lines (and uses Windows-style line terminators), the conversion removes 7 characters: one \r per line.

The fix is to open it in binary mode:

FILE * fp = fopen("test.txt", "rb");

so that ftell and reading the characters one by one agree.
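To check this yourself, here is a minimal sketch with two hypothetical helpers (file_size_ftell and count_getc are illustrative names, not library functions): one reports the size via fseek/ftell in binary mode, the other counts what getc delivers in a chosen mode. Note that ftell returns a long, so the result is kept in a long rather than an int.

```c
#include <stdio.h>

/* Hypothetical helper: size reported by fseek/ftell, or -1 on error.
   The file is opened in binary mode ("rb"), so no newline translation
   can happen. */
long file_size_ftell(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    long size = ftell(fp);   /* ftell returns long, not int */
    fclose(fp);
    return size;
}

/* Hypothetical helper: counts the characters getc() delivers when the
   file is opened with `mode` ("r" or "rb"). Returns -1 on error. */
long count_getc(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    long n = 0;
    int c;                   /* int, so EOF stays distinguishable */
    while ((c = getc(fp)) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

For a file with \r\n line endings on Windows, count_getc(path, "r") should come out smaller than file_size_ftell(path) by the number of line endings, while count_getc(path, "rb") should match it. On POSIX systems text and binary mode behave the same, so both counts match there.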

Of course, keeping the \r characters in your text wastes space and isn't very convenient, so you could instead allocate as you're doing now and, at the end, call realloc to resize the block to the actual number of characters (since the new size is smaller, that's fine):

stringOfFile = realloc(stringOfFile,i+1);

Note that I've added 1 to the number of characters to make room for the nul terminator, so if the file contains no \r characters, this realloc may actually grow the block by 1 byte.

And, as I was hinting, don't forget to nul-terminate your string, or printf won't know where to stop:

stringOfFile[i] = '\0';

(unless you don't need a C string: storing the length and printing character by character is also correct.)
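Putting the pieces of this answer together, a minimal sketch might look like the following. The function name read_text_file is hypothetical. It keeps text mode, allocates ftell()+1 bytes (the +1 is for the terminator), reads with an int-typed getc loop, shrinks with realloc, and nul-terminates. Unlike the one-line realloc shown earlier, it assigns the result to a temporary so the original block isn't lost if realloc fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole text file into a nul-terminated
   buffer sized with fseek/ftell, then shrunk to the real length.
   Returns NULL on error; *out_len receives the number of characters. */
char *read_text_file(const char *path, long *out_len)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *buf = malloc((size_t)size + 1);   /* +1 for the terminator */
    if (buf == NULL) { fclose(fp); return NULL; }

    long i = 0;
    int c;                                   /* int, as getc returns */
    /* text mode can only shrink the data (\r\n -> \n); the i < size
       guard is a safety net against overrunning the buffer anyway */
    while (i < size && (c = getc(fp)) != EOF)
        buf[i++] = (char)c;
    fclose(fp);

    /* shrink to the actual length; keep the old block if realloc fails */
    char *shrunk = realloc(buf, (size_t)i + 1);
    if (shrunk != NULL)
        buf = shrunk;
    buf[i] = '\0';

    if (out_len != NULL)
        *out_len = i;
    return buf;
}
```

If you'd rather not rely on the terminator, fwrite(s, 1, (size_t)len, stdout) prints exactly len characters from the returned buffer.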

We've seen that the ftell method is tricky, and in some cases it can't be applied at all because the size of the data isn't known in advance: for instance, when the stream is the output of a command (popen returns a FILE * but you can't fseek it), a socket, and so on.

In the general case, it would be better to:

  • allocate a small buffer
  • read char by char and store
  • if buffer is full, call realloc to increase the size by some step (not at every char, performance would be bad)
  • in the end, call realloc again to adjust the size more precisely

(that solves the binary/text issue transparently as well)
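The steps above can be sketched as follows. The function name read_stream, the initial capacity of 64 and the doubling growth factor are illustrative choices, not requirements. Because it never asks for the size up front, it works on any FILE *, including streams that can't be seeked, and whatever mode the stream was opened in, the count is simply what getc delivered.

```c
#include <stdio.h>
#include <stdlib.h>

#define INITIAL_CAP 64   /* arbitrary starting size */

/* Hypothetical helper: reads everything from fp into a growing,
   nul-terminated buffer. Returns NULL on allocation failure;
   *out_len receives the number of characters read. */
char *read_stream(FILE *fp, size_t *out_len)
{
    size_t cap = INITIAL_CAP;
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;                              /* int, so EOF is detected reliably */
    while ((c = getc(fp)) != EOF) {
        if (len + 1 >= cap) {           /* keep one byte for the terminator */
            size_t new_cap = cap * 2;   /* grow geometrically, not per char */
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }

    /* final realloc trims unused capacity; on failure the block is still valid */
    char *trimmed = realloc(buf, len + 1);
    if (trimmed != NULL)
        buf = trimmed;
    buf[len] = '\0';

    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Doubling keeps the number of realloc calls logarithmic in the file size, which is what the "not at every char" point above is about.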

Note that if you're working with large files (>4GB), you have to use 64-bit integers for positions and the 64-bit flavours of the I/O functions, such as fopen64. All offset variables like i should also match the return type of ftell instead of being int, or you'll start having problems at 2GB. That said, it doesn't matter much when processing moderately small text files.
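As an illustration of the 64-bit route, here is a sketch that swaps in two widely available alternatives to fopen64: fseeko/ftello on POSIX systems (with _FILE_OFFSET_BITS set to 64 so off_t is 64-bit) and _fseeki64/_ftelli64 in the Microsoft C runtime. The helper name file_size64 is hypothetical, and the feature-macro setup is an assumption about a glibc-style toolchain; check your platform's documentation.

```c
/* Feature macros only take effect before the first #include */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64          /* 64-bit off_t on 32-bit glibc builds */
#endif
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L       /* exposes fseeko/ftello */
#endif

#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: file size as a 64-bit value, or -1 on error. */
int64_t file_size64(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
#if defined(_WIN32)
    /* Microsoft CRT: 64-bit seek/tell */
    if (_fseeki64(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    int64_t size = _ftelli64(fp);
#else
    /* POSIX: off_t-based seek/tell */
    if (fseeko(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    int64_t size = (int64_t)ftello(fp);
#endif
    fclose(fp);
    return size;
}
```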

Also, see David's answer. With text files, storing the result of getc in a char should work, but not in the general case with binary files.

answered 2018-01-08T20:57:35.247
    char tmp = getc(fp); //current letter in file
    int i=0;
    while (tmp != EOF) //End-Of-File (defined in stdio.h)

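You need to check the value returned by getc for EOF. Instead, you convert it to a char and then check whether it equals EOF converted to a char. But what if the char that EOF converts to actually occurs in the file? Check the documentation: getc returns an int.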
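On typical platforms where EOF is -1 and char is signed, a 0xFF byte in the file converts to a char that compares equal to EOF, so the original loop stops early; where char is unsigned, it never compares equal and the loop doesn't end. A minimal sketch of the corrected loop (copy_chars is a hypothetical name) keeps getc's result in an int and narrows to char only after the EOF check:

```c
#include <stdio.h>

/* Hypothetical helper: copies at most cap characters from fp into dest
   and returns how many were copied. */
long copy_chars(FILE *fp, char *dest, long cap)
{
    long i = 0;
    int tmp;                      /* int, as getc returns */
    while (i < cap && (tmp = getc(fp)) != EOF) {
        dest[i] = (char)tmp;      /* narrow only after the EOF check */
        i++;
    }
    return i;
}
```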

You have other bugs as well.

answered 2018-01-08T21:20:11.000