c - fgetpos() behavior depends on the newline characters

Question

Consider these two files:

file1.txt (Windows newlines)

abc\r\n
def\r\n

file2.txt (Unix newlines)

abc\n
def\n

I noticed that for file2.txt, the position obtained with fgetpos is not incremented correctly. I am working on Windows.

Let me show you an example. The following code:

#include <stdio.h>

void read(FILE *file)
{
    int c = fgetc(file);
    printf("%c (%d)\n", (char)c, c);

    fpos_t pos;
    fgetpos(file, &pos); // save the position
    c = fgetc(file);
    printf("%c (%d)\n", (char)c, c);

    fsetpos(file, &pos); // restore the position - should point to previous
    c = fgetc(file);     // character, which is not the case for file2.txt
    printf("%c (%d)\n", (char)c, c);
    c = fgetc(file);
    printf("%c (%d)\n", (char)c, c);
}

int main()
{
    FILE *file = fopen("file1.txt", "r");
    printf("file1:\n");
    read(file);
    fclose(file);

    file = fopen("file2.txt", "r");
    printf("\n\nfile2:\n");
    read(file);
    fclose(file);

    return 0;
}

produces this result:

file1:
a (97)
b (98)
b (98)
c (99)


file2:
a (97)
b (98)
  (-1)
  (-1)

file1.txt works as expected, while file2.txt behaves strangely. To understand what is going wrong, I tried the following code:

void read(FILE *file)
{
    int c;
    fpos_t pos;
    while (1)
    {
        fgetpos(file, &pos);
        printf("pos: %d ", (int)pos);
        c = fgetc(file);
        if (c == EOF) break;
        printf("c: %c (%d)\n", (char)c, c);
    }
}

int main()
{
    FILE *file = fopen("file1.txt", "r");
    printf("file1:\n");
    read(file);
    fclose(file);

    file = fopen("file2.txt", "r");
    printf("\n\nfile2:\n");
    read(file);
    fclose(file);

    return 0;
}

I got this output:

file1:
pos: 0 c: a (97)
pos: 1 c: b (98)
pos: 2 c: c (99)
pos: 3 c:
 (10)
pos: 5 c: d (100)
pos: 6 c: e (101)
pos: 7 c: f (102)
pos: 8 c:
 (10)
pos: 10

file2:
pos: 0 c: a (97) // something is going wrong here...
pos: -1 c: b (98)
pos: 0 c: c (99)
pos: 1 c:
 (10)
pos: 3 c: d (100)
pos: 4 c: e (101)
pos: 5 c: f (102)
pos: 6 c:
 (10)
pos: 8

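I know that fpos_t is not meant to be interpreted by the programmer, because it is implementation-dependent. However, the example above illustrates the problem with fgetpos/fsetpos.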

How can the newline sequence affect the internal position of the file, even before the stream encounters those characters?

score 3 · Accepted Answer

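I would say the problem is probably caused by the second file confusing the implementation: it is opened in text mode, but it does not meet the requirements for a text stream.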

In the standard,

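A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character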

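Your second file stream contains no valid newline characters (the implementation looks for \r\n sequences to convert to the newline character internally). As a result, the implementation may not work out the line lengths correctly, and may get hopelessly confused when you try to move around in the file.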

Additionally,

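Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment.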

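Remember that the library does not simply read one byte from the file each time you call fgetc. It reads the entire file (for a file this small) into the stream's buffer and operates on that.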
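If the newline translation is the source of the confusion, one way to sidestep it is to open the file in binary mode, where no characters are added or removed. The following is a minimal sketch under that assumption, not a confirmed fix for every implementation; the helper name `reread_second_char` is hypothetical and does not appear in the question.

```c
#include <stdio.h>

/* Hypothetical helper: reads one character, saves the position, reads a
   second character, restores the position and reads again.
   Returns the re-read character if it matches the second one,
   or EOF on any failure or mismatch.
   "rb" opens the stream in binary mode, so no newline translation occurs. */
int reread_second_char(const char *path)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return EOF;

    int result = EOF;
    fpos_t pos;

    if (fgetc(file) != EOF && fgetpos(file, &pos) == 0) {
        int second = fgetc(file);
        if (second != EOF && fsetpos(file, &pos) == 0) {
            int again = fgetc(file);
            if (again == second)
                result = again;
        }
    }

    fclose(file);
    return result;
}
```

For a file containing abc\ndef\n this returns 'b'. The trade-off is that in binary mode the \r bytes of file1.txt are no longer hidden, so code reading Windows-style files has to handle them itself.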

score 2 · Accepted Answer

I am adding this as supporting information for teppic's answer:

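When dealing with a FILE* that has been opened as text rather than binary, the fgetpos() function in VC++ 11 (VS 2012) may (and, for your file2.txt example, does) end up in this stretch of code: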

// ...

if (_osfile(fd) & FTEXT) {
        /* (1) If we're not at eof, simply copy _bufsiz
           onto rdcnt to get the # of untranslated
           chars read. (2) If we're at eof, we must
           look through the buffer expanding the '\n'
           chars one at a time. */

        // ...

        if (_lseeki64(fd, 0i64, SEEK_END) == filepos) {

            max = stream->_base + rdcnt;
            for (p = stream->_base; p < max; p++)
                if (*p == '\n')                     // <---
                    /* adjust for '\r' */           // <---
                    rdcnt++;                        // <---

// ...

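It assumes that any \n character in the buffer was originally a \r\n sequence that was normalized when the data was read into the buffer. So at times it tries to account for the (now missing) \r character that it believes earlier processing removed from the buffer. This particular adjustment happens when you are near the end of the file; however, fgetpos() makes other, similar adjustments to account for the removed \r bytes.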
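To make the effect of that assumption concrete, here is a simplified model of the arithmetic. It is a sketch based only on the excerpt above and the output shown in the question, not the actual CRT source; the function names and the formula in `modeled_pos` are assumptions made for illustration.

```c
#include <stddef.h>

/* Number of bytes the first `len` buffered characters are ASSUMED to have
   occupied on disk: every '\n' is counted as two bytes, as if it had
   originally been "\r\n". */
long assumed_raw_bytes(const char *buf, size_t len)
{
    long raw = (long)len;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            raw++;              /* adjust for the presumed '\r' */
    return raw;
}

/* Hypothetical model of the reported position once the whole file is in
   the buffer: start from the OS-level file offset, subtract the assumed
   raw size of the entire buffer, then add the assumed raw size of the
   part already consumed. It does not model the case where nothing has
   been read into the buffer yet. */
long modeled_pos(const char *buf, size_t len, size_t consumed, long os_offset)
{
    return os_offset - assumed_raw_bytes(buf, len)
                     + assumed_raw_bytes(buf, consumed);
}
```

In both files the translated buffer is abc\ndef\n (8 characters). For file1.txt the OS offset after reading is 10, so the model gives 1, 2, 3, 5 after 1 to 4 consumed characters. For file2.txt the OS offset is only 8, so the same arithmetic gives -1, 0, 1, 3, which matches the output in the question: the two \r bytes that were never in the file are subtracted anyway.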
