
I have a file with DOS line endings that I receive at run-time, so I cannot convert the line endings to UNIX-style offline. Also, my app runs on both Windows and Linux. My app calls fgets() on the file to read it line by line.

On Linux, would each line that fgets() reads include both trailing characters (\r\n), or only the \n, with the \r discarded by the underlying system?

EDIT:

OK, so the line endings are preserved when reading a file on Linux, but I have run into another issue. On Windows, opening the file with "r" and with "rb" behaves differently. Does Windows treat these two modes differently, unlike Linux?


4 Answers


fgets() keeps line endings.

http://msdn.microsoft.com/en-us/library/c37dh6kf(v=vs.80).aspx

fgets() itself doesn't have any options for converting line endings, but on Windows you can open a file in either "binary" mode or "text" mode. In text mode, Windows converts the CR/LF sequence (C string: "\r\n") into a single newline (C string: "\n") when reading. The feature lets you write the same code for Windows and Linux (you don't need to handle "\r\n" on Windows and just "\n" on Linux).

http://msdn.microsoft.com/en-US/library/yeby3zcb(v=vs.80)

Note that the Windows fopen() takes the same arguments as fopen() on Linux. Binary mode is selected by adding 'b' to the mode string (this is part of standard C, not an extension), and text mode is the default. So I suggest you just use the same code on Windows and Linux; the Windows version of fopen() is designed for that.

The Linux C library does no line-ending translation at all. If the text file has CR/LF line endings, that is exactly what you get when you read it. Linux fopen() accepts a 'b' in the mode string, but ignores it.

http://linux.die.net/man/3/fopen

http://linux.die.net/man/3/fgets
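To see the difference between the two platforms, here is a minimal sketch. The helper names write_crlf_sample and first_line_length are made up for illustration. It writes "ab\r\ncd\r\n" in binary mode, so the bytes on disk are the same everywhere, and then reports how many bytes fgets() returns for the first line under a given fopen() mode:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes two CRLF-terminated lines in binary mode,
   so the file contains exactly "ab\r\ncd\r\n" on every platform. */
int write_crlf_sample(const char *path)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fputs("ab\r\ncd\r\n", fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: opens path with the given fopen() mode, reads the
   first line with fgets(), and returns the number of bytes fgets() stored
   (not counting the terminating '\0'), or -1 on error. */
long first_line_length(const char *path, const char *mode)
{
    char buf[64];
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return (long)strlen(buf);
}
```

With the Microsoft C runtime, first_line_length(path, "r") should return 3 ("ab\n", text mode) and first_line_length(path, "rb") should return 4 ("ab\r\n"). On Linux both calls return 4, because no translation happens and the 'b' is ignored.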

answered 2012-07-03T08:43:54.047

On Unix, each line is read up to and including the newline \n, so it also contains the carriage return \r that precedes it. You would need to trim both off the end.
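A minimal sketch of that trimming, assuming the line is already in a writable, NUL-terminated buffer (strip_line_ending is a hypothetical name). It removes only a trailing \n and then a trailing \r, so carriage returns in the middle of the line survive:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strips a trailing "\n", then a trailing "\r",
   from s in place, so "\r\n", "\n" and a lone "\r" ending are all removed.
   Returns the new length of s. */
size_t strip_line_ending(char *s)
{
    size_t len = strlen(s);
    if (len > 0 && s[len - 1] == '\n')
        s[--len] = '\0';
    if (len > 0 && s[len - 1] == '\r')
        s[--len] = '\0';
    return len;
}
```

The same function works unchanged on Windows, where a text-mode read has already removed the \r.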

answered 2012-07-03T06:30:16.553

Although the other answers gave satisfying information on what kind of line ending is returned for a DOS file read under UNIX, I'd like to mention an alternative way to chop off such line endings.

The significant difference is that the following approach is safe for multibyte character strings, as it only searches for the '\r' and '\n' bytes and overwrites the first one it finds:

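if (pszLine)  /* needs <string.h> for strcspn() */
{
  /* Length of the leading part containing neither '\r' nor '\n'. No
     minimum-length check is needed, so a bare "\n" or "" works too. */
  size_t size = strcspn(pszLine, "\r\n");
  pszLine[size] = '\0';
}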
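For context, here is a minimal sketch of the same strcspn() idiom inside an fgets() loop. The helper read_stripped_lines and the fixed MAX_LINE limit are assumptions made for illustration:

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 128  /* assumed maximum line length for this sketch */

/* Hypothetical helper: reads up to max_lines lines from fp with fgets()
   and cuts each one at its first '\r' or '\n' using strcspn().
   Returns the number of lines stored in lines[].
   Lines longer than MAX_LINE - 1 bytes come back in several pieces. */
size_t read_stripped_lines(FILE *fp, char lines[][MAX_LINE], size_t max_lines)
{
    size_t n = 0;
    while (n < max_lines && fgets(lines[n], MAX_LINE, fp) != NULL) {
        lines[n][strcspn(lines[n], "\r\n")] = '\0';
        n++;
    }
    return n;
}
```

Because strcspn() stops at the first '\r' or '\n', a line such as "a\rb\r\n" comes back as just "a". If stray carriage returns inside a line matter, trim only the end of the line instead.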
answered 2012-07-03T06:59:50.053

You'll get what's actually in the file, including the \r characters. Unix makes no distinction between text files and binary files; there are just files, and stdio does no conversions. After reading a line into a buffer with fgets, you can do:

char *p = strrchr(buffer, '\r');
if(p && p[1]=='\n' && p[2]=='\0') {
    p[0] = '\n';
    p[1] = '\0';
}

That will change a terminating \r\n\0 into \n\0. Or you could just do p[0]='\0' if you don't want to keep the \n.

Note the use of strrchr, not strchr. Nothing prevents \r characters from appearing in the middle of a line, and you probably don't want to truncate the line at the first one.
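Wrapped into a function so it can be checked in isolation (crlf_to_lf is a hypothetical name), the snippet leaves an earlier '\r' intact and rewrites only the terminating \r\n:

```c
#include <string.h>

/* Hypothetical wrapper around the strrchr() snippet in this answer.
   Turns a terminating "\r\n" into "\n" in place. strrchr() finds the
   LAST '\r', so a '\r' earlier in the line does not get in the way.
   Returns 1 if the buffer was changed, 0 otherwise. */
int crlf_to_lf(char *buffer)
{
    char *p = strrchr(buffer, '\r');
    /* && short-circuits, so p[2] is only read when p[1] is '\n'. */
    if (p && p[1] == '\n' && p[2] == '\0') {
        p[0] = '\n';
        p[1] = '\0';
        return 1;
    }
    return 0;
}
```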

Answer to the EDIT section of the question: yes. On Windows, "r" opens the file in text mode and "rb" in binary mode, but on Unix the "b" in "rb" is a no-op.

answered 2012-07-03T06:34:45.470