Consider this variation of the program in the question:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *file = "D:\\data.txt";
    FILE *fp;
    char *formats[] =
    {
        "%d%d%d%*c",
        "%d%d%d",
        "%*c%d%d%d",
    };

    if (argc > 1)
        file = argv[1];

    for (int i = 0; i < 3; i++)
    {
        if ((fp = fopen(file, "r")) == 0)
        {
            fprintf(stderr, "Failed to open file %s\n", file);
            break;
        }
        printf("Format: %s\n", formats[i]);
        int n1,n2,n3;
        while (fscanf(fp, formats[i], &n1, &n2, &n3) == 3)
            printf("%d, %d, %d\n", n1, n2, n3);
        fclose(fp);
    }
    return 0;
}
The repeated opens are not efficient, but that isn't a concern here; clarity and showing the behaviour are much more important.
The program (a) uses a file name specified on the command line, so I don't have to futz with names such as D:\data.txt, which are very inconvenient to create on Unix systems, and (b) shows the three formats in use.
Given the data file from the question:
243 343 434
393 322 439
984 143 943
438 243 938
The output of the program is:
Format: %d%d%d%*c
243, 343, 434
393, 322, 439
984, 143, 943
438, 243, 938
Format: %d%d%d
243, 343, 434
393, 322, 439
984, 143, 943
438, 243, 938
Format: %*c%d%d%d
43, 343, 434
393, 322, 439
984, 143, 943
438, 243, 938
Note that the first digit of the first number is consumed by the %*c when that is the first part of the format. After the first three numbers are read, the %*c on the next call reads the newline after the third number on the line, then the %d skips further white space (except there isn't any) and reads the number.
Otherwise, the behaviour is as expounded in the commentary below, largely lifted from another related question.
Some of the code under discussion in the related question "Use fscanf() to read from given line" was:
fscanf(f, "%*d %*d %*d%*c");
fscanf(f, "%d%d%d", &num1, &num2, &num3);
I noted that the code should test the return value from fscanf(). However, with the three %*d conversion specifications, you might get a return value of EOF if you encountered EOF before reaching the specified line. Unfortunately, you've no way of knowing that the first line contained a letter instead of a digit until you execute the second fscanf(). You should test the second fscanf() too; you might get EOF, or 0 or 1 or 2 (all of which indicate problems), or you might get 3, indicating success with 3 conversions. Note that adding \n to the format means blank lines will be skipped, but that was going to happen anyway; %d skips white space to the first digit.
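To make the possible return values concrete, here is a minimal sketch that uses sscanf() on strings as a stand-in for fscanf() on a file. The helper name count_conversions() is made up for illustration; the return-value rules it shows are the same for both functions.

```c
#include <stdio.h>

/* Hypothetical helper: report what the scanf family returns for "%d%d%d".
   3 means success; 0, 1 or 2 means a matching failure part-way through
   (or input that ran out after at least one conversion); EOF means the
   input ended before the first conversion. */
int count_conversions(const char *text)
{
    int n1, n2, n3;
    return sscanf(text, "%d%d%d", &n1, &n2, &n3);
}
```

For example, count_conversions("243 343 434") yields 3, count_conversions("12 x 34") yields 1, count_conversions("x") yields 0, and count_conversions("") yields EOF.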
Is there any other way we can read but ignore entire lines like I clumsily did with fscanf(f,"%*d%*d%*d")? Is using %*[^\n] the nearest thing one can do for this?
The best way to skip whole lines is to use fgets(), as in the last version of the code in my answer. Obviously, there's an outside chance it will miscount lines if any of those lines is longer than 4095 bytes. OTOH, that's fairly improbable.
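As a rough illustration of skipping lines with fgets(), the sketch below is not the code from the related answer; skip_lines() is a hypothetical helper. It counts a line only when the chunk just read contains a newline, so a line longer than the buffer is still counted once.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: skip up to `count` lines from `fp`.
   Returns the number of newline-terminated lines actually skipped,
   which is less than `count` if EOF arrives first. */
int skip_lines(FILE *fp, int count)
{
    char buffer[256];
    int skipped = 0;

    while (skipped < count && fgets(buffer, sizeof(buffer), fp) != NULL)
    {
        /* fgets() stops at a newline, so a chunk that contains one ends a
           line; a chunk without one is the middle of a long line (or an
           unterminated last line, which is not counted here). */
        if (strchr(buffer, '\n') != NULL)
            skipped++;
    }
    return skipped;
}
```

After skip_lines(fp, 2) returns 2, the next read from fp starts at the third line of the file.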
I have a confusion now and I don't want to put it in a question. So can you tell me this: fscanf() ignores whitespace automatically, so after the first line, when three integers are read and ignored according to my %*d%*d%*d specifier, I expect fscanf() to ignore the newline too when it starts reading in the next run of the loop. But why doesn't my additional %*c or \n cause problems, and why does the program run fine when I use %*d%*d%*d%*c or %*d%*d%*d\n in my code?
You can't tell where anything went wrong with those formats; you can detect EOF, but otherwise, fscanf() will return 0. However, since the %*d skips leading white space (including newlines), it doesn't much matter whether you read the newline after the third number with the %*c or not. When you have \n there, that's white space in the format, so the read skips the newline and any trailing or leading white space, stopping when it reaches a non-white-space character. Of course, you could also have newlines in the middle of the three numbers, or you could have more than three numbers on a line.
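The point about line layout can be checked with a small sketch. It uses sscanf() on a string rather than fscanf() on a file, and the helper count_triples() is an invented name; %n records how many characters were consumed so the next call can resume from there.

```c
#include <stdio.h>

/* Hypothetical helper: count how many groups of three integers "%d%d%d"
   can read from `text`, regardless of where the newlines fall. */
int count_triples(const char *text)
{
    int n1, n2, n3;
    int used;
    int triples = 0;

    /* %n stores the number of characters consumed so far and does not
       add to the return value, so success is still reported as 3. */
    while (sscanf(text, "%d%d%d%n", &n1, &n2, &n3, &used) == 3)
    {
        text += used;
        triples++;
    }
    return triples;
}
```

Both "1 2 3\n4 5 6\n" and "1 2\n3 4 5 6\n" give two triples: the format has no notion of lines, so a misplaced newline or an extra number on a line goes unnoticed.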
Note that the trailing \n in the format is particularly weird when the user is typing at the terminal. The user hits return, and keeps on hitting return, but the program doesn't continue until the user types a non-blank character. This is why fscanf() is so difficult to use when the data is not reliable. When it's reliable, it's easy, but if anything goes wrong, diagnostics and recovery are painful. That's why it is better to use fgets() and sscanf(); you have control over what is being parsed, you can try again with a different format if you want to, and you can report the whole line, not just what fscanf() has not managed to interpret.
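A minimal sketch of the fgets() plus sscanf() approach might look like the following. The function read_triples() and its `log` parameter are assumptions made for this example, not code from the question; the 4096-byte buffer is an arbitrary choice.

```c
#include <stdio.h>

/* Hypothetical helper: read `fp` one line at a time, print each line that
   holds three integers, and report every other line in full on `log`.
   Returns the number of well-formed lines. */
int read_triples(FILE *fp, FILE *log)
{
    char line[4096];
    int good = 0;
    int lineno = 0;

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        int n1, n2, n3;
        lineno++;
        if (sscanf(line, "%d%d%d", &n1, &n2, &n3) == 3)
        {
            printf("%d, %d, %d\n", n1, n2, n3);
            good++;
        }
        else
        {
            /* The whole line is still available for the diagnostic;
               it normally carries its own trailing newline. */
            fprintf(log, "Line %d is malformed: %s", lineno, line);
        }
    }
    return good;
}
```

Because each line is parsed separately, a bad line (including a blank one, in this sketch) is reported and the loop simply moves on to the next line instead of getting stuck on the offending character.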
Note that %c (and %*c) does not skip over white space; therefore, a %*c at the end of the format reads (and discards) the character after the number that was read. If that is the newline, then that's the character read and ignored. The scan set %[...] and %n are the other conversion specifications that do not skip white space; all other standard conversion specifications skip leading white space.
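The difference is easy to see with a short sketch. The helper char_after_int() is a made-up name, and sscanf() stands in for fscanf(); the caller supplies a format containing one %d followed by one %c.

```c
#include <stdio.h>

/* Hypothetical helper: scan an integer and then one character from `text`
   using `format`, and return that character ('\0' if the scan fails). */
char char_after_int(const char *text, const char *format)
{
    int number;
    char c = '\0';

    if (sscanf(text, format, &number, &c) != 2)
        return '\0';
    return c;
}
```

With the input "42\nx", the format "%d%c" returns the newline, because %c takes the very next character, whereas "%d %c" returns 'x', because the space in the format skips white space before %c reads.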