DISCLAIMER: I don't know whether this solves the problem, but I do know that it can change run times by a factor of up to 20. I also only tested writing data, not reading it.
I was investigating how Fortran interacts with Python and therefore wanted to know how Fortran's binary files are built. While doing so, I noticed that both ifort and gfortran have an option to switch buffered I/O on or off.
ifort: You can specify the keyword BUFFERED=['YES'|'NO'] when opening a file.
gfortran: You can set the environment variable GFORTRAN_UNBUFFERED_ALL to y|Y|1 or n|N|0 for unbuffered and buffered I/O, respectively.
Please note that gfortran buffers I/O by default, while ifort does not.
My sample code at the bottom results in the following times:
        |buffered|unbuffered
--------+--------+----------
ifort   |   1.9s |  18.2s
gfortran|   2.4s |  37.5s
This sample code writes a direct-access binary file with 10 million records of 12 bytes each (a 4-byte integer followed by an 8-byte real).
PROGRAM btest
IMPLICIT NONE
INTEGER :: i
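! IFORT: buffering is selected with BUFFERED="YES" or BUFFERED="NO"
OPEN(11,FILE="test_d.bin",ACCESS="DIRECT",FORM="UNFORMATTED",RECL=3, &
& STATUS="REPLACE",BUFFERED="NO") ! ifort counts RECL in 4-byte words by default (3 words = 12 bytes)
! GFORTRAN: buffering is selected with the environment variable GFORTRAN_UNBUFFERED_ALL
!OPEN(11,FILE="test_d.bin",ACCESS="DIRECT",FORM="UNFORMATTED",RECL=12, &
!& STATUS="REPLACE") ! gfortran counts RECL in bytes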
DO i = 1, 10000000
    WRITE(11,REC=i) i,i*1._8
END DO
CLOSE(11)
END PROGRAM
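Since the starting point was reading such files from Python, here is a minimal sketch of how the 12-byte records could be read back. It is untested against the real test_d.bin and makes several assumptions: little-endian byte order, a 4-byte default INTEGER, an 8-byte REAL, and no record markers or padding in the direct-access file. The helper names read_record and write_sample are made up for this example; write_sample only produces a small stand-in file with the same layout so the sketch runs without the Fortran program.

```python
import os
import struct
import tempfile

# Assumed record layout: little-endian 4-byte integer followed by an
# 8-byte real, with no record markers or padding (12 bytes per record).
RECORD = struct.Struct("<id")


def read_record(path, rec):
    """Return the (integer, real) pair stored in 1-based record number rec."""
    with open(path, "rb") as f:
        f.seek((rec - 1) * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


def write_sample(path, n):
    """Write n records in the assumed layout (stand-in for test_d.bin)."""
    with open(path, "wb") as f:
        for i in range(1, n + 1):
            f.write(RECORD.pack(i, float(i)))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample_d.bin")
    write_sample(path, 5)
    print(os.path.getsize(path))  # 5 records * 12 bytes = 60
    print(read_record(path, 3))   # (3, 3.0)
```

If the assumptions hold for your compiler and platform, read_record("test_d.bin", i) should return the pair (i, float(i)) written by the Fortran loop; if the values look wrong, check the byte order and the RECL unit first.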