
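I am running a numerical simulation on a GPU, and most of the run time is spent periodically writing CSV data to a .dat file. Is there a faster way to write the data to a .dat file than fprintf()? I don't think fwrite() will work, because I need CSV data.

For reference, here is the code I use to write the data to the file.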

for(int k = 0;k<gridsize;k++){
    for(int j = 0;j<gridsize;j++){
        fprintf(tempE, "%f,", h_Estate[j*gridsize + k] );
    }
}
fprintf(tempE,"\n");

2 Answers


Depending on how the C library and the underlying operating system handle fprintf(), you may gain some efficiency by using fwrite().

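As you point out, you can't fwrite() the data directly, but you can format the CSV text with sprintf() and append it to one large buffer. When the buffer fills up, you fwrite() the whole buffer in a single call.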
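The buffering idea above can be sketched roughly as follows. This is a minimal sketch, not a tuned implementation: the helper name `write_csv_buffered`, the 64 KiB buffer size, and the 64-byte per-field bound are all assumptions made for illustration. It uses snprintf() rather than sprintf() so that a formatting surprise cannot overrun the buffer.

```c
#include <stdio.h>
#include <string.h>

#define CSV_BUF_SIZE  (1 << 16)  /* 64 KiB staging buffer (arbitrary choice) */
#define CSV_FIELD_MAX 64         /* assumed upper bound on one "%f," field */

/* Hypothetical helper: writes n floats as one comma-separated line.
   Returns 0 on success, -1 on a formatting or write error. */
int write_csv_buffered(FILE *fp, const float *data, size_t n)
{
    static char buf[CSV_BUF_SIZE];
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        /* Flush before the next field could run past the end of buf. */
        if (used + CSV_FIELD_MAX > sizeof buf) {
            if (fwrite(buf, 1, used, fp) != used)
                return -1;
            used = 0;
        }
        int len = snprintf(buf + used, sizeof buf - used, "%f,", data[i]);
        if (len < 0 || len >= CSV_FIELD_MAX)
            return -1;
        used += (size_t)len;
    }

    /* Terminate the line and write out whatever is still buffered. */
    buf[used++] = '\n';
    return fwrite(buf, 1, used, fp) == used ? 0 : -1;
}
```

With the names from the question, a call would look like `write_csv_buffered(tempE, h_Estate, (size_t)gridsize * gridsize)`, which writes the values in memory order rather than the transposed order of the original loop.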

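The file I/O implementation in the C library and the operating system usually performs this kind of buffering already, so fwrite() may turn out to be no faster than fprintf().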
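Because stdio already buffers, one low-effort variation is to keep fprintf() and simply give the stream a larger buffer with the standard setvbuf() call. The sketch below assumes a hypothetical wrapper named `open_with_big_buffer`; whether a bigger buffer helps at all depends on the platform, so it is worth measuring.

```c
#include <stdio.h>

/* Hypothetical helper: opens path for writing with a fully buffered
   stdio stream of buf_size bytes. Returns NULL on failure. */
FILE *open_with_big_buffer(const char *path, size_t buf_size)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return NULL;

    /* setvbuf must be called before any other operation on the stream.
       Passing NULL asks the library to allocate the buffer itself. */
    if (setvbuf(fp, NULL, _IOFBF, buf_size) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

After `FILE *tempE = open_with_big_buffer("out.dat", 1 << 20);` the existing fprintf() loop can stay unchanged (the file name and the 1 MiB size here are placeholders).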

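As Eric points out in his answer, the most efficient way to save this data is directly in binary form. If you can preprocess it to reduce its volume, even better.

For example, does your data need full floating-point precision? Could you convert it to 16-bit fixed-point integers and store two data points in each 32-bit unsigned integer, while keeping enough precision for the results you report? A 16-bit signed integer spans -32768 to 32767, which is a little under five decimal digits of precision.

If you are doing further processing on this data, you definitely don't want to use Excel or Matlab, because the processing time will get out of hand. If you develop the processing algorithm in C or C++, a binary data format won't be a problem.

If you are plotting this data, the graphics display will essentially downsample it anyway, so you might as well reduce it to something like 10k points and output statistics that are meaningful for the plot.

Anyway, those are my thoughts. They are deliberately broader than your question: you have probably solved your problem by now, but others with a similar problem may read this.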
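The 16-bit packing idea could look something like the sketch below. The function names and the symmetric scaling by a known bound `max_abs` are assumptions for illustration; the right scaling depends on the actual value range of the simulation. The casts between `uint16_t` and `int16_t` rely on two's-complement conversion, which is what mainstream compilers do.

```c
#include <stdint.h>

/* Map x in [-max_abs, max_abs] to a signed 16-bit value, clamping
   anything outside that range and rounding to nearest. */
int16_t quantize16(float x, float max_abs)
{
    float scaled = x / max_abs * 32767.0f;
    if (scaled > 32767.0f)  scaled = 32767.0f;
    if (scaled < -32767.0f) scaled = -32767.0f;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Inverse mapping; the result differs from the original by at most
   about max_abs / 65534 for in-range inputs. */
float dequantize16(int16_t q, float max_abs)
{
    return (float)q / 32767.0f * max_abs;
}

/* Pack two 16-bit samples into one 32-bit word: lo in bits 0..15,
   hi in bits 16..31. */
uint32_t pack2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

int16_t unpack_lo(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }
int16_t unpack_hi(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
```

An array of packed words can then be written with a single fwrite(), at half the size of the raw float data.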
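One way to do that reduction is to split the samples into equal buckets and keep the minimum, maximum, and mean of each. The struct and function names below are hypothetical, and which statistics are "meaningful" depends on the plot, so treat this as a sketch of the idea only.

```c
#include <stddef.h>

typedef struct {
    float min;
    float max;
    float mean;
} BucketStats;

/* Summarise n samples into at most n_buckets buckets of nearly equal
   size. Returns the number of buckets actually written to out. */
size_t downsample(const float *data, size_t n,
                  BucketStats *out, size_t n_buckets)
{
    if (n == 0 || n_buckets == 0)
        return 0;
    if (n_buckets > n)
        n_buckets = n;           /* guarantees no bucket is empty */

    for (size_t b = 0; b < n_buckets; b++) {
        size_t begin = b * n / n_buckets;
        size_t end   = (b + 1) * n / n_buckets;
        float lo = data[begin], hi = data[begin];
        double sum = 0.0;        /* accumulate in double to limit rounding */

        for (size_t i = begin; i < end; i++) {
            if (data[i] < lo) lo = data[i];
            if (data[i] > hi) hi = data[i];
            sum += data[i];
        }
        out[b].min  = lo;
        out[b].max  = hi;
        out[b].mean = (float)(sum / (double)(end - begin));
    }
    return n_buckets;
}
```

Writing 10,000 such summaries as text is cheap even with plain fprintf(), since the output no longer grows with the grid size.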

Edit: here is an interesting test I ran. The complete, compilable source is below.

    // what's faster, fwrite or fprintf?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // memset, strlen

#include <windows.h>

#define HUGE_NUMBER  1000

LARGE_INTEGER   ticksPerSecond;
LARGE_INTEGER   time1;
LARGE_INTEGER   time2;

float floatDiffTime;
const int runs = 1000000;

int main(int argc, char* argv[])
{
    // Get the speed of the CPU
    QueryPerformanceFrequency( &ticksPerSecond );
    printf( "Your computer does %lld ticks per second\n", ticksPerSecond.QuadPart );
    // %lld means type "long long" int, which is the
    // 64 bit int which is what we want here.

    // define some random valued variables to use
    // in the print statements
    int a    = 5;
    double b = 9.2919e92;
    char c   = 'x';
    const char * d = "blah blah blah";   // string literals must not be modified

    // test start:  open a file to write 
    FILE *outfile = fopen( "testfile.txt", "w" );

    char buf[HUGE_NUMBER];
    int i;
    int index = 0;

    //Test line-by-line fprintf
    // START timing
    QueryPerformanceCounter( &time1 );
    memset(buf,'\0', HUGE_NUMBER);
    for(i=0; i<runs; i++)
    {
        fprintf(outfile, "blah %i %f %c %s\n", a, b, c, d );
    }
    fflush ( outfile );
    fclose( outfile );

    // STOP timing
    QueryPerformanceCounter( &time2 );

    // get the difference between time1 and time2,
    // and that is how long the for loop took to run.
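    // subtract as 64-bit integers first; converting each counter to float
    // before subtracting would throw away most of the precision
    floatDiffTime = (float)(time2.QuadPart - time1.QuadPart)/ticksPerSecond.QuadPart;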
    printf( "line-by-line fprintf took %f seconds\n", floatDiffTime );

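    //Test buffered fprintf
    // the file was closed at the end of the previous test, so reopen it
    outfile = fopen( "testfile.txt", "w" );
    index = 0;
    // START timing
    QueryPerformanceCounter( &time1 );
    memset(buf,'\0', HUGE_NUMBER);
    for(i=0; i<runs; i++)
    {
        sprintf(&buf[index], "blah %i %f %c %s\n", a, b, c, d );
        index += strlen(&buf[index]);
        // flush while there is still room for one more line (about 125
        // characters plus the terminator), so sprintf never overruns buf
        if(index >= HUGE_NUMBER - 256) {
            fprintf(outfile, "%s", buf );
            index = 0;
            memset(buf,'\0', HUGE_NUMBER);
        }
    }
    // write whatever is still left in the buffer
    if(index > 0) {
        fprintf(outfile, "%s", buf );
    }
    fflush ( outfile );
    fclose( outfile );

    // STOP timing
    QueryPerformanceCounter( &time2 );

    // get the difference between time1 and time2,
    // and that is how long the for loop took to run.
    floatDiffTime = (float)(time2.QuadPart - time1.QuadPart)/ticksPerSecond.QuadPart;
    printf( "fprintf took %f seconds\n", floatDiffTime );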

    //Test fwrite
    outfile = fopen( "testfile.txt", "w" );
    index = 0;
    /////////////////////
    // START timing
    QueryPerformanceCounter( &time1 );  
    memset(buf,'\0', HUGE_NUMBER);
    for(i=0; i<runs; i++)
    {
        sprintf(&buf[index], "blah %i %f %c %s\n", a, b, c, d );
        index += strlen(&buf[index]);
        // flush while there is still room for one more line (about 125
        // characters plus the terminator), so sprintf never overruns buf
        if(index >= HUGE_NUMBER - 256) {
            fwrite( buf, 1, index, outfile );
            index = 0;
            memset(buf,'\0', HUGE_NUMBER);
        }
    }
    // write whatever is still left in the buffer
    if(index > 0) {
        fwrite( buf, 1, index, outfile );
    }

    fflush(outfile);
    fclose( outfile );
    ////////////////////
    // STOP timing
    QueryPerformanceCounter( &time2 );

    // get the difference between time1 and time2,
    // and that is how long the for loop took to run.
    floatDiffTime = (float)(time2.QuadPart - time1.QuadPart)/ticksPerSecond.QuadPart;
    printf( "fwrite took %f seconds\n", floatDiffTime );

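    //Test buffered WriteFile
    // WriteFile needs a Win32 HANDLE, not a C FILE*, so open with CreateFile
    HANDLE hBuffered = CreateFileA( "testfile.txt", GENERIC_WRITE, 0, NULL,
                                    CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL );
    if(hBuffered == INVALID_HANDLE_VALUE) {
        printf( "CreateFile failed\n" );
        return 1;
    }
    index = 0;
    DWORD bWritten = 0;
    /////////////////////
    // START timing
    QueryPerformanceCounter( &time1 );  
    memset(buf,'\0', HUGE_NUMBER);
    for(i=0; i<runs; i++)
    {
        sprintf(&buf[index], "blah %i %f %c %s\n", a, b, c, d );
        index += strlen(&buf[index]);
        // flush while there is still room for one more line (about 125
        // characters plus the terminator), so sprintf never overruns buf
        if(index >= HUGE_NUMBER - 256) {
            WriteFile( hBuffered, buf, (DWORD)index, &bWritten, NULL );
            index = 0;
            memset(buf,'\0', HUGE_NUMBER);
        }
    }
    // write whatever is still left in the buffer
    if(index > 0) {
        WriteFile( hBuffered, buf, (DWORD)index, &bWritten, NULL );
    }

    CloseHandle( hBuffered );
    ////////////////////
    // STOP timing
    QueryPerformanceCounter( &time2 );

    // get the difference between time1 and time2,
    // and that is how long the for loop took to run.
    floatDiffTime = (float)(time2.QuadPart - time1.QuadPart)/ticksPerSecond.QuadPart;
    printf( "WriteFile took %f seconds\n", floatDiffTime );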


    //Test line-by-line WriteFile
    // WriteFile needs a Win32 HANDLE, not a C FILE*, so open with CreateFile
    HANDLE hLineByLine = CreateFileA( "testfile.txt", GENERIC_WRITE, 0, NULL,
                                      CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL );
    if(hLineByLine == INVALID_HANDLE_VALUE) {
        printf( "CreateFile failed\n" );
        return 1;
    }
    index = 0;
    bWritten = 0;
    /////////////////////
    // START timing
    QueryPerformanceCounter( &time1 );  
    memset(buf,'\0', HUGE_NUMBER);
    for(i=0; i<runs; i++)
    {
        // format one line at the start of buf and write it immediately
        sprintf(buf, "blah %i %f %c %s\n", a, b, c, d );
        WriteFile( hLineByLine, buf, (DWORD)strlen(buf), &bWritten, NULL );
    }

    CloseHandle( hLineByLine );
    ////////////////////
    // STOP timing
    QueryPerformanceCounter( &time2 );

    // get the difference between time1 and time2,
    // and that is how long the for loop took to run.
    floatDiffTime = (float)(time2.QuadPart - time1.QuadPart)/ticksPerSecond.QuadPart;
    printf( "WriteFile line-by-line took %f seconds\n", floatDiffTime );


   return 0;    
}

And the results?

Your computer does 2337929 ticks per second
line-by-line fprintf took 2.970491 seconds
fprintf took 2.345687 seconds
fwrite took 3.456101 seconds
WriteFile took 2.131118 seconds
WriteFile line-by-line took 2.495092 seconds

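It looks like buffering a large amount of data as a string and then handing it to fprintf() (portable) or to the Windows WriteFile() call (if you are on Windows) is the most efficient way to handle this. The differences are modest and come from a single run on one machine, so treat them as indicative rather than conclusive.

Compiler command: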

gcc write_speed_test.c -o wspt

Compiler version:

$ gcc -v
Using built-in specs.
Target: i686-w64-mingw32
Configured with: ../gcc44-svn/configure --target=i686-w64-mingw32 --host=i686-w64-mingw32 --disable-multilib --disable-nls --disable-win32-registry --prefix=/mingw32 --with-gmp=/mingw32 --with-mpfr=/mingw32 --enable-languages=c,c++
Thread model: win32
gcc version 4.4.3 (GCC)
Answered 2013-12-31T22:11:43.470

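Writing this much data to disk as text is not a wise choice. Text is a format for humans, but 250,000 numbers mean nothing to the human eye.

I assume you need the CSV format for further statistics in Excel or Matlab. You would do better to compute the statistics in your C code and write only the results to disk, which should be a small amount of data. If you use Matlab, binary data is also acceptable. In that case, use a single fwrite() instead of thousands of fprintf() calls.

Another option is to use a separate program to convert the binary data to CSV text.

Answered 2013-08-02T05:18:03.440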
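As a rough illustration of the single-fwrite() approach, the sketch below dumps a float array as raw bytes. The helper name `write_grid_binary` is hypothetical, and the file has no header: it is just the floats in the machine's native byte order, so the reader has to know the element type and count.

```c
#include <stdio.h>

/* Hypothetical helper: writes count floats to path as raw binary with
   one fwrite() call. Returns 0 on success, -1 on failure. */
int write_grid_binary(const char *path, const float *grid, size_t count)
{
    FILE *fp = fopen(path, "wb");   /* "b" avoids newline translation */
    if (fp == NULL)
        return -1;

    size_t written = fwrite(grid, sizeof grid[0], count, fp);
    int close_failed = fclose(fp);  /* fclose also flushes; check it too */

    return (written == count && close_failed == 0) ? 0 : -1;
}
```

For the array in the question this would be called as `write_grid_binary("state.dat", h_Estate, (size_t)gridsize * gridsize)`, where the file name is a placeholder.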
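Such a converter can be quite small. The sketch below assumes the binary file holds raw native-endian floats with no header, and that the caller knows how many columns each CSV row should have; the function name `bin_to_csv` and the 4096-element chunk size are made up for this example.

```c
#include <stdio.h>

/* Hypothetical converter: reads raw floats from bin_path and writes them
   to csv_path with cols values per line. Returns 0 on success, -1 on error. */
int bin_to_csv(const char *bin_path, const char *csv_path, size_t cols)
{
    if (cols == 0)
        return -1;

    FILE *in = fopen(bin_path, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(csv_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    float chunk[4096];
    size_t col = 0, got;
    int ok = 1;

    /* Read the input in chunks and emit one text field per value. */
    while (ok && (got = fread(chunk, sizeof chunk[0], 4096, in)) > 0) {
        for (size_t i = 0; i < got; i++) {
            col++;
            /* End the row after cols values, otherwise separate with a comma. */
            const char *sep = (col == cols) ? "\n" : ",";
            if (fprintf(out, "%f%s", chunk[i], sep) < 0) {
                ok = 0;
                break;
            }
            if (col == cols)
                col = 0;
        }
    }
    if (ferror(in))
        ok = 0;

    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Running the conversion offline keeps the slow text formatting out of the simulation loop, and it only has to be done for the snapshots someone actually wants to inspect.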