
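I have code that writes a vector with more than 10 million elements to a text file. I timed it with clock(), and the writefile function is the slowest part of my program. Is there a better way to write the file than the method below?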

void writefile(vector<fields>& fieldsvec, ofstream& sigfile, ofstream& noisefile)
/* Writes clean and noise data to respective files
 *
 * fieldsvec: vector of parsed data; each element's nflag marks it as clean or noise
 * sigfile: file to store clean data
 * noisefile: file to store noise data
 */
{
    for(unsigned int i=0; i<fieldsvec.size(); i++)
    {
        if(fieldsvec[i].nflag==false)
        {
            sigfile << fieldsvec[i].timestamp << ";" << fieldsvec[i].price << ";" << fieldsvec[i].units;
            sigfile << endl;
        }
        else
        {
            noisefile << fieldsvec[i].timestamp << ";" << fieldsvec[i].price << ";" << fieldsvec[i].units;
            noisefile << endl;
        }
    }
}

Where my struct is:

struct fields
// Stores a parsed line of a file
{
public:
    string timestamp;
    float price;
    float units;
    bool nflag; //flag if noise (TRUE=NOISE)
};

3 Answers


I suggest getting rid of endl. It flushes the buffer on every call, which greatly increases the number of system calls.

Using '\n' instead of endl should be a good improvement.

By the way, the code can be simplified:

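// An array of references is not legal C++, so store pointers instead.
// Index 0 (nflag == false) is the clean file, index 1 (nflag == true) the noise file.
ofstream* files[2] = { &sigfile, &noisefile };
for(unsigned int i=0; i<fieldsvec.size(); i++)
  *files[fieldsvec[i].nflag] << fieldsvec[i].timestamp << ';' << fieldsvec[i].price << ';' << fieldsvec[i].units << '\n';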
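Putting the suggestions above together, a self-contained sketch of the rewritten function might look like this. It is an illustration under a few assumptions not in the original: it takes `std::ostream&` instead of `std::ofstream&` (so it can be exercised with string streams), passes the vector by const reference, and uses a C++11 range-based for loop.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Same layout as the question's struct.
struct fields
{
    std::string timestamp;
    float price;
    float units;
    bool nflag; // true = noise
};

// Writes each record as "timestamp;price;units" to the clean or noise stream.
// Assumption: std::ostream& is used so any output stream (file or string) works.
void writefile(const std::vector<fields>& fieldsvec,
               std::ostream& sigfile, std::ostream& noisefile)
{
    // Index 0 = clean data (nflag == false), index 1 = noise (nflag == true).
    std::ostream* files[2] = { &sigfile, &noisefile };
    for (const fields& f : fieldsvec)
    {
        // '\n' instead of std::endl: the stream is not flushed after every line.
        *files[f.nflag] << f.timestamp << ';' << f.price << ';' << f.units << '\n';
    }
}
```

Called with two `std::ofstream` objects it behaves like the original, minus the per-line flush.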
answered 2013-02-02T23:18:12.017

You can write the file in binary format instead of text format to speed up writing, as suggested in the first answer to this SO question:

file.open(filename.c_str(), ios_base::binary);
// ...
// The following writes a vector into a file in binary format
vector<double> v;  // assumed to be filled before writing
// v.data() is valid even for an empty vector, unlike &v[0]
const char* pointer = reinterpret_cast<const char*>(v.data());
size_t bytes = v.size() * sizeof(v[0]);
file.write(pointer, bytes);
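A self-contained sketch of the binary approach, including reading the data back, could look like the following. The function names `write_binary` and `read_binary` are hypothetical. The format is raw memory, so it is only portable between machines with the same `sizeof(double)` and byte order, and it cannot be applied directly to the question's `fields` struct, because its `std::string timestamp` holds a pointer rather than the characters themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes the raw bytes of v (no size header) to filename.
bool write_binary(const std::string& filename, const std::vector<double>& v)
{
    std::ofstream file(filename.c_str(), std::ios_base::binary);
    if (!file) return false;
    const char* pointer = reinterpret_cast<const char*>(v.data());
    std::size_t bytes = v.size() * sizeof(double);
    file.write(pointer, static_cast<std::streamsize>(bytes));
    file.close();          // close explicitly so a failed flush is reported
    return !file.fail();
}

// Hypothetical helper: reads back n doubles written by write_binary.
std::vector<double> read_binary(const std::string& filename, std::size_t n)
{
    std::vector<double> v(n);
    std::ifstream file(filename.c_str(), std::ios_base::binary);
    file.read(reinterpret_cast<char*>(v.data()),
              static_cast<std::streamsize>(n * sizeof(double)));
    if (!file) v.clear();  // open failure or short read
    return v;
}
```

The element count is passed in separately here; a real file format would usually store it in a header.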

From the same link, the OP reports:

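  • replacing std::endl with \n sped up his code by 1%
  • concatenating everything to be written into one stream and writing it all to the file at the end sped up his code by 7%
  • changing from text format to binary format sped up his code by 90%.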
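The second item, building the whole output in memory and writing it once, can be sketched as below. The `row` struct and `write_all_at_once` function are hypothetical stand-ins for the question's data. Note that the entire text is held in memory (twice, briefly, when `str()` copies it), which for 10 million lines can amount to hundreds of megabytes.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type, similar to the question's struct without nflag.
struct row
{
    std::string timestamp;
    float price;
    float units;
};

// Formats all lines into one in-memory stream, then hands the whole
// string to the file stream in a single write() call.
bool write_all_at_once(const std::string& filename, const std::vector<row>& rows)
{
    std::ostringstream buffer;
    for (const row& r : rows)
        buffer << r.timestamp << ';' << r.price << ';' << r.units << '\n';

    const std::string text = buffer.str();  // copy of the complete output
    std::ofstream file(filename.c_str());
    file.write(text.data(), static_cast<std::streamsize>(text.size()));
    file.close();
    return !file.fail();
}
```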
answered 2013-02-02T23:14:27.830

A major speed killer is that you are converting the numbers to text.

As for the raw file output, the default buffering on an ofstream should be quite efficient.

You should pass the vector as a const reference. It may not be a big deal, but it does allow some compiler optimizations.

If you think the stream is slowing things down because of the repeated writes, you could try building each line as a string with sprintf or snprintf and writing it once. Only do this if your timestamp has a known maximum size. Of course, this involves extra copying, because the string must then be copied into the output buffer. Experiment.
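A minimal sketch of the snprintf idea, assuming the timestamp fits comfortably in a fixed-size buffer (128 bytes here, an arbitrary choice). The `write_line` name is hypothetical, and `%g` is used because it matches the default `ostream` formatting for simple values like the ones in the usage check; other values may format differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: formats one "timestamp;price;units" line into a stack
// buffer with snprintf and passes it to the stream in a single write() call.
bool write_line(std::ostream& out, const std::string& timestamp, float price, float units)
{
    char line[128];  // assumption: timestamp is well under ~100 characters
    int n = std::snprintf(line, sizeof line, "%s;%g;%g\n",
                          timestamp.c_str(), price, units);
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof line)
        return false;  // encoding error, or the line would have been truncated
    out.write(line, n);
    return !out.fail();
}
```

Whether this beats chained `<<` calls depends on the standard library, so it is worth timing both.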

Otherwise, it's going to start getting dirty. When you need to squeeze performance out of file I/O, you need to start tailoring the buffers to your application. That tends to come down to using no buffering or cache, sector-aligning your own buffer, and writing large chunks.
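The "write large chunks" part can be sketched portably with an application-owned buffer that is only handed to the file when it fills up. The `ChunkedWriter` class and the chunk size are hypothetical and would need tuning. This sketch does not disable the ofstream's own buffering or align anything to sectors; that requires platform-specific mechanisms (for example `O_DIRECT` on Linux) and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical writer: collects output in its own buffer and writes it
// to the file in chunks of roughly chunk_size bytes.
class ChunkedWriter
{
public:
    ChunkedWriter(const std::string& filename, std::size_t chunk_size)
        : file_(filename.c_str(), std::ios_base::binary), chunk_size_(chunk_size)
    {
        buffer_.reserve(chunk_size_);
    }

    ~ChunkedWriter() { flush(); }

    void append(const std::string& s)
    {
        buffer_.insert(buffer_.end(), s.begin(), s.end());
        if (buffer_.size() >= chunk_size_)
            flush();
    }

    // Writes whatever is buffered; vector::clear() keeps the reserved capacity.
    void flush()
    {
        if (!buffer_.empty())
        {
            file_.write(buffer_.data(), static_cast<std::streamsize>(buffer_.size()));
            buffer_.clear();
        }
        file_.flush();
    }

    bool ok() const { return !file_.fail(); }

private:
    std::ofstream file_;
    std::size_t chunk_size_;
    std::vector<char> buffer_;
};
```

The lines passed to `append` could come from the snprintf or `ostringstream` formatting shown earlier in the thread; the point here is only that the file sees a few large writes instead of many small ones.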

answered 2013-02-02T23:23:38.187