c++ - Improving space complexity when reading from a file

Question

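I have an arbitrarily long line of comma-separated integers (or floating-point values) in a file:

1,2,3,4,5,6,7,8,2,3,4,5,6,7,8,9,3,...  (can exceed 100 MB)

Now I have to read these values and store them in an array.

My current implementation looks like this: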

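 #include <cstdlib>
 #include <cstring>
 #include <fstream>
 #include <string>

 // `file` is an already-open std::ifstream (e.g. a class member)
 float* read_line(int dimension)
   {
     float *values = new float[dimension*dimension]; // a line will have dimension^2 values
     std::string line;
     char *token = NULL, *buffer = NULL, *tmp = NULL;
     int count = 0;

     getline(file, line);                    // copy 1: the whole line as a std::string
     buffer = new char[line.length() + 1];
     strcpy(buffer, line.c_str());           // copy 2: a writable char buffer for strtok
     for( token = strtok(buffer, ","); token != NULL && count < dimension*dimension; token = strtok(NULL, ","), count++ )
       {
         values[count] = strtod(token, &tmp);
       }
     delete[] buffer;                        // memory from new[] must be released with delete[]
     return values;
   }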

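I don't like this implementation because:

- Using ifstream, the entire file is loaded into memory and then cloned into a float[]
- There is unnecessary duplication (converting from std::string to const char*)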
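A rough estimate of the peak memory of the implementation above makes the complaint concrete. Assuming sizeof(float) == 4, a line of S bytes and d = dimension, the std::string, the char buffer and the float array are all alive at the same time (std::string may reserve more than S + 1 bytes, so this is a lower bound):

$$
M_{\text{peak}} \;\geq\; \underbrace{(S+1)}_{\text{std::string}} \;+\; \underbrace{(S+1)}_{\text{char buffer}} \;+\; \underbrace{4d^2}_{\text{float[]}} \ \text{bytes}
$$

For a 100 MB line the two text copies alone account for roughly 200 MB, whereas a parser that reads values straight from the stream needs only the 4d² bytes for the result plus the stream's small, fixed-size buffer.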

What are some ways to reduce memory usage?

Thanks!

score 4 · Accepted Answer

Something like this?

float val;
while (file >> val)
{
  values[count++] = val;
  char comma;
  file >> comma; // skip comma
}

score 1 · Accepted Answer

Use boost::tokenizer together with std::istreambuf_iterator:

std::vector<float> test; // Optionally call reserve() to avoid frequent memory reallocation
std::istreambuf_iterator<char> first(in), last;  // `in` is the open input stream
boost::tokenizer<boost::char_separator<char>, std::istreambuf_iterator<char> >
    tokens(first, last, boost::char_separator<char>(","));
// Replace this lambda with your favourite conversion function.
std::transform(tokens.begin(), tokens.end(), std::back_inserter(test),
               [](const std::string& s) { return static_cast<float>(std::atof(s.c_str())); });

Edit: test plays the role of your values, except that it is a std::vector instead of an array, which is usually the better choice.

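IMHO this code has a few advantages. The iterators have built-in EOF handling, and you can extend the separators very easily. It is also quite robust against errors (especially if you replace atof with a conversion function that throws exceptions).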
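atof itself reports no errors (it returns 0.0 for unparsable input). One possible throwing replacement, sketched under the assumption that rejecting garbage tokens is what you want, wraps std::stof, which throws std::invalid_argument when no conversion can be performed and std::out_of_range when the value does not fit. The name `to_float_strict` and the extra trailing-character check are additions, not part of the original answer.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts one token to float, throwing instead of silently returning 0.0f.
float to_float_strict(const std::string& s)
{
    std::size_t pos = 0;
    float value = std::stof(s, &pos);  // throws invalid_argument / out_of_range itself
    // allow trailing whitespace (e.g. the final newline), reject anything else like "1.5x"
    if (s.find_first_not_of(" \t\r\n", pos) != std::string::npos)
        throw std::invalid_argument("trailing characters in token: " + s);
    return value;
}
```

It can be passed to the std::transform call in place of the lambda: `std::transform(tokens.begin(), tokens.end(), std::back_inserter(test), to_float_strict);`.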

score 0 · Accepted Answer

I tried something based on osgx's suggestion of using scanf:

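#include <cstdio>

// Redirect stdin to the input file (freopen returns NULL on failure)
freopen("testcases.in", "r", stdin);
while( count < total_values)
       {
         if (scanf("%f,", &values[count]) != 1)  // stop at EOF or on malformed input
           break;
         count++;
       }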
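The same scanf idea can be sketched without redirecting stdin, using std::fscanf on a FILE* that the caller opens. The function name `read_with_fscanf` and the `max_values` parameter are hypothetical; the "%f," format comes from the answer above. fscanf returns the number of successful conversions, so a missing comma after the last value is harmless: the float has already been assigned, and the next call returns EOF.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads up to max_values comma-separated floats from an already-open FILE*.
// Only the parsed floats are stored; stdio buffers the text internally.
std::vector<float> read_with_fscanf(std::FILE* fp, std::size_t max_values)
{
    std::vector<float> values;
    values.reserve(max_values);
    float v;
    // "%f," converts one float, then consumes the following comma if present
    while (values.size() < max_values && std::fscanf(fp, "%f,", &v) == 1)
        values.push_back(v);
    return values;
}
```

Typical use would be `std::FILE* fp = std::fopen("testcases.in", "r");`, a null check, the call, then `std::fclose(fp);`.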
