c++ - Read a file into memory, loop through the data, then write to a file

Question

I'm trying to ask a question similar to this post: C: read binary file to memory, alter buffer, write buffer to file. The answers there didn't help me, though (I'm new to C++, so I couldn't understand all of them).

How do I loop through the data in memory and go through it line by line, so that I can write it to a file in a different format?

This is what I have:

#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>

using namespace std;

int main()
{
    char* buffer;
    char linearray[250];
    int lineposition;
    long filesize;          // ftell() returns long, not double
    string linedata;
    string a;

    //obtain the file
    FILE *inputfile;
    inputfile = fopen("S050508-v3.txt", "r");
    if (inputfile == NULL)      // stop if the file could not be opened
    {
        perror("fopen");
        return 1;
    }

    //find the filesize
    fseek(inputfile, 0, SEEK_END);
    filesize = ftell(inputfile);
    rewind(inputfile);

    //load the file into memory
    buffer = (char*) malloc (sizeof(char)*filesize);      //allocate mem
    fread (buffer,filesize,1,inputfile);         //read the file to the memory
    fclose(inputfile);

    //Check to see if file is correct in Memory
    cout.write(buffer,filesize);

    free(buffer);
}

I appreciate any help!

Edit (more information about the data):

My data is in separate files, each between 5 and 10 GB in size, with about 300 million lines of data. The lines look like this:

M359
T359 3520 359
M400
A3592 zng 392

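The first element is a character, and the remaining items can be numbers or characters. I'm trying to read it into memory because looping through it line by line in memory is much faster than reading a line, processing it, and then writing it. I'm compiling on 64-bit Linux. Let me know if anything needs further clarification. Thanks again.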
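A minimal sketch of the load-then-iterate approach described above, assuming C++17 (for std::string_view) and that the whole file fits in RAM. The file is read into a single std::string with one read() call, and each line is then handed out as a std::string_view pointing into that buffer, so no per-line copy is made. The names load_file and for_each_line are hypothetical helpers, not a library API.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: read the whole file into one std::string.
// Binary mode keeps the byte count from tellg() equal to what read() returns.
std::string load_file(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    std::string data(static_cast<std::size_t>(size), '\0');
    if (!in.read(data.data(), size))
        throw std::runtime_error("short read on " + path);
    return data;
}

// Hypothetical helper: call f once per line; the '\n' is not included.
// A final line without a trailing newline is still delivered.
template <typename F>
void for_each_line(std::string_view data, F f)
{
    std::size_t start = 0;
    while (start < data.size()) {
        std::size_t end = data.find('\n', start);
        if (end == std::string_view::npos)
            end = data.size();
        f(data.substr(start, end - start));
        start = end + 1;
    }
}
```

This holds the entire file in memory, so a 10 GB input needs at least 10 GB of free RAM.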

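Edit 2: I'm using a switch statement to process each line, where the first character of the line determines how the rest of it is formatted. For example, "M" means milliseconds, and I put the next three digits into a struct. Each line type has a different first character, and for each one I need to do something different.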
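The per-line dispatch described in Edit 2 might look like the following sketch (C++17). Only the "M" case comes from the description (the digits after "M" are stored as milliseconds); the struct MillisRecord, the functions parse_millis and format_line, and the output text for every case are hypothetical placeholders to be replaced with the real formatting rules.

```cpp
#include <string>
#include <string_view>

// Hypothetical record for an 'M' line such as "M359".
struct MillisRecord {
    int millis = 0;
};

// Parse the digits after the leading 'M'; returns false unless they are all digits.
bool parse_millis(std::string_view line, MillisRecord& rec)
{
    if (line.size() < 2)
        return false;
    int value = 0;
    for (char c : line.substr(1)) {
        if (c < '0' || c > '9')
            return false;
        value = value * 10 + (c - '0');
    }
    rec.millis = value;
    return true;
}

// Choose the formatting rule from the first character of the line.
// The output formats here are placeholders.
std::string format_line(std::string_view line)
{
    if (line.empty())
        return "";

    switch (line[0]) {
    case 'M': {
        MillisRecord rec;
        if (parse_millis(line, rec))
            return "ms=" + std::to_string(rec.millis);
        return "bad M line";
    }
    case 'T':
    case 'A':
        // Placeholder: keep the tag and pass the rest of the line through.
        return std::string(1, line[0]) + ":" + std::string(line.substr(1));
    default:
        return "unknown";
    }
}
```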

score 3 · Accepted Answer

So forgive me if this is stating the obvious, but if you want to process the file line by line, then...

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    // read lines one at a time
    ifstream inf("S050508-v3.txt");
    string line;
    while (getline(inf, line))
    {
        // ... process line ...
    }
    inf.close();

    return 0;
}

Just fill in the body of the while loop? Maybe I'm not seeing the real problem (a case of not seeing the forest for the trees).
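Filling in the loop body could look like this sketch: each line is read with std::getline, split on whitespace, and written to an output stream with its fields comma-separated. The comma-separated target format and the function name convert are assumptions for illustration; taking the streams as parameters lets the same code run against std::ifstream/std::ofstream or in-memory streams.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Read `in` one line at a time and write each line to `out` with its
// whitespace-separated fields joined by commas (an assumed target format).
// Returns the number of lines processed.
long long convert(std::istream& in, std::ostream& out)
{
    long long count = 0;
    std::string line;
    std::string field;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        bool first = true;
        while (fields >> field) {
            if (!first)
                out << ',';
            out << field;
            first = false;
        }
        out << '\n';   // '\n' instead of std::endl avoids a flush per line
        ++count;
    }
    return count;
}
```

With files it would be called as, for example, std::ifstream in("S050508-v3.txt"); std::ofstream out("converted.txt"); convert(in, out); where converted.txt is a hypothetical output name.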

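Edit

This follows the OP's interest in using a custom streambuf. It is not necessarily the most portable thing in the world, but he is more interested in avoiding flipping back and forth between the input and output files. With enough RAM, this should do the trick.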

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <memory>
#include <streambuf>
#include <string>
using namespace std;

struct membuf : public std::streambuf
{
    explicit membuf(size_t len)
        : streambuf()
        , src(new char[ len ] )     // initialized in declaration order: src, then len
        , len(len)
    {
        // expose the whole block as the get area that the istream reads from
        setg(src.get(), src.get(), src.get() + len);
    }

    // direct buffer access for file load.
    char * get() { return src.get(); }
    size_t size() const { return len; }

private:
    std::unique_ptr<char[]> src;    // char[] so that delete[] is used
    size_t len;
};

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        cerr << "usage: " << argv[0] << " <file>\n";
        return EXIT_FAILURE;
    }

    // open file in binary, retrieve length-by-end-seek
    ifstream inf(argv[1], ios::in|ios::binary);
    if (!inf)
        return EXIT_FAILURE;
    inf.seekg(0, inf.end);
    size_t len = static_cast<size_t>(inf.tellg());
    inf.seekg(0, inf.beg);

    // allocate a stream buffer with an internal block
    //  large enough to hold the entire file.
    // No trailing '\0' is needed: reads stop at the end of the get area.
    membuf mb(len);

    // use our membuf buffer for our file read-op.
    inf.read(mb.get(), len);

    // use iss for your nefarious purposes
    std::istream iss(&mb);
    std::string s;
    while (iss >> s)
        cout << s << '\n';

    return EXIT_SUCCESS;
}
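Note that the loop above uses iss >> s, which extracts whitespace-separated words rather than lines: "T359 3520 359" yields three strings. Since the question asks for line-by-line processing, std::getline is the extraction to use on the same kind of istream. A small sketch contrasting the two (the helper names count_words and count_lines are hypothetical):

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, for trying the helpers on in-memory text
#include <string>

// Count what operator>> extracts: whitespace-separated words.
std::size_t count_words(std::istream& is)
{
    std::size_t n = 0;
    std::string s;
    while (is >> s)
        ++n;
    return n;
}

// Count what std::getline extracts: newline-terminated lines.
std::size_t count_lines(std::istream& is)
{
    std::size_t n = 0;
    std::string s;
    while (std::getline(is, s))
        ++n;
    return n;
}
```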

score 0 · Answer

If I had to do this, I'd probably use code something like this:

std::ifstream in("S050508-v3.txt");

std::ostringstream buffer;   // must be an output string stream for operator<<
buffer << in.rdbuf();

std::string data = buffer.str();

if (check_for_good_data(data))
    std::cout << data;

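This assumes you really do need the entire contents of the input file in memory at once to decide whether it should be copied to the output. If (for example) you can look at the data one byte at a time and decide whether to copy that byte without looking at any other bytes, you could do something more like this: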

std::ifstream in(...);

std::copy_if(std::istreambuf_iterator<char>(in),
             std::istreambuf_iterator<char>(),
             std::ostream_iterator<char>(std::cout, ""),
             is_good_char);

...where is_good_char is a function that returns a bool saying whether a given char should be included in the output.
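A self-contained version of the copy_if idea, as a sketch: here is_good_char is a hypothetical predicate that drops carriage returns (for example, to strip Windows line endings), and filter_stream is a hypothetical wrapper. std::ostreambuf_iterator is used for output because it writes characters straight to the stream buffer without formatting.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>   // in-memory streams, handy for trying it out

// Hypothetical predicate: keep every character except '\r'.
bool is_good_char(char c)
{
    return c != '\r';
}

// Copy the characters of `in` that satisfy is_good_char to `out`.
// istreambuf_iterator reads raw characters, including whitespace and newlines.
void filter_stream(std::istream& in, std::ostream& out)
{
    std::copy_if(std::istreambuf_iterator<char>(in),
                 std::istreambuf_iterator<char>(),
                 std::ostreambuf_iterator<char>(out),
                 is_good_char);
}
```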

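Edit: The size of the files you're dealing with mostly rules out the first possibility I gave above. You're also right that reading and writing large chunks of data will almost certainly be faster than processing one line at a time.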
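One way to get the speed of large reads without holding a 5 to 10 GB file in memory is to read fixed-size chunks, process every complete line in each chunk, and carry the trailing partial line over to the next chunk. The sketch below (C++17) assumes lines end in '\n'; process_in_chunks, the chunk size, and the per-line callback are illustrative choices, not part of the answer above.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for testing with in-memory text
#include <string>
#include <string_view>
#include <vector>

// Read `in` in chunks of `chunk_size` bytes and call `on_line` for every
// complete line (without its '\n'). A partial line at the end of a chunk is
// kept in `pending` and completed by the next chunk.
template <typename F>
void process_in_chunks(std::istream& in, std::size_t chunk_size, F on_line)
{
    std::vector<char> chunk(chunk_size);
    std::string pending;   // partial line carried between chunks

    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;

        std::string_view view(chunk.data(), got);
        std::size_t start = 0;
        std::size_t nl;
        while ((nl = view.find('\n', start)) != std::string_view::npos) {
            if (pending.empty()) {
                on_line(view.substr(start, nl - start));
            } else {
                // Complete the line begun in an earlier chunk.
                pending.append(view.data() + start, nl - start);
                on_line(std::string_view(pending));
                pending.clear();
            }
            start = nl + 1;
        }
        pending.append(view.data() + start, got - start);
    }

    // Last line without a trailing newline.
    if (!pending.empty())
        on_line(std::string_view(pending));
}
```

The callback can format each line and write it to an std::ofstream. A chunk size in the megabyte range is a reasonable starting point, but the best value depends on the system and is worth measuring.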

score 0 · Answer

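You should look into fgets and the scanf family (here sscanf), which let you pull out matching pieces of data so they are easier to manipulate, assuming that's what you want to do. Something like this: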

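FILE *input = fopen("file.txt", "r");
FILE *output = fopen("out.txt","w");
if (input == NULL || output == NULL)
   return 1;   // could not open one of the files

const int bufferSize = 64;
char buffer[bufferSize];

// fgets returns NULL (not EOF) at end of file or on error
while(fgets(buffer,bufferSize,input) != NULL){
   char data[16];
   // sscanf takes a scanf format string, not a regex; "%15s" reads one word
   if (sscanf(buffer,"%15s",data) == 1) {
      //manipulate data
      fprintf(output,"%s\n",data);
   }
}
fclose(output);
fclose(input);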

That would be more of the C way to do it; C++ handles things more elegantly with istream: http://www.cplusplus.com/reference/istream/istream/
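For comparison, an istream version of the fgets/sscanf loop might look like this sketch: std::getline replaces fgets (no fixed buffer size, so long lines are not split), and an std::istringstream replaces sscanf for pulling fields out of a line. The three-field layout (a tag followed by two integers, as in "T359 3520 359"), the tab-separated output, and the name reformat_records are assumptions for illustration.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Parse lines of the assumed form "<tag> <int> <int>" and write them tab-separated.
// Lines that do not match are skipped; returns how many lines were written.
int reformat_records(std::istream& input, std::ostream& output)
{
    int written = 0;
    std::string line;
    while (std::getline(input, line)) {
        std::istringstream fields(line);
        std::string tag;
        long a = 0;
        long b = 0;
        if (fields >> tag >> a >> b) {
            output << tag << '\t' << a << '\t' << b << '\n';
            ++written;
        }
    }
    return written;
}
```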
